


data scientist: All content tagged as data scientist in NoSQL databases and polyglot persistence

A Different Big Data Definition and What Data Scientists Are and Do

Dr. Rami Mukhtar, cited by Divina Paredes reporting for PCAdvisor from the Big Data Symposium in Sydney:

Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business

Original title and link: A Different Big Data Definition and What Data Scientists Are and Do (NoSQL database©myNoSQL)


Data Scientist’s Anthem

Shamir Karkal:

Data Scientist’s anthem - We R Who We R

Andrei Savu


Statistical Advances: The Maximal Information Coefficient a New Method to Uncover Hidden Data Relationships

Yakir Reshef (main researcher):

“If you have a data set with 22 million relationships, the 500 relationships in there that you care about are effectively invisible to a human.”

The statistical method that Reshef and his colleagues have devised aims to crack those problems. It can spot many superimposed correlations between variables and measure exactly how tight each relationship is, on the basis of a quantity that the team calls the maximal information coefficient (MIC). The MIC is calculated by plotting data on a graph and looking for all ways of dividing up the graph into blocks or grids that capture the largest possible number of data points. MIC can then be deduced from the grids that do the best job.

The original article, Detecting Novel Associations in Large Data Sets, was published in Science, but it is behind a paywall.
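To make the grid idea in the quote a bit more concrete, here is a minimal sketch, not the authors' algorithm. It tries several grid resolutions, bins the points, and keeps the best normalized mutual information. Assumptions on my side: the function names are made up for illustration, the bins are simple equal-frequency bins (the published method searches over grid placements, which this does not), and the cap on grid cells of n^0.6 is only a commonly quoted default.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts.

    Simplification: ties are split by position, not kept in one bin.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(xb, yb):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px = Counter(xb)
    py = Counter(yb)
    mi = 0.0
    for (a, b), count in joint.items():
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


def approx_mic(x, y, exponent=0.6):
    """Crude MIC-style score: best normalized MI over a range of grid sizes.

    Hypothetical helper; `exponent` caps the number of grid cells at n**exponent.
    """
    n = len(x)
    max_cells = max(4, int(n ** exponent))
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        xb = equal_frequency_bins(x, nx)
        for ny in range(2, max_cells // nx + 1):
            yb = equal_frequency_bins(y, ny)
            # Dividing by log(min(nx, ny)) keeps the score comparable across grids.
            score = mutual_information(xb, yb) / math.log(min(nx, ny))
            best = max(best, score)
    return best


if __name__ == "__main__":
    xs = [i / 200 for i in range(200)]
    print(approx_mic(xs, xs))                             # linear relationship
    print(approx_mic(xs, [(v - 0.5) ** 2 for v in xs]))   # non-linear relationship
```

Both the straight line and the parabola should score close to 1, which is the point of the measure: it is meant to pick up tight relationships whether or not they are linear.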



What Makes a Good Data Scientist?

Watch this interview with DJ Patil, formerly LinkedIn chief scientist and now data scientist in residence at Greylock Partners, to find the answer.

Teaser: a passion for really getting to an answer.

The World’s 7 Most Powerful Data Scientists According to Tim O’Reilly

Teaser (if you are not a data scientist): besides the ones you’d expect in this list, there are also a couple of names you haven’t heard of.



Data Jujitsu and Data Karate

David F. Carr in an article about DJ Patil and his work on Big Data at LinkedIn:

That is what he means by data jujitsu, where jujitsu is the art of using an opponent’s leverage and momentum against him. In data jujitsu, you try to use the scope of the problem to create the solution—without investing disproportionate resources at the early experimental stage. That’s as opposed to data karate, which would be a direct frontal assault to hack your way through the problem.



Data Scientist Summit Videos

After seeing the excerpt from Jonathan Harris’ talk at the Data Scientist Summit, I really wanted to post a link to some of the videos. But they are all behind a registration gateway. Just in case you want to watch them (there are indeed some interesting titles), you’ll find them here.


You Need to Hire a Data Geek

What to look for when hiring a data geek, a different name for the now-established data scientist role:

  • A strong background in computer science is essential. Dealing with information is not easy. The data geek needs to be able to collect the data, which in many cases involves knowing about databases, some networking, and Web programming technologies (XML, HTML, etc.), for a start.
  • Statistics and mathematics are part of the game. Your data geek needs to know statistics inside out and backwards, and the software for manipulating them to develop an analysis.
  • Data visualization is key. You need data visualization tools that are in equal parts useful and appealing. Your data geek should have an eye for graphs, maps, and charts, with a feel for the right dashboards, scorecards, data mashups, or even Excel workbooks—to generate the right mix of information for the right people.
  • A bit of creativity goes a long way. The right data geek will use all the above skills to create new and improve existing ways to increase the return on investment (ROI) of your organization’s BI solutions.

There are many different opinions on what data scientists should know and do.



Data Scientist and Cloud Architect: The 6 Hottest New Jobs in IT

InfoWorld published an unscientific piece of research on the hottest new jobs in IT, and data scientists and cloud architects both made the top 6.

About data scientists:

According to Norman Nie, CEO of Revolution Analytics, data science jobs will require workers with a spectrum of skills, from entry-level data cleaners to the high-level statisticians, yielding a range of opportunities for newcomers to the field. As the business world gets increasingly social, the demand for people to plumb the depths of all that social networking clickstream data will only increase. The cliché going around is that “data is the new oil.” A career in refining that raw material sounds like a good bet.

Cloud architects:

In addition to establishing and managing a private cloud infrastructure, Ron Gula, CEO of Tenable Network Security, says cloud architects will increasingly need to be experts in choosing public cloud services. “When you get into the nuances of SLAs, you become less of an IT person and more of a lawyer,” says Gula. The ultimate goal is the hybrid cloud, where cloud architects and business management decide which cloud services make the most sense to run internally and which should be farmed out on a pay-per-use basis.



Six Myths About Data Scientist

Ted Cuzzillo (tdwi):

  • Myth #1: Data analysts are geeks. Fact: Analysts are good communicators.
  • Myth #2: Analysis is all about insight. Fact: It’s all about impact.
  • Myth #3: Data analysis is easy. Fact: Data analysis takes time to learn.
  • Myth #4: Statistics is the most important skill. Fact: Business smarts are more important.
  • Myth #5: Analysts work at the “speed of thought.” Fact: Thought is often a slow, non-linear process.
  • Myth #6: Analysts are a rare breed. Fact: We’re all data analysts.

Actually, many of these are just additional requirements for a data scientist job.



What is Big Data Used for

Philipp Janert [1]:

It falls into one of two camps. The first is reporting. […].

The other camp is what I consider “generalized search.” These are scenarios like: If User A likes movies B, C, and D, what other specific movie might User A want? That’s a form of searching because you’re not actually trying to create a conceptual model of user behavior. You’re comparing individual data points; you’re trying to find the movie that has the greatest similarity to a very specific other set of predefined movies. For this kind of generalized, exhaustive search, you need a lot of data because you look for the individual data points. But that’s not really analysis as I understand it, either.

I guess the Netflix competition was a bit more than generalized search, as it required both inductive and deductive research.
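The “generalized search” from the quote is easy to illustrate: no model of user behavior, just a comparison of individual data points. Below is a minimal sketch under my own assumptions: movies are compared by the sets of users who liked them, the similarity is Jaccard overlap, and the data, user IDs, and function names are all made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_movie(liked, fans_by_movie):
    """Return (title, score) of the unseen movie closest to the liked set.

    `liked` must be non-empty and every title in it must be a key of
    `fans_by_movie`, which maps a title to the set of users who liked it.
    The score is the average Jaccard similarity to each liked movie.
    """
    best_title, best_score = None, -1.0
    for title in sorted(fans_by_movie):  # sorted for a deterministic tie-break
        if title in liked:
            continue
        score = sum(
            jaccard(fans_by_movie[title], fans_by_movie[m]) for m in liked
        ) / len(liked)
        if score > best_score:
            best_title, best_score = title, score
    return best_title, best_score


if __name__ == "__main__":
    # Hypothetical data: which users liked which movie.
    fans = {
        "B": {"u1", "u2", "u3"},
        "C": {"u1", "u2"},
        "D": {"u2", "u3"},
        "E": {"u1", "u2", "u3"},
        "F": {"u4"},
    }
    print(most_similar_movie({"B", "C", "D"}, fans))
```

With this toy data, movie E wins because its fans overlap heavily with the fans of B, C, and D, while F shares no fans with any of them. Note that every candidate is scored against every liked movie, which is why this kind of exhaustive search needs a lot of data points and a lot of comparisons.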

[1] Philipp Janert: author of Data Analysis with Open Source Tools
