
data science: All content tagged as data science in NoSQL databases and polyglot persistence

Data Science of the Facebook World

This long post from Stephen Wolfram is a true display of how fascinating data can be. Even if you get no real data out of it, read it as a lesson on how to play with, display, and interpret data.

Original title and link: Data Science of the Facebook World (NoSQL database©myNoSQL)

via: http://blog.stephenwolfram.com/2013/04/data-science-of-the-facebook-world/


A Practical Intro to Data Science

Tons of interesting links related to the data science field in Zipfian Academy’s blog post:

There are plenty of articles and discussions on the web about what data science is, what qualities define a data scientist, how to nurture them, and how you should position yourself to be a competitive applicant. There are far fewer resources out there about the steps to take in order to obtain the skills necessary to practice this elusive discipline. Here we will provide a collection of freely accessible materials and content to jumpstart your understanding of the theory and tools of Data Science.

✚ When you think about the data scientist title, you might imagine some very exciting activities. As a reality check, make sure you don’t miss “Scaling Big Data mining infrastructure at Twitter”, which will bring you back to Earth.

Original title and link: A Practical Intro to Data Science (NoSQL database©myNoSQL)

via: http://blog.zipfianacademy.com/post/46864003608/a-practical-intro-to-data-science


Programmers Need to Learn Statistics

Zed Shaw “style”:

I have a major pet peeve that I need to confess. I go insane when I hear programmers talking about statistics like they know shit when it’s clearly obvious they do not. I’ve been studying it for years and years and still don’t think I know anything. This article is my call for all programmers to finally learn enough about statistics to at least know they don’t know shit. I have no idea why, but their confidence in their lacking knowledge is only surpassed by their lack of confidence in their personal appearance.

I took statistics and probability courses during my last 3 years at university. But due to my age and some very bad teachers, I ended up hating pretty much everything related to these fields. Not to mention that even though I passed all the exams (with decent grades), I don’t remember anything. I’m still fighting some of those ghosts.

Original title and link: Programmers Need to Learn Statistics (NoSQL database©myNoSQL)

via: http://zedshaw.com/essays/programmer_stats.html


A Data Scientist's Real Job: Storytelling

Jeff Bladt and Bob Filbin for HBR:

Data gives you the what, but humans know the why.

I thought the process was a bit different: humans hypothesize why, and data shows how true that is. Am I wrong?

Original title and link: A Data Scientist’s Real Job: Storytelling (NoSQL database©myNoSQL)

via: http://blogs.hbr.org/cs/2013/03/a_data_scientists_real_job_sto.html


The Data Scientist Concept Will Die

Kathryn Kelly for SmartDataCollective:

This is the one that really got people. Companies need solutions that enable them to use and customize their data easily, because it is the whole team, not just the individual analyst, that knows the business best. By offering business users intuitive data solutions, we bypass the need for the data scientist, who works in isolation. In fact, most data scientists are associated with the old school of business intelligence, where systems were so complicated that they needed someone with a data science background to run and get value from them. The new generation of solutions, on the other hand, is making it easy for business users to engage big data. An interdisciplinary team will see and use the visuals provided, and collaborate on the best decisions on a regular basis.

It’s better not to make predictions when you miss the point.

Original title and link: The Data Scientist Concept Will Die (NoSQL database©myNoSQL)

via: http://smartdatacollective.com/kathryn1723/101841/no-data-scientists-required-big-data-all-about-business-users


A Different Big Data Definition and What Data Scientists Are and Do

Dr. Rami Mukhtar, cited by Divina Paredes reporting for PCAdvisor from the Big Data Symposium in Sydney:

Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business

Original title and link: A Different Big Data Definition and What Data Scientists Are and Do (NoSQL database©myNoSQL)

via: http://mobile.pcadvisor.co.uk/news/enterprise/3349560/big-career-shift-big-data/


Data Scientist’s Anthem

Shamir Karkal:

Data Scientist’s anthem - We R Who We R

Andrei Savu

Original title and link: Data Scientist’s Anthem (NoSQL database©myNoSQL)


Data Scientists Are Hot

Based on a couple of searches on job sites and an email from a headhunter, GigaOM’s Barb Darrow concludes that data scientists are in high demand these days:

My client is one of the largest professional services firms in the world and they are looking for very senior data analytics experts who can apply his/her advanced analytics, predictive modeling, and data visualization skills to the fraud/dispute arena.  Exceptional compensation packages are available in the $300,000 to $500,000 range for the appropriate technical and leadership experience.

There’s no denying that data scientists are hot, and Darrow is not the first one to write about it. Hal Varian, Chief Economist at Google, said many years ago: “I keep saying that the sexy job in the next 10 years will be statisticians”. Many others have already agreed that the future belongs to the companies and people that turn data into products. And I remember recently reading reports mentioning 150,000-200,000 jobs in this market in the next couple of years.

On the other hand, there are various myths about the data scientist’s role. Job descriptions will mention many years of experience with Hadoop and Big Data. But even if there are some hints about what makes a good data scientist and how to hire the right data geeks, there’s no agreement on what data science is and what the role of the data scientist involves.

This still feels like the early days, when requirements and expectations change overnight. But these are also the days when most of those involved are having a lot of fun learning, discovering new ways to deal with data, and defining tomorrow.

Original title and link: Data Scientists Are Hot (NoSQL database©myNoSQL)


SQL or Hadoop: What Tools Should I Use to Process My Data?

A great decision flowchart created by Aaron Cordova to help answer the question “What tools should I use to process my data?”:

SQL or Hadoop flowchart. Credit: Aaron Cordova

Original title and link: SQL or Hadoop: What Tools Should I Use to Process My Data? (NoSQL database©myNoSQL)


Data Science and BI: Similarities and Differences

Data science and BI differ in the foci of their  investigations. DS is consumed with supporting the development of data products. As Monica Rogati of LinkedIn notes, “On one side, I’ve been working on building products … The other side is finding interesting stories in the data.” BI, on the other hand, is all about measuring and managing business performance. At their best, though, both disciplines have an evidenced-based “science of business” foundation that makes me reject the contention by some that data science has a higher calling and is more scientifically sophisticated than BI.

Steve Miller puts the accent on the difference in maturity between the two fields. I’d say the difference in their approaches is even more important.

Original title and link: Data Science and BI: Similarities and Differences (NoSQL database©myNoSQL)

via: http://www.information-management.com/blogs/data-science-BI-database-Hadoop-Enzee-10021757-1.html


Statistical Advances: The Maximal Information Coefficient a New Method to Uncover Hidden Data Relationships

Yakir Reshef (main researcher):

“If you have a data set with 22 million relationships, the 500 relationships in there that you care about are effectively invisible to a human.”

The statistical method that Reshef and his colleagues have devised aims to crack those problems. It can spot many superimposed correlations between variables and measure exactly how tight each relationship is, on the basis of a quantity that the team calls the maximal information coefficient (MIC). The MIC is calculated by plotting data on a graph and looking for all ways of dividing up the graph into blocks or grids that capture the largest possible number of data points. MIC can then be deduced from the grids that do the best job.

The original article, Detecting Novel Associations in Large Data Sets, was published in Science, but is behind a paywall.
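
To make the grid idea in the quote above a bit more concrete, here is a minimal Python sketch of a simplified MIC-like score. It is not the algorithm from the paper: it assumes plain equal-frequency grids instead of the optimized grid search, and the function names (`approximate_mic` and its helpers) are made up for illustration. For each grid it measures how much knowing a point's column tells you about its row (mutual information), normalizes by the log of the smaller grid dimension, and keeps the best score among grids with at most n^0.6 cells.

```python
import numpy as np


def _equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts."""
    # Double argsort turns values into ranks 0..n-1 (ties broken by position).
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)


def _mutual_information(x_bins, y_bins, nx, ny):
    """Mutual information (in nats) of the nx-by-ny grid cell counts."""
    counts = np.zeros((nx, ny))
    np.add.at(counts, (x_bins, y_bins), 1)  # count points per grid cell
    joint = counts / counts.sum()
    px = joint.sum(axis=1, keepdims=True)   # column vector of x marginals
    py = joint.sum(axis=0, keepdims=True)   # row vector of y marginals
    independent = px @ py                   # cell mass expected if independent
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / independent[nonzero])))


def approximate_mic(x, y, exponent=0.6):
    """Simplified MIC-like score in [0, 1]; assumption: equal-frequency grids only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    max_cells = len(x) ** exponent          # grid size budget, n^0.6 by default
    best = 0.0                              # stays 0.0 if no 2x2 grid fits the budget
    nx = 2
    while nx * 2 <= max_cells:
        x_bins = _equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny <= max_cells:
            y_bins = _equal_frequency_bins(y, ny)
            mi = _mutual_information(x_bins, y_bins, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
            ny += 1
        nx += 1
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 200)
    print("linear   :", approximate_mic(x, 2 * x + 1))
    print("parabola :", approximate_mic(x, x ** 2))
    print("noise    :", approximate_mic(rng.normal(size=200), rng.normal(size=200)))
```

The parabola is the interesting case: its linear correlation with x is close to zero, yet a grid-based score like this one still rates it as a tight relationship, which is the kind of hidden association the quote is about.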

Original title and link: Statistical Advances: The Maximal Information Coefficient a New Method to Uncover Hidden Data Relationships (NoSQL database©myNoSQL)

via: http://www.nature.com/news/tangled-relationships-unpicked-1.9660


What Makes a Good Data Scientist?

Watch this interview with DJ Patil, formerly LinkedIn chief scientist and now data scientist in residence at Greylock Partners, to find the answer.

Teaser: a passion for really getting to an answer.