


data science: All content tagged as data science in NoSQL databases and polyglot persistence

The Open Source Data Science Masters Curriculum

A long list of links, books, and online courses for learning yourself some “data science” (the official project page is here, but I prefer the GitHub page).

Put together by Clare Corhell:

I didn’t want to wait. I wanted to work on things I care about now. Why sleep through grad school lectures tomorrow when you can hack on interesting questions today?

Original title and link: The Open Source Data Science Masters Curriculum (NoSQL database©myNoSQL)

What is the best functional programming language for data science?

An exquisite answer to an “Ask HN” question:

If I were to make my home in one of those languages for some serious data science, I’d do it in Haskell. It’s still rough around the edges, but I feel there’s a better substrate for building more sophisticated things atop it. Clojure may be able to solve your particular problem more quickly, but my experience is that quick things written in Clojure don’t pay out over as long a period as quick things written in Haskell. Further, I think the comparative effort needed to build long-lasting libraries and tools is lower in Haskell.

If I were to just do a quick data science problem, I’d probably use R.

On the other hand, there’s a long conversation about Python vs R in data science.

Original title and link: What is the best functional programming language for data science? (NoSQL database©myNoSQL)


The data science workflow

If you want to make data science look simple, this is the workflow driving a data scientist:

Data science workflow

Taken from Josh Willis’s talk “From the Lab to the Factory: Building a production machine learning infrastructure”.

Original title and link: The data science workflow (NoSQL database©myNoSQL)

Data Science Wars: Python vs. R

Daniel Gutierrez posted a pretty good summary of the recent discussions about which data processing environment, R or Python, is preferred, most productive, or most used:

While R has traditionally been the programming language of choice for data scientists, some believe it is ceding ground to Python. Here is a short list of some of the arguments I’ve heard of late, along with my personal assessment of each…

The summary of a summary is that this conversation can be reduced to familiarity vs. highly specialized algorithms.[1]

[1] While Python can get many of the specialized tools available in R, R has a lot more work to do to become a familiar environment for devs.

Original title and link: Data Science Wars: Python vs. R (NoSQL database©myNoSQL)


Data Science of the Facebook World

This long post from Stephen Wolfram is a true display of the fascination with data. Even if you get no real data out of it, read it as a lesson in how to play with, display, and interpret data.

Original title and link: Data Science of the Facebook World (NoSQL database©myNoSQL)


A Practical Intro to Data Science

Tons of interesting links related to the data science field on Zipfian Academy’s blog post:

There are plenty of articles and discussions on the web about what data science is, what qualities define a data scientist, how to nurture them, and how you should position yourself to be a competitive applicant. There are far fewer resources out there about the steps to take in order to obtain the skills necessary to practice this elusive discipline. Here we will provide a collection of freely accessible materials and content to jumpstart your understanding of the theory and tools of Data Science.

✚ When you think about the data scientist title, you might imagine some very exciting activities. For a reality check, make sure you don’t miss Scaling Big Data mining infrastructure at Twitter, which will bring you back to Earth.

Original title and link: A Practical Intro to Data Science (NoSQL database©myNoSQL)


Programmers Need to Learn Statistics

Zed Shaw “style”:

I have a major pet peeve that I need to confess. I go insane when I hear programmers talking about statistics like they know shit when it’s clearly obvious they do not. I’ve been studying it for years and years and still don’t think I know anything. This article is my call for all programmers to finally learn enough about statistics to at least know they don’t know shit. I have no idea why, but their confidence in their lacking knowledge is only surpassed by their lack of confidence in their personal appearance.

I took statistics and probability courses during the last 3 years of university. But due to my age, and also some very bad teachers, I ended up hating pretty much everything related to these fields. And even though I passed all the exams (with decent grades), I don’t remember anything. I’m still fighting some of those ghosts.

Original title and link: Programmers Need to Learn Statistics (NoSQL database©myNoSQL)


A Data Scientist's Real Job: Storytelling

Jeff Bladt and Bob Filbin for HBR:

Data gives you the what, but humans know the why.

I thought the process was a bit different: humans hypothesize the why, and data tells how true that is. Am I wrong?
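To make that framing concrete: a human proposes a hypothesis (say, that a hypothetical variant B produces higher values than variant A), and the data is then asked how likely a difference that large would be by pure chance. Below is a minimal Python sketch using a one-sided permutation test, which is just one standard way to do this and not something taken from the HBR article; the function name `permutation_p_value` and the sample numbers are made up for illustration.

```python
import random


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for the hypothesis mean(b) > mean(a).

    Hypothetical helper for illustration. Returns the fraction of random
    relabelings of the pooled data whose difference in means is at least
    as large as the observed one (with +1 smoothing so it is never 0).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = sum(pooled[n_a:]) / len(b) - sum(pooled[:n_a]) / n_a
        if diff >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical measurements for two variants.
variant_a = [12, 15, 11, 14, 13, 12, 16]
variant_b = [18, 17, 20, 16, 19, 21, 18]
print(permutation_p_value(variant_a, variant_b))
```

A small p-value means random relabelings rarely produce a gap as large as the observed one, which supports the hypothesis. It says nothing about why the gap exists; that part is still the human’s job.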

Original title and link: A Data Scientist’s Real Job: Storytelling (NoSQL database©myNoSQL)


The Data Scientist Concept Will Die

Kathryn Kelly for SmartDataCollective:

This is the one that really got people. Companies need solutions that enable them to use and customize their data easily, because it is the whole team, not just the individual analyst, that knows the business best. By offering business users intuitive data solutions, we bypass the need for the data scientist, who works in isolation. In fact, most data scientists are associated with the old school of business intelligence, where systems were so complicated that they needed someone with a data science background to run and get value from them. The new generation of solutions, on the other hand, is making it easy for business users to engage big data. An interdisciplinary team will see and use the visuals provided, and collaborate on the best decisions on a regular basis.

It’s better not to make predictions when you miss the point.

Original title and link: The Data Scientist Concept Will Die (NoSQL database©myNoSQL)


A Different Big Data Definition and What Data Scientists Are and Do

Dr Rami Mukhtar, quoted by Divina Paredes reporting for PCAdvisor from the Big Data Symposium in Sydney:

Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business

Original title and link: A Different Big Data Definition and What Data Scientists Are and Do (NoSQL database©myNoSQL)


Data Scientist’s Anthem

Shamir Karkal:

Data Scientist’s anthem - We R Who We R

Andrei Savu

Original title and link: Data Scientist’s Anthem (NoSQL database©myNoSQL)

Data Scientists Are Hot

Based on a couple of searches on job sites and an email from a headhunter, GigaOM’s Barb Darrow concludes that data scientists are in high demand these days:

My client is one of the largest professional services firms in the world and they are looking for very senior data analytics experts who can apply his/her advanced analytics, predictive modeling, and data visualization skills to the fraud/dispute arena. Exceptional compensation packages are available in the $300,000 to $500,000 range for the appropriate technical and leadership experience.

There’s no denying that data scientists are hot, and Darrow is not the first one to write about it. Hal Varian, Chief Economist at Google, said many years ago: “I keep saying that the sexy job in the next 10 years will be statisticians”. Many others have since agreed that the future belongs to the companies and people that turn data into products. And I remember recently reading about reports mentioning 150,000 to 200,000 jobs in this market over the next couple of years.

On the other hand, there are various myths about the data scientist’s role. Job descriptions will ask for many years of experience with Hadoop and Big Data. But even though there are some hints about what makes a good data scientist and how to hire the right data geeks, there’s no consensus on what data science is and what the data scientist’s role involves.

This still feels like the early days, when requirements and expectations change overnight. But these are also the days when most of those involved are having a lot of fun learning, discovering new ways to deal with data, and defining tomorrow.

Original title and link: Data Scientists Are Hot (NoSQL database©myNoSQL)