


data science: All content tagged as data science in NoSQL databases and polyglot persistence

The data analytics handbook

A free book based on interviews with data scientists, data analysts, researchers. Available here.

Original title and link: The data analytics handbook (NoSQL database©myNoSQL)

The Open Source Data Science Masters Curriculum

A long list of links, books, and online courses for learning yourself some “data science” (the official project page is here, but I prefer the GitHub page).

Put together by Clare Corthell:

I didn’t want to wait. I wanted to work on things I care about now. Why sleep through grad school lectures tomorrow when you can hack on interesting questions today?

What is the best functional programming language for data science?

An exquisite answer to an “Ask HN” question:

If I were to make my home in one of those languages for some serious data science, I’d do it in Haskell. It’s still rough around the edges, but I feel there’s a better substrate for building more sophisticated things atop it. Clojure may be able to solve your particular problem more quickly, but my experience is that quick things written in Clojure don’t pay out over as long a period as quick things written in Haskell. Further, I think the comparative effort needed to build long-lasting libraries and tools is lower in Haskell.

If I were to just do a quick data science problem, I’d probably use R.

On the other hand, there’s a long conversation about Python vs R in data science.

The data science workflow

If you want to make data science look simple, this is the workflow driving a data scientist:

[Figure: Data science workflow]

Taken from Josh Wills’s talk From the Lab to the Factory: Building a production machine learning infrastructure.

Data Science Wars: Python vs. R

Daniel Gutierrez posted a pretty good summary of the recent discussions about the preferred or most productive or most used data processing environments (R or Python):

While R has traditionally been the programming language of choice for data scientists, some believe it is ceding ground to Python. Here is a short list of some of the arguments I’ve heard of late, along with my personal assessment of each…

The summary of a summary is that this conversation can be reduced to familiarity vs. highly specialized algorithms [1].

[1] While Python can get many of the specialized tools available in R, R has a lot more work to do to become a familiar environment for devs.

Data Science of the Facebook World

This long post from Stephen Wolfram is a true display of the fascination of data. Even if you get no real data out of it, read it as a lesson in how to play with, display, and interpret data.

A Practical Intro to Data Science

Tons of interesting links related to the data science field in Zipfian Academy’s blog post:

There are plenty of articles and discussions on the web about what data science is, what qualities define a data scientist, how to nurture them, and how you should position yourself to be a competitive applicant. There are far fewer resources out there about the steps to take in order to obtain the skills necessary to practice this elusive discipline. Here we will provide a collection of freely accessible materials and content to jumpstart your understanding of the theory and tools of Data Science.

✚ When you think about the data scientist title, you might imagine some very exciting activities. As a reality check, make sure you don’t miss Scaling Big Data mining infrastructure at Twitter, which will bring you back to Earth.

Programmers Need to Learn Statistics

Zed Shaw “style”:

I have a major pet peeve that I need to confess. I go insane when I hear programmers talking about statistics like they know shit when it’s clearly obvious they do not. I’ve been studying it for years and years and still don’t think I know anything. This article is my call for all programmers to finally learn enough about statistics to at least know they don’t know shit. I have no idea why, but their confidence in their lacking knowledge is only surpassed by their lack of confidence in their personal appearance.

I took statistics and probability courses during my last 3 years at university. But due to my age and some very bad teachers, I ended up hating pretty much everything related to these fields. Not to mention that even though I passed all the exams (with decent grades), I don’t remember anything. I’m still fighting some of those ghosts.

A Data Scientist's Real Job: Storytelling

Jeff Bladt and Bob Filbin for HBR:

Data gives you the what, but humans know the why.

I thought the process was a bit different: humans hypothesize why, and data knows how true that is. Am I wrong?

The Data Scientist Concept Will Die

Kathryn Kelly for SmartDataCollective:

This is the one that really got people. Companies need solutions that enable them to use and customize their data easily, because it is the whole team, not just the individual analyst, that knows the business best. By offering business users intuitive data solutions, we bypass the need for the data scientist, who works in isolation. In fact, most data scientists are associated with the old school of business intelligence, where systems were so complicated that they needed someone with a data science background to run and get value from them. The new generation of solutions, on the other hand, is making it easy for business users to engage big data. An interdisciplinary team will see and use the visuals provided, and collaborate on the best decisions on a regular basis.

It’s better not to make predictions when you miss the point.

A Different Big Data Definition and What Data Scientists Are and Do

Dr. Rami Mukhtar, cited by Divina Paredes reporting for PCAdvisor from the Big Data Symposium in Sydney:

Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business.

Data Scientist’s Anthem

Shamir Karkal:

Data Scientist’s anthem - We R Who We R

Andrei Savu
