data science: All content tagged as data science in NoSQL databases and polyglot persistence
Tuesday, 30 April 2013
Data Science of the Facebook World
This long post from Stephen Wolfram is a true display of the fascination of data. Even if you get no real data out of it, read it as a lesson in how to play with, display, and interpret data.
Original title and link: Data Science of the Facebook World (©myNoSQL)
via: http://blog.stephenwolfram.com/2013/04/data-science-of-the-facebook-world/
Tuesday, 9 April 2013
A Practical Intro to Data Science
Tons of interesting links related to the data science field in Zipfian Academy’s blog post:
There are plenty of articles and discussions on the web about what data science is, what qualities define a data scientist, how to nurture them, and how you should position yourself to be a competitive applicant. There are far fewer resources out there about the steps to take in order to obtain the skills necessary to practice this elusive discipline. Here we will provide a collection of freely accessible materials and content to jumpstart your understanding of the theory and tools of Data Science.
✚ When you think about the data scientist title, you might imagine some very exciting activities. As a reality check, make sure you don’t miss Scaling Big Data mining infrastructure at Twitter, which will bring you back to Earth.
Original title and link: A Practical Intro to Data Science (©myNoSQL)
via: http://blog.zipfianacademy.com/post/46864003608/a-practical-intro-to-data-science
Monday, 8 April 2013
Programmers Need to Learn Statistics
Zed Shaw “style”:
I have a major pet peeve that I need to confess. I go insane when I hear programmers talking about statistics like they know shit when it’s clearly obvious they do not. I’ve been studying it for years and years and still don’t think I know anything. This article is my call for all programmers to finally learn enough about statistics to at least know they don’t know shit. I have no idea why, but their confidence in their lacking knowledge is only surpassed by their lack of confidence in their personal appearance.
I took statistics and probability courses during my last 3 years of university. But due to my age and some very bad teachers, I ended up hating pretty much everything related to these fields. Not to mention that even though I passed all the exams (with decent grades), I don’t remember anything. I’m still fighting some of those ghosts.
Original title and link: Programmers Need to Learn Statistics (©myNoSQL)
Wednesday, 3 April 2013
A Data Scientist's Real Job: Storytelling
Jeff Bladt and Bob Filbin for HBR:
Data gives you the what, but humans know the why.
I thought the process was a bit different: humans hypothesize the why, and data tells how true that is. Am I wrong?
Original title and link: A Data Scientist’s Real Job: Storytelling (©myNoSQL)
via: http://blogs.hbr.org/cs/2013/03/a_data_scientists_real_job_sto.html
Tuesday, 2 April 2013
The Data Scientist Concept Will Die
Kathryn Kelly for SmartDataCollective:
This is the one that really got people. Companies need solutions that enable them to use and customize their data easily, because it is the whole team, not just the individual analyst, that knows the business best. By offering business users intuitive data solutions, we bypass the need for the data scientist, who works in isolation. In fact, most data scientists are associated with the old school of business intelligence, where systems were so complicated that they needed someone with a data science background to run and get value from them. The new generation of solutions, on the other hand, is making it easy for business users to engage big data. An interdisciplinary team will see and use the visuals provided, and collaborate on the best decisions on a regular basis.
It’s better not to make predictions when you miss the point.
Original title and link: The Data Scientist Concept Will Die (©myNoSQL)
Friday, 6 April 2012
A Different Big Data Definition and What Data Scientists Are and Do
Dr Rami Mukhtar, cited by Divina Paredes reporting for PCAdvisor from the Big Data Symposium in Sydney:
Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business
Original title and link: A Different Big Data Definition and What Data Scientists Are and Do (©myNoSQL)
via: http://mobile.pcadvisor.co.uk/news/enterprise/3349560/big-career-shift-big-data/
Thursday, 23 February 2012
Data Scientist’s Anthem
Data Scientist’s anthem - We R Who We R
Original title and link: Data Scientist’s Anthem (©myNoSQL)
Monday, 20 February 2012
Data Scientists Are Hot
Based on a couple of searches on job sites and an email from a headhunter, GigaOM’s Barb Darrow concludes that data scientists are in high demand these days:
My client is one of the largest professional services firms in the world and they are looking for very senior data analytics experts who can apply his/her advanced analytics, predictive modeling, and data visualization skills to the fraud/dispute arena. Exceptional compensation packages are available in the $300,000 to $500,000 range for the appropriate technical and leadership experience.
There’s no denying that data scientists are hot, and Darrow is not the first to write about it. Hal Varian, Chief Economist at Google, said many years ago: “I keep saying that the sexy job in the next 10 years will be statisticians”. Many others have already agreed that the future belongs to the companies and people that turn data into products. And I remember reading recently about some reports mentioning 150,000 to 200,000 jobs in this market in the next couple of years.
On the other hand though, there are various myths about data scientists’ role. Job descriptions will mention many years of experience with Hadoop and Big Data. But even if there are some hints about what makes a good data scientist and how to hire the right data geeks, there’s no alignment on what data science is and what is involved in the role of the data scientist.
This still feels like the early days, when requirements and expectations change overnight. But these are also the days when most of those involved are having a lot of fun learning, discovering new ways to deal with data, and defining tomorrow.
Original title and link: Data Scientists Are Hot (©myNoSQL)
Monday, 16 January 2012
SQL or Hadoop: What Tools Should I Use to Process My Data?
Great decision flowchart created by Aaron Cordova to help answer the question of what tools to use to process your data:

[Flowchart image not reproduced here. Credit: Aaron Cordova]
Original title and link: SQL or Hadoop: What Tools Should I Use to Process My Data? (©myNoSQL)
Wednesday, 11 January 2012
Data Science and BI: Similarities and Differences
Data science and BI differ in the foci of their investigations. DS is consumed with supporting the development of data products. As Monica Rogati of LinkedIn notes, “On one side, I’ve been working on building products … The other side is finding interesting stories in the data.” BI, on the other hand, is all about measuring and managing business performance. At their best, though, both disciplines have an evidenced-based “science of business” foundation that makes me reject the contention by some that data science has a higher calling and is more scientifically sophisticated than BI.
Steve Miller puts the accent on the difference in maturity between the two fields. I’d say the difference in their approaches is even more important.
Original title and link: Data Science and BI: Similarities and Differences (©myNoSQL)
Wednesday, 21 December 2011
Statistical Advances: The Maximal Information Coefficient a New Method to Uncover Hidden Data Relationships
Yakir Reshef (main researcher):
“If you have a data set with 22 million relationships, the 500 relationships in there that you care about are effectively invisible to a human.”
The statistical method that Reshef and his colleagues have devised aims to crack those problems. It can spot many superimposed correlations between variables and measure exactly how tight each relationship is, on the basis of a quantity that the team calls the maximal information coefficient (MIC). The MIC is calculated by plotting data on a graph and looking for all ways of dividing up the graph into blocks or grids that capture the largest possible number of data points. MIC can then be deduced from the grids that do the best job.
The original article, Detecting Novel Associations in Large Data Sets, was published in Science, but is behind a paywall.
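To make the grid idea from the quote a bit more concrete, here is a rough Python sketch. It is not the algorithm from the paper: it assumes only equal-frequency grids (the real method searches over grid placements too), scores each grid by normalized mutual information, and keeps the best score. The function names and the `exponent=0.6` cell budget are assumptions made for this illustration.

```python
import math
from collections import Counter


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins roughly equal-sized bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(xb, yb):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px = Counter(xb)
    py = Counter(yb)
    mi = 0.0
    for (a, b), count in joint.items():
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


def approx_mic(xs, ys, exponent=0.6):
    """Simplified MIC-style score: best normalized MI over small grids.

    Assumption: only equal-frequency grids are tried, so this can
    underestimate what a full grid search would find.
    """
    n = len(xs)
    max_cells = max(4, int(n ** exponent))  # assumed budget on grid cells
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        xb = rank_bins(xs, nx)
        for ny in range(2, max_cells // nx + 1):
            yb = rank_bins(ys, ny)
            # Dividing by log(min(nx, ny)) keeps scores comparable across grids.
            score = mutual_information(xb, yb) / math.log(min(nx, ny))
            best = max(best, score)
    return best
```

Called as `approx_mic(xs, ys)` on two equal-length lists of numbers, a noiseless relationship such as a straight line or a parabola should score close to 1, while unrelated variables should score much lower. That is the appeal over plain correlation, which would miss the parabola entirely.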
Original title and link: Statistical Advances: The Maximal Information Coefficient a New Method to Uncover Hidden Data Relationships (©myNoSQL)
via: http://www.nature.com/news/tangled-relationships-unpicked-1.9660
Friday, 25 November 2011
What Makes a Good Data Scientist?
Watch this interview with DJ Patil, formerly LinkedIn chief scientist and now data scientist in residence at Greylock Partners, to find the answer.
Teaser: a passion for really getting to an answer.