From the Wikibon blog infographic about data science and the data scientist:
Data science can be broken down into four essential parts:
- mining data: collecting and formatting the information
- statistics: information analysis
- interpret: representing or visualizing the results
- leverage: implications of the data, application of the data, interaction using the data and predictions formed from studying it
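As a rough illustration of how the four parts fit together, here is a minimal Python sketch on a toy dataset. Everything in it is an assumption made for the example and not something taken from the infographic: the `RAW` records, the function names, and the choice of a least-squares line as the "statistics" step.

```python
from statistics import mean

# Hypothetical raw records (year, count), standing in for collected data.
RAW = ["2021,10", "2022,14", "bad row", "2023,18"]

def mine(rows):
    """Mining data: collect and format the information, dropping malformed rows."""
    points = []
    for row in rows:
        parts = [p.strip() for p in row.split(",")]
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            points.append((int(parts[0]), int(parts[1])))
    return points

def analyze(points):
    """Statistics: fit a least-squares line y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def interpret(points):
    """Interpret: a crude text bar chart serves as the visualization."""
    return [f"{x} {'#' * y}" for x, y in points]

def leverage(model, x):
    """Leverage: use the fitted line to predict a value not yet observed."""
    slope, intercept = model
    return slope * x + intercept

points = mine(RAW)
model = analyze(points)
chart = interpret(points)
forecast = leverage(model, 2024)

print("\n".join(chart))
print("forecast for 2024:", forecast)
```

In practice each stage would be far larger, but the shape stays the same: clean the data first, summarize it, show it, then act on it.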
The skills of a data scientist:
- Hacking and Computer Science: Knowing how to take advantage of computers and the internet to create data-mining formulas
- Expertise in Mathematics, Statistics, Data Mining: Pulling important statistics and coherently organizing them using mathematical prowess and computer formulas
- Creativity and Insight: Knowing what statistics are important and how to leverage them
Over the years, folks have often asked me what kind of math am I using to create large scale, real-time, context accumulating systems (e.g., NORA). Some fond of Bayesian speculate I am using Bayesian techniques. Some ask if I am using neural networks or heuristics. A math professor said I was doing advanced work in the field of Set Theory.
My answer is always, “I don’t know any math. I didn’t finish high school. But I can explain how it works, step-by-step, and it is really quite simple.”
So data science starts with a passionate interest in the data. Then you add tools, processes, algorithms, and science to discover the secrets hidden inside it.
Jeff Jonas: Chief Scientist, IBM Entity Analytics Group and an IBM Distinguished Engineer