Data science: All content tagged as Data science in NoSQL databases and polyglot persistence
My client is one of the largest professional services firms in the world and they are looking for very senior data analytics experts who can apply his/her advanced analytics, predictive modeling, and data visualization skills to the fraud/dispute arena. Exceptional compensation packages are available in the $300,000 to $500,000 range for the appropriate technical and leadership experience.
There’s no denying that data scientists are hot, and Darrow is not the first to write about it. Hal Varian, Chief Economist at Google, said many years ago: “I keep saying that the sexy job in the next 10 years will be statisticians”. Many others have already agreed that the future belongs to the companies and people that turn data into products. And I remember recently reading about reports that mention 150,000 to 200,000 jobs in this market in the next couple of years.
On the other hand, there are various myths about the data scientist’s role. Job descriptions will mention many years of experience with Hadoop and Big Data. But even if there are some hints about what makes a good data scientist and how to hire the right data geeks, there’s no agreement on what data science is or what the role of the data scientist involves.
This still feels like the early days, when requirements and expectations change overnight. But these are also the days when most of those involved are having a lot of fun learning, discovering new ways to deal with data, and defining tomorrow.
Original title and link: Data Scientists Are Hot ( ©myNoSQL)
A great decision flowchart created by Aaron Cordova to help answer the question: what tools should I use to process my data?
Original title and link: SQL or Hadoop: What Tools Should I Use to Process My Data? ( ©myNoSQL)
Watch this interview with DJ Patil, formerly LinkedIn chief scientist and now data scientist in residence at Greylock Partners, to find the answer.
Teaser: a passion for really getting to an answer.
After seeing the excerpt from Jonathan Harris’ talk at the Data Scientist Summit, I really wanted to post a link to some of the videos. But they are all behind a registration gateway. In case you want to watch them (there are indeed some interesting titles), you’ll find them here.
From the Wikibon blog infographic about data science and the data scientist:
Data science can be broken down into four essential parts:
- mining data: collecting and formatting the information
- statistics: information analysis
- interpret: representation or visualization
- leverage: implications of the data, application of the data, interaction using the data and predictions formed from studying it
The skills of a data scientist:
- Hacking and Computer Science: knowing how to take advantage of computers and the internet to create data-mining formulas
- Expertise in Mathematics, Statistics, Data Mining: Pulling important statistics and coherently organizing them using mathematic prowess and computer formulas
- Creativity and Insight: Knowing what statistics are important and how to leverage them
Over the years, folks have often asked me what kind of math am I using to create large scale, real-time, context accumulating systems (e.g., NORA). Some fond of Bayesian speculate I am using Bayesian techniques. Some ask if I am using neural networks or heuristics. A math professor said I was doing advanced work in the field of Set Theory.
My answer is always, “I don’t know any math. I didn’t finish high school. But I can explain how it works, step-by-step, and it is really quite simple.”
So data science starts with a passionate interest in data. Then you add tools, processes, algorithms, and science to discover the secrets hidden inside the data.