Data Science and Data Scientists

Hal Varian[1] said a couple of years ago[2]:

The sexy job in the next ten years will be statisticians.

While Hal Varian calls them statisticians, others have been using terms like data scientists. But what is data science? O’Reilly has a long but very interesting article on this subject:

The web is full of “data-driven apps.” Almost any e-commerce application is a data-driven application. There’s a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn’t really what we mean by “data science.” A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products.

While reading these articles, a question arose in my mind: is there a way to prepare yourself to be a data scientist? Are there any data scientist secrets? On the Dataspora blog, Michael E. Driscoll lists seven secrets of successful data scientists:

  1. Choose the right-sized tool
  2. Compress everything: we live in an IO-bound world, where the dominant bottlenecks to data flow are disk read-speed and network bandwidth
  3. Split up your data: “monolithic” is a bad word in software development
  4. Sample your data
  5. Smart borrows, but genius uses open source
  6. Keep your head in the cloud
  7. Don’t be clever: when dealing with big data, embrace standards and use commonly available tools. Most of all, keep it simple, because simplicity scales.
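
Secrets 2 to 4 (compress, split, sample) are easy to try with nothing but the Python standard library. What follows is a minimal sketch, not code from Driscoll's article: the function names (`write_chunks`, `read_chunks`, `reservoir_sample`), the chunk file naming, and the one-record-per-line text format are all assumptions made for illustration. It writes records into gzip-compressed chunks, streams them back, and draws a uniform random sample without loading everything into memory.

```python
import gzip
import random
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List


def write_chunks(lines: Iterable[str], out_dir: Path, chunk_size: int) -> List[Path]:
    """Split lines into gzip-compressed files of at most chunk_size lines each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    for i, line in enumerate(lines):
        if i % chunk_size == 0:
            # Start a new chunk; the file name pattern is a hypothetical choice.
            if handle is not None:
                handle.close()
            path = out_dir / f"part-{len(paths):05d}.txt.gz"
            handle = gzip.open(path, "wt", encoding="utf-8")
            paths.append(path)
        handle.write(line.rstrip("\n") + "\n")
    if handle is not None:
        handle.close()
    return paths


def read_chunks(paths: Iterable[Path]) -> Iterator[str]:
    """Stream lines back from the compressed chunks, one at a time."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


def reservoir_sample(items: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[str] = []
    for n, item in enumerate(items):
        if n < k:
            sample.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, n)    # inclusive on both ends
            if j < k:
                sample[j] = item     # replace with probability k / (n + 1)
    return sample


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        records = (f"record {i}" for i in range(1000))
        parts = write_chunks(records, Path(tmp), chunk_size=250)
        print(len(parts), "compressed chunks")
        print(reservoir_sample(read_chunks(parts), k=5))
```

Because each chunk is an independent file, the chunks can be processed in parallel or shipped to different machines, and the sample gives a small data set to explore before committing to a run over the whole thing.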

As with every “craft”, there’s no simple path other than learning the technologies and tools for the job, and keeping your mind and eyes open.


  1. Hal Varian: Google Chief Economist
  2. The part relevant to BigData:

    The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.

    I think statisticians are part of it, but it’s just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills - of being able to access, understand, and communicate the insights you get from data analysis - are going to be extremely important. Managers need to be able to access and understand the data themselves.

    The complete interview with Hal Varian can be found ☞ here

Original title and link: Data Science and Data Scientists (NoSQL databases © myNoSQL)