
The future belongs to the companies and people that turn data into products

An article on the next generation of apps built on top of data intelligence, which also touches on the NoSQL space and big data processing.

Why do we suddenly care about statistics and about data?

In this post, I examine the many sides of data science — the technologies, the companies and the unique skill sets.

An attempt to summarize the core ideas:

  • I keep saying that the sexy job in the next 10 years will be statisticians.

    ☞ Hal Varian, Chief Economist at Google

  • Data is the next Intel Inside

    — Tim O’Reilly

  • User-generated data does contain intelligence; it is just a matter of making sense of it

  • data comes from everywhere and in various formats
  • Google, Amazon, Facebook, LinkedIn and others were the first to do this, each in a different area
  • Most of the organizations that have built data platforms have found it necessary to go beyond the relational database model. Traditional relational database systems stop being effective at this scale. Managing sharding and replication across a horde of database servers is difficult and slow. The need to define a schema in advance conflicts with reality of multiple, unstructured data sources, in which you may not know what’s important until after you’ve analyzed the data.

    Simply put, this is about complexity: the new dimensions of scalability and operational costs, as seen in Twitter's migration to Cassandra.

  • Storing data is only part of building a data platform, though. Data is only useful if you can do something with it, and enormous datasets present computational problems.

    We are closely following Hadoop, Pig, Hive, and Cascalog, as well as newer approaches toward a common NoSQL query language, like Toad for Cloud, as alternative ways to put NoSQL data to work.
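The computational side mentioned above is usually tackled with the MapReduce model that Hadoop implements and that Pig, Hive and Cascalog build on. As a rough illustration only, here is a minimal single-process word count in Python that mimics the map, shuffle and reduce phases; the function names are hypothetical, and this is not Hadoop's API, which distributes the same phases across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map over every record, shuffle the pairs, then reduce each group."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = [
        "user generated data",
        "data comes from everywhere",
        "turn data into products",
    ]
    print(map_reduce(lines))
```

Running it on the three sample lines gives "data" a count of 3 and every other word a count of 1. The point of the split is that map and reduce are independent per record and per key, which is what lets a framework like Hadoop spread them over many machines.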

via: http://radar.oreilly.com/2010/06/what-is-data-science.html