An article on the next generation of apps built on top of data intelligence, which also covers the NoSQL space and big data processing.
Why do we suddenly care about statistics and about data?
In this post, I examine the many sides of data science — the technologies, the companies and the unique skill sets.
An attempt to summarize the core ideas:
I keep saying that the sexy job in the next 10 years will be statisticians.
— ☞ Hal Varian, Chief Economist at Google
Data is the next Intel Inside
— Tim O’Reilly
- user-generated data does contain intelligence; it is just a matter of making sense of it
- data comes from everywhere and in various formats
- Google, Amazon, Facebook, LinkedIn, etc. were among the first to do this, each in a different area
Most of the organizations that have built data platforms have found it necessary to go beyond the relational database model. Traditional relational database systems stop being effective at this scale. Managing sharding and replication across a horde of database servers is difficult and slow. The need to define a schema in advance conflicts with reality of multiple, unstructured data sources, in which you may not know what’s important until after you’ve analyzed the data.
Simply put, this is about complexity: a new dimension of scalability and operational cost, as seen in Twitter's migration to Cassandra.
Storing data is only part of building a data platform, though. Data is only useful if you can do something with it, and enormous datasets present computational problems.
We are closely following Hadoop, Pig, Hive, and Cascalog, as well as new approaches to a common NoSQL query language, such as Toad for Cloud, as alternative ways to put NoSQL data to work.
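
For readers new to these tools, here is a rough sketch of the map/shuffle/reduce programming model that Hadoop popularized and that Pig and Hive build on. It is plain Python running in one process, not Hadoop code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and do not come from the article or from any of these projects.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in-process; a real cluster would distribute them."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["data is the next Intel Inside", "data comes from everywhere"]
    print(word_count(sample))
```

The point of the model is that `map_phase` and `reduce_phase` only ever see one record or one key at a time, which is what lets a framework spread the work across many machines.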