


Big Data and the 4 Vs: Volume, Velocity, Variety, Variability

In a recent post, Curt Monash disagrees with Forrester’s definition of Big Data as the 4 Vs: volume, velocity, variety, and variability:

It actually is reasonable to say that Volume and Velocity of data go together. […] It also is reasonable to say that Variety and Variability go together; […]

But while we can whittle four concepts down to two, the reduction should stop there. I say this because any of four combinations is possible (and not just in edge cases):

  • Data can be both big and poly-structured. For example, consider the classic Hadoop log-collection use case, or the bigger of MarkLogic’s databases, or of Splunk’s, or even the dynamic-schema parts of relational data warehouses built by Zynga and eBay. And yes, also consider some of the NoSQL-based short-request systems Hopkins was surely thinking of as well.
  • Data can be both big and simply-structured. I think most of Teradata’s and Vertica’s petabyte-scale installations would fit that description, the partial counterexamples at eBay and Zynga notwithstanding.
  • Data can be not-so-big and poly-structured. Consider, for example, a typical user of Intersystems Caché.
  • Data can be not-so-big and simply-structured. Consider, for example, most of the traditional RDBMS world.

To pretend that those four possibilities are only two — “big data” and otherwise — is a travesty.

I completely disagree with Curt Monash’s pairing of volume with velocity and of variety with variability. All four are orthogonal aspects of data storage, processing, and analysis. As a consequence, I cannot agree that the four scenarios described above do (or do not) represent Big Data scenarios.

On the other hand, that doesn’t mean I agree with Forrester’s definition of Big Data. Taken ad litteram, Forrester’s definition would mean that for every combination of only three of the four Vs there is either no problem or there’s already a viable solution. We’ll need to come up with scenarios for each of these combinations to validate this hypothesis.

Original title and link: Big Data and the 4 Vs: Volume, Velocity, Variety, Variability (NoSQL database©myNoSQL)