Big Data and the 4 Vs: Volume, Velocity, Variety, Variability
In a recent post, Curt Monash disagrees with Forrester’s definition of Big Data as the 4 Vs: volume, velocity, variety, and variability:
It actually is reasonable to say that Volume and Velocity of data go together.[…] It also is reasonable to say that Variety and Variability go together; […]
But while we can whittle four concepts down to two, the reduction should stop there. I say this because any of four combinations is possible (and not just in edge cases):
- Data can be both big and poly-structured. For example, consider the classic Hadoop log-collection use case, or the bigger of MarkLogic’s databases, or of Splunk’s, or even the dynamic-schema parts of relational data warehouses built by Zynga and eBay. And yes, also consider some of the NoSQL-based short-request systems Hopkins was surely thinking of as well.
- Data can be both big and simply-structured. I think most of Teradata’s and Vertica’s petabyte-scale installations would fit that description, the partial counterexamples at eBay and Zynga notwithstanding.
- Data can be not-so-big and poly-structured. Consider, for example, a typical user of Intersystems Cache’.
- Data can be not-so-big and simply-structured. Consider, for example, most of the traditional RDBMS world.
To pretend that those four possibilities are only two — “big data” and otherwise — is a travesty.
I completely disagree with Curt Monash’s pairing of volume with velocity and of variety with variability. All four are orthogonal aspects of data storage, processing, and analysis. As a consequence, I cannot agree that the four scenarios described above, which collapse those four dimensions into two, do (or do not) represent Big Data scenarios.
On the other hand, that doesn’t mean I agree with Forrester’s definition of Big Data. Taken ad litteram, Forrester’s definition would mean that for every combination of only three Vs there is either no problem or there is already a viable solution. We’ll need to come up with scenarios for each of these combinations and validate this hypothesis.
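To make the scope of that exercise concrete, here is a minimal Python sketch that lists the 3 V combinations that would each need a scenario. The names `FOUR_VS` and `three_v_combinations` are my own illustrative choices, not anything from Forrester or Curt Monash, and the sketch only enumerates the cases; it says nothing about which of them are actually solved.

```python
from itertools import combinations

# The four Vs from Forrester's definition, as named in the title.
FOUR_VS = ("volume", "velocity", "variety", "variability")


def three_v_combinations(vs=FOUR_VS):
    """Return each way of picking three of the Vs, paired with the V left out."""
    return [
        (combo, next(v for v in vs if v not in combo))
        for combo in combinations(vs, 3)
    ]


if __name__ == "__main__":
    # One line per combination that would need a validating scenario.
    for present, missing in three_v_combinations():
        print(f"{', '.join(present)} (without {missing})")
```

There are only four such combinations, one for each V that is left out, so the hypothesis can be checked with four scenarios.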
Original title and link: Big Data and the 4 Vs: Volume, Velocity, Variety, Variability (©myNoSQL)