


6 Criteria for Real Column Stores

Michael Stonebraker has published an article on the Vertica blog presenting 6 criteria for characterizing the completeness of a column store implementation:

I/O Characteristics

  • IO-1 (basic column store): Every storage block contains data from only ONE column.
  • IO-2: Aggressive compression
  • IO-3: No record-ids
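To make the I/O criteria a bit more concrete, here is a minimal Python sketch of how I read them. It is a toy illustration, not anything taken from Michael's post or from a real product: the table, the block size, the function names, and the choice of run-length encoding as the compression scheme are all my own assumptions.

```python
from itertools import groupby

# Hypothetical toy table: rows of (state, year, sales).
rows = [
    ("CA", 2010, 5),
    ("CA", 2010, 7),
    ("CA", 2011, 3),
    ("NY", 2011, 4),
    ("NY", 2011, 9),
]

def to_columns(rows):
    """Pivot row tuples into one list per column.

    A value is identified by its position in the list, so no explicit
    record-id is stored next to it (my reading of IO-3).
    """
    return [list(col) for col in zip(*rows)]

def to_blocks(column, block_size):
    """Split one column into fixed-size blocks.

    Every block holds values from a single column only (IO-1).
    """
    return [column[i:i + block_size] for i in range(0, len(column), block_size)]

def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs.

    RLE is just one simple compression scheme, used here to illustrate IO-2.
    """
    return [(value, len(list(run))) for value, run in groupby(column)]

state, year, sales = to_columns(rows)
state_blocks = to_blocks(state, block_size=2)
state_rle = rle_encode(state)
year_rle = rle_encode(year)
```

In this sketch the sorted `state` column collapses from five values to two runs, which is why storing a column contiguously tends to compress much better than storing whole rows.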

CPU Characteristics

  • CPU-4: A column executor
  • CPU-5: Executor runs on compressed data
  • CPU-6: Executor can process columns that are key sequence or entry sequence

Michael’s post goes after some big fish (SybaseIQ, EMC Greenplum, Aster Data, Oracle). If this is an area that interests you, you should also check Curt Monash’s follow-up.

But getting back to these 6 criteria for column stores, I confess that this time they seem to make a lot of sense. So I’m wondering how the NoSQL column stores (Cassandra, HBase, and Hypertable) are doing from this perspective. I’d really appreciate some expert comments so we can have a follow-up on the status of NoSQL column stores according to these criteria.

Update: Alex Feinberg pointed me to Daniel Abadi’s article, which clarifies the distinction between the solutions Michael’s post mentions and the new NoSQL column stores.

Although I did not remember this article exactly, I’ve continued to maintain this separation. My intention with this post is to make sure the separation is kept, but also to get expert feedback on the following questions:

  • do any of these criteria apply to NoSQL column stores?
  • if a criterion applies, then how do NoSQL column stores score on it?
  • if a criterion doesn’t apply, why doesn’t it apply?

Original title and link: 6 Criteria for Real Column Stores (NoSQL databases © myNoSQL)