Michael Stonebraker has published an article on the Vertica blog presenting six criteria for characterizing the completeness of a column store implementation:
- IO-1 (basic column store): Every storage block contains data from only ONE column.
- IO-2: Aggressive compression
- IO-3: No record-ids
- CPU-4: A column executor
- CPU-5: Executor runs on compressed data
- CPU-6: Executor can process columns that are key sequence or entry sequence
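To make a few of these criteria a bit more concrete, here is a minimal Python sketch of my own. It is not taken from Michael's article or from any real product. It assumes a tiny in-memory table and uses run-length encoding as a stand-in for "aggressive compression"; every name in it (`to_columns`, `rle_encode`, `count_matching`, `row_at`) is hypothetical. It touches IO-1 (one column per block), IO-2 (compression), IO-3 (rows identified by position rather than a stored record-id), and CPU-5 (operating on compressed data).

```python
from itertools import groupby

# Hypothetical row-oriented input: (region, amount) tuples.
rows = [("EU", 10), ("EU", 20), ("EU", 5), ("US", 7), ("US", 3)]

def to_columns(rows):
    """IO-1 in miniature: each 'block' (list) holds values from one column only."""
    return [list(col) for col in zip(*rows)]

def rle_encode(values):
    """IO-2 stand-in: run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def count_matching(runs, target):
    """CPU-5 in miniature: count matching rows straight from the runs,
    without expanding them back into individual values."""
    return sum(length for value, length in runs if value == target)

def row_at(columns, position):
    """IO-3 in miniature: a row is rebuilt from its position in each column;
    no record-id is stored alongside the values."""
    return tuple(col[position] for col in columns)

region, amount = to_columns(rows)
region_rle = rle_encode(region)

print(region_rle)
print(count_matching(region_rle, "US"))
print(row_at([region, amount], 3))
```

Real column stores use far more sophisticated encodings and executors; the point here is only that a predicate like "region is US" can be answered from the compressed runs alone.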
Michael’s post is going after the big fish in the ocean (SybaseIQ, EMC Greenplum, Aster Data, Oracle), and if this is the area that interests you, you should also check Curt Monash’s follow-up.
But getting back to these six criteria for column stores, I confess that this time they seem to make a lot of sense. So I’m wondering how the NoSQL column-stores (Cassandra, HBase, and Hypertable) are doing from this perspective. I’d really appreciate some expert comments, so that we can follow up with the status of NoSQL column-stores according to these criteria.
While I don’t remember this article exactly, I’ve continued to maintain this separation. My intention with this post is to make sure the separation is kept, but also to get experts’ feedback on the following questions:
- do any of these criteria apply to NoSQL column stores?
- if a criterion applies, then how do NoSQL column stores score on it?
- if a criterion doesn’t apply, why doesn’t it apply?