


Characterizing Enterprise Systems using the CAP theorem

When building your next distributed system, you will have to make sure that every subsystem can deliver the combination of consistency, availability, and partition tolerance you are looking for.

Taylor’s article is a great start for categorizing some of the (enterprise) systems out there according to the CAP theorem: Terracotta, Oracle Coherence, GigaSpaces, but also RDBMSs and a couple of NoSQL solutions like Amazon Dynamo, BigTable, Cassandra, CouchDB, and Project Voldemort.

Another interesting aspect of the article is that it tries to identify how these systems are coping with the missing CAP dimension. Unfortunately, there are a couple of things in the RDBMS analysis that I do not agree with.

An RDBMS provides availability, but only when there is connectivity between the client accessing the RDBMS and the RDBMS itself.

[…] there are several well-known approaches that can be employed to compensate for the lack of Partition tolerance. One of these approaches is commonly referred to as master/slave replication.

RDBMSs are not available by themselves either. Leaving aside the connectivity issue, an RDBMS can become busy performing complex operations or run out of resources, and so become unavailable.

Master/slave setups, which the article identifies as a solution for dealing with partition tolerance, are in fact meant to provide some level of availability. But with master/slave replication, consistency becomes only “eventual consistency”.
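A toy simulation (hypothetical names, a queue standing in for the replication channel) can make the “eventual consistency” point concrete: the master acknowledges a write immediately, the slave only sees it after the replication lag, so a read from the slave in between returns stale data.

```python
import time

class Slave:
    def __init__(self):
        self.data = {}

class Master:
    """Toy master that applies writes locally, then ships them to the slave asynchronously."""
    def __init__(self, slave, lag_seconds):
        self.data = {}
        self.slave = slave
        self.lag = lag_seconds
        self.pending = []  # (apply_at, key, value) queue simulating the async channel

    def write(self, key, value):
        self.data[key] = value  # acknowledged to the client immediately
        self.pending.append((time.monotonic() + self.lag, key, value))

    def replicate(self):
        """Apply on the slave any writes whose simulated network delay has elapsed."""
        now = time.monotonic()
        still_pending = []
        for apply_at, key, value in self.pending:
            if apply_at <= now:
                self.slave.data[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self.pending = still_pending

slave = Slave()
master = Master(slave, lag_seconds=0.2)
master.write("balance", 100)
master.replicate()
stale = slave.data.get("balance")   # None: replication has not caught up yet
time.sleep(0.25)
master.replicate()
fresh = slave.data.get("balance")   # 100: the two nodes converged eventually
```

The window between `stale` and `fresh` is exactly what “eventual consistency” means here: both reads were served, but only one reflected the latest write.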

The other approach mentioned, sharding, is indeed a solution meant to provide some level of partition tolerance. But without replication it gives up availability.
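A minimal sketch (hypothetical names, hash-based routing) of why sharding without replication trades away availability: each key lives on exactly one shard, so when that shard is partitioned away its keys become unreachable, even though the remaining shards keep serving theirs.

```python
class ShardUnavailable(Exception):
    pass

class ShardedStore:
    """Hash-partitioned key/value store with one copy of each key and no replication."""
    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        self.reachable = [True] * num_shards  # simulates network partitions per shard

    def _shard_for(self, key):
        return hash(key) % len(self.shards)

    def put(self, key, value):
        i = self._shard_for(key)
        if not self.reachable[i]:
            raise ShardUnavailable(f"shard {i} is partitioned away")
        self.shards[i][key] = value

    def get(self, key):
        i = self._shard_for(key)
        if not self.reachable[i]:
            raise ShardUnavailable(f"shard {i} is partitioned away")
        return self.shards[i][key]
```

Marking one shard unreachable makes every key routed to it unavailable; with replication, a second copy on another shard could still serve those reads.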

As side notes:

  • it was interesting to learn that GigaSpaces can behave as either a CA or an AP system, depending on the configurable replication scheme (sync vs. async).
  • I am wondering if there are any CP solutions out there. I’d speculate that financial services would probably be required to be CP (if distributed).
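A minimal sketch (hypothetical, not GigaSpaces’ actual API) of why the sync/async replication switch moves a system between the CP and AP corners: a synchronous primary refuses writes it cannot confirm on the replica (staying consistent, giving up availability), while an asynchronous one keeps accepting them (staying available, risking a stale replica).

```python
class ReplicaDown(Exception):
    pass

class Primary:
    """Toy primary with one replica; `sync` selects the replication scheme."""
    def __init__(self, sync):
        self.sync = sync
        self.data = {}
        self.replica = {}
        self.replica_reachable = True  # simulates a network partition when False

    def write(self, key, value):
        if self.replica_reachable:
            self.data[key] = value
            self.replica[key] = value       # both copies stay in step
        elif self.sync:
            # CP behavior: rather than diverge, reject the write during the partition.
            raise ReplicaDown("sync mode: cannot confirm replication")
        else:
            # AP behavior: acknowledge anyway; the replica is now stale.
            self.data[key] = value
```

Under a partition, `Primary(sync=True)` raises on every write (unavailable but consistent), while `Primary(sync=False)` keeps serving writes that the replica has not seen (available but only eventually consistent once the partition heals).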