NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



consistency: All content tagged as consistency in NoSQL databases and polyglot persistence

Papers on transactions and consistency from SOSP’13

Thanks to Murat Demirbas, I got a link to the SOSP‘13 papers. Maybe is because of my myopic interest, but it seems like quite a few papers this year were focus on the topic of transactions and consistency:

  • Speedy Transactions in Multicore In-Memory Databases by Stephen Tu, Wenting Zheng (MIT), Eddie Kohler (Harvard), Barbara Liskov, Samuel Madden (MIT)
  • From ARIES to MARS: Transaction Support for Next-Generation, Solid-State Drives by Joel Coburn, Trevor Bunker, Meir Schwarz, Rajesh K. Gupta, Steven Swanson (University of California, San Diego)
  • Transaction Chains: Achieving Serializability with Low Latency in Geo-Distributed Storage Systems1 by Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran (NYU), Marcos K. Aguilera (Microsoft Research), Jinyang Li (NYU)
  • Consistency-Based Service Level Agreement for Cloud Storage by Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera (Microsoft Research), Hussam Abu-Libdeh (Cornell University)
  • Everything You Always Wanted to Know about Synchronization but Were Afraid to Ask by Tudor David, Rachid Guerraoui, Vasileios Trigonakis (EPFL)

This SOSP‘13 page has download links for all papers.

  1. There’s an updated version of the paper here

Original title and link: Papers on transactions and consistency from SOSP’13 (NoSQL database©myNoSQL)

Is Eventual Consistency Useful?

As a continuation to The NoSQL Partition Tolerance Myth, Jeff Darcy:

Every once in a while, somebody comes up with the “new” idea that eventually consistent systems (or AP in CAP terminology) are useless. Of course, it’s not really new at all; the SQL RDBMS neanderthals have been making this claim-without-proof ever since NoSQL databases brought other models back into the spotlight. In the usual formulation, banks must have immediate consistency and would never rely on resolving conflicts after the fact … except that they do and have for centuries.

Original title and link: Is Eventual Consistency Useful? (NoSQL database©myNoSQL)


C in CAP != C in ACID

Alex Feinberg explains it again:

Just to expand on this, the “C” in CAP corresponds (roughly) to the “A” and “I” in ACID. Atomicity across multiple nodes requires consensus. According to FLP Impossibility Result (CAP is a very elegant and intuitive re-statement of FLP), consensus is impossible in a network that may drop or deliver packets. Serializable isolation level requires that operations are totally ordered: total ordering on multiple nodes, requires solving the “atomic multicast” problem which is a private instance of the general consensus problem.

In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That’s a loss of availability that’s not acceptable to many applications.

This means, that you can’t build a low-latency multi-master system that achieves the “A” and “I” guarantees. Thus, distributed systems that wish to achieve a greater form of consistency typically (Megastore from Google being a notable exception, at the cost of 140ms latency) choose master slave systems (with “floating masters” for fault tolerance). In these systems availability is lost for a short period of time in case the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when a tablet master (RegionServer in HBase) for a specific token range fails, availability is lost until other nodes take over the “master-less” token range.

These are not binary “on/off” switches: see Yahoo’s PNUTS for a great “middle of the road” system. The paper has an intuitive example explaining the various consistency models.

Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees: eventual or not) is typically limited to (at best) a single partition of a “table group”/”entity group” (in Microsoft Azure Cloud SQL Server and Google Megastore, respectively), a single partition of a table (usual sharded MySQL setups) or just a single row in a table (BigTable) or document in a document oriented store. Atomic and isolated cross row transactions are impractical on commodity hardware (and are limited even in systems that mandate the use of infiband interconnect and high-performance SSDs).

Alex and Sergio Bossa have previously had an interesting conversation on the topic of consistency from the ACID and CAP perspectives.

Original title and link: C in CAP != C in ACID (NoSQL databases © myNoSQL)

Consistency in the ACID and CAP Perspectives

Following a tweet from Nathan Marz:

The problem with relational databases is that they conflate the notions of data and views

Sergio Bossa and Alex Feinberg had a very interesting exchange about the meaning of consistency in the context of ACID and consistency in CAP theorem perspective.

Alex: @nathanmarz That’s reason for confusion between C in ACID and C in CAP: C in ACID means consistent view of data which can be done w/ quorums

Sergio: @strlen That’s a common misconception: ACID C just means your write operations do not break data constraints. It’s not about the view.

Alex: @sbtourist It also refers to not allowing reads of intermediate states i.e., serializability. W/o a quorum, an EC system could allow such.

Alex: @sbtourist On the other hand, an async system where node B is behind node A is still C in the ACID sense without being C in the CAP sense.

Sergio: @strlen Nope, that’s the isolation level (ACID I). Again, ACID C has a precise meaning and it’s about constraints.

Alex: @sbtourist Yeah, I think you are right: serializability would be “I”, with consensus (strongest form of CAP “C”) being about “A” (atomicity)

Sergio: @strlen That said, I strongly agree with you about ACID C being different than CAP C.

Alex: @sbtourist Yes. Both “consistent” and “atomic” mean diff things in DBs than they do elsewhere in systems (e.g., way that “ln -s” is atomic)

There have been many discussions about the loose definitions of the terms in the CAP theorem. Daniel Abadi exposed an interesting perspective on the subject proposing instead PACELC:

To me, CAP should really be PACELC – if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?

Original title and link: Consistency in the ACID and CAP Perspectives (NoSQL databases © myNoSQL)

Dealing With Distributed State

Jeff Darcy[1]:

The general rule to avoid these kinds of unresolvable conflicts is: don’t pass around references to values that might be inconsistent across systems. It’s like passing a pointer from one address space to a process in another; you just shouldn’t expect it to work. Either pass around the actual values or do calculations involving those values and replicate the result.

When dealing with distributed state think about the actor model.

  1. Jeff Darcy: @Obdurodon  

Original title and link: Dealing With Distributed State (NoSQL databases © myNoSQL)