CAP: All content tagged as CAP in NoSQL databases and polyglot persistence
Monday, 10 October 2011
How to Achieve the High Availability Imperative
Dr. John Busch (founder, Chairman, and CTO of Schooner):
Tightly-coupled database architectures utilizing parallel synchronous replication exploiting commodity multi-core servers can achieve 99.999% availability with full data integrity; unlimited scaling with exceptional performance and high data consistency; and greatly simplified administration including instantaneous, automatic fail-over and on-line scaling and upgrades.
Tightly-coupled architectures and high availability used in the same phrase.
Original title and link: How to Achieve the High Availability Imperative (©myNoSQL)
via: http://gigaom.com/cloud/dr-john-busch-on-high-availability/
Friday, 8 July 2011
Comments on Urban Myths About NoSQL
Dan Weinreb comments on Michael Stonebraker’s Urban Myths about SQL (PDF) :
Dr. Michael Stonebraker recently posted a presentation entitled “Urban Myths about NoSQL”. Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new “NoSQL” data stores. Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and practitioners for decades.
In fact, Michael Stonebraker bashes everything that is not his current product—this GigaOm interview is the latest example.
For now, I’m filing this away until VoltDB is sold.
Original title and link: Comments on Urban Myths About NoSQL (©myNoSQL)
Tuesday, 5 July 2011
MongoDB Journaling and Replication Interaction
How do we know our data won’t be rolled back? The answer is that a write is truly committed in a replica set when it has been written at a majority of set members. We can confirm this with the getLastError command. For example, if our write has made it to the journal on two out of three total set members, we know the data is committed even if nodes fail in a cascading sequence, and even if a minority of nodes are permanently lost.
Journaling was added in MongoDB 1.8 for crash safetiness and recovery. But the way I read this post about the way MongoDB journaling and replication works makes me think that MongoDB data is not safe without always using getLastError. And this approach is both decreasing MongoDB speed and might lead to unavailability scenarios.
Original title and link: MongoDB Journaling and Replication Interaction (©myNoSQL)
via: http://blog.mongodb.org/post/6254464258/how-journaling-and-replication-interact
Thursday, 12 May 2011
C in CAP != C in ACID
Alex Feinberg explains it again:
Just to expand on this, the “C” in CAP corresponds (roughly) to the “A” and “I” in ACID. Atomicity across multiple nodes requires consensus. According to FLP Impossibility Result (CAP is a very elegant and intuitive re-statement of FLP), consensus is impossible in a network that may drop or deliver packets. Serializable isolation level requires that operations are totally ordered: total ordering on multiple nodes, requires solving the “atomic multicast” problem which is a private instance of the general consensus problem.
In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That’s a loss of availability that’s not acceptable to many applications.
This means, that you can’t build a low-latency multi-master system that achieves the “A” and “I” guarantees. Thus, distributed systems that wish to achieve a greater form of consistency typically (Megastore from Google being a notable exception, at the cost of 140ms latency) choose master slave systems (with “floating masters” for fault tolerance). In these systems availability is lost for a short period of time in case the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when a tablet master (RegionServer in HBase) for a specific token range fails, availability is lost until other nodes take over the “master-less” token range.
These are not binary “on/off” switches: see Yahoo’s PNUTS for a great “middle of the road” system. The paper has an intuitive example explaining the various consistency models.
Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees: eventual or not) is typically limited to (at best) a single partition of a “table group”/”entity group” (in Microsoft Azure Cloud SQL Server and Google Megastore, respectively), a single partition of a table (usual sharded MySQL setups) or just a single row in a table (BigTable) or document in a document oriented store. Atomic and isolated cross row transactions are impractical on commodity hardware (and are limited even in systems that mandate the use of infiband interconnect and high-performance SSDs).
Alex and Sergio Bossa have previously had an interesting conversation on the topic of consistency from the ACID and CAP perspectives.
Original title and link: C in CAP != C in ACID (NoSQL databases © myNoSQL)
Tuesday, 5 April 2011
Consistency in the ACID and CAP Perspectives
Following a tweet from Nathan Marz:
The problem with relational databases is that they conflate the notions of data and views
Sergio Bossa and Alex Feinberg had a very interesting exchange about the meaning of consistency in the context of ACID and consistency in CAP theorem perspective.
Alex: @nathanmarz That’s reason for confusion between C in ACID and C in CAP: C in ACID means consistent view of data which can be done w/ quorums
Sergio: @strlen That’s a common misconception: ACID C just means your write operations do not break data constraints. It’s not about the view.
Alex: @sbtourist It also refers to not allowing reads of intermediate states i.e., serializability. W/o a quorum, an EC system could allow such.
Alex: @sbtourist On the other hand, an async system where node B is behind node A is still C in the ACID sense without being C in the CAP sense.
Sergio: @strlen Nope, that’s the isolation level (ACID I). Again, ACID C has a precise meaning and it’s about constraints.
Alex: @sbtourist Yeah, I think you are right: serializability would be “I”, with consensus (strongest form of CAP “C”) being about “A” (atomicity)
Sergio: @strlen That said, I strongly agree with you about ACID C being different than CAP C.
Alex: @sbtourist Yes. Both “consistent” and “atomic” mean diff things in DBs than they do elsewhere in systems (e.g., way that “ln -s” is atomic)
There have been many discussions about the loose definitions of the terms in the CAP theorem. Daniel Abadi exposed an interesting perspective on the subject proposing instead PACELC:
To me, CAP should really be PACELC – if there is a partition (P) how does the system tradeoff between availability and consistency (A and C); else (E) when the system is running as normal in the absence of partitions, how does the system tradeoff between latency (L) and consistency (C)?
Original title and link: Consistency in the ACID and CAP Perspectives (NoSQL databases © myNoSQL)
Tuesday, 22 March 2011
How Scalable is VoltDB?
Percona guys[1] have run, analyzed, and concluded about VoltDB scalability:
VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores
Considering the definition: “A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.”, the conclusion should be slightly updated:
VoltDB can scale up to 120 partitions on 39 servers with 300 CPU cores and 1.6 million TPS.
Bottom line:
- if you can fit your data into 40 servers’ memory
- you need ACID and SQL
- you are OK precompiled Java based stored procedures
- you don’t need multi data center deployments
now you can estimate how far you can go with VoltDB.
-
The company specialized on MySQL services and behind the MySQL Performance Blog ↩
Original title and link: How Scalable is VoltDB? (NoSQL databases © myNoSQL)
via: http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/
Sunday, 20 March 2011
Dealing With Distributed State
Jeff Darcy[1]:
The general rule to avoid these kinds of unresolvable conflicts is: don’t pass around references to values that might be inconsistent across systems. It’s like passing a pointer from one address space to a process in another; you just shouldn’t expect it to work. Either pass around the actual values or do calculations involving those values and replicate the result.
When dealing with distributed state think about the actor model.
-
Jeff Darcy: @Obdurodon ↩
Original title and link: Dealing With Distributed State (NoSQL databases © myNoSQL)
Wednesday, 2 March 2011
9 Things to Acknowledge about NoSQL Databases
Excellent list:
- Understand how ACID compares with BASE (Basically Available, Soft-state, Eventually Consistent)
- Understand persistence vs non-persistence, i.e., some NoSQL technologies are entirely in-memory data stores
- Recognize there are entirely different data models from traditional normalized tabular formats: Columnar (Cassandra) vs key/value (Memcached) vs document-oriented (CouchDB) vs graph oriented (Neo4j)
- Be ready to deal with no standard interface like JDBC/ODBC or standarized query language like SQL; every NoSQL tool has a different interface
- Architects: rewire your brain to the fact that web-scale/large-scale NoSQL systems are distributed across dozens to hundreds of servers and networks as opposed to a shared database system
- Get used to the possibly uncomfortable realization that you won’t know where data lives (most of the time)
- Get used to the fact that data may not always be consistent; ‘eventually consistent’ is one of the key elements of the BASE model
- Get used to the fact that data may not always be available
- Understand that some solutions are partition-tolerant and some are not
Print it out and distribute it among your colleagues.
Original title and link: 9 Things to Acknowledge about NoSQL Databases (NoSQL databases © myNoSQL)
via: http://www.evidentsoftware.com/nosql-basics-for-the-rdbms-savvy/
Monday, 21 February 2011
Amazon SimpleDB, Google Megastore & CAP
Nati Shalom (Gigaspaces) pulls out a couple of references from James Hamilton’s posts[1] on Amazon SimpleDB and Google Megastore consistency model concluding:
It is interesting to see that the reality is that even Google and Amazon - which I would consider the extreme cases for big data - realized the limitation behind eventual consistency and came up with models that can deal with scaling without forcing a compromise on consistency as I also noted in one of my recent NoCAP series
But he lefts out small details like these:
Update rates within a entity group are seriously limited by:
- When there is log contention, one wins and the rest fail and must be retried
- Paxos only accepts a very limited update rate (order 10^2 updates per second)
and
Cross entity group updates are supported by:
- two-phase commit with the fragility that it brings
- queueing ans asynchronously applying the changes
Original title and link: Amazon SimpleDB, Google Megastore & CAP (NoSQL databases © myNoSQL)
Tuesday, 18 January 2011
Clustrix Sierra Clustered Database Engine
In the light of publicly announcing customers, I wanted to read a bit about Clustrix Clustered Database Systems.
The company homepage is describing the product:
- scalable database appliaces for Internet-scale work loads
- Linearly Scalable: fully distributed, parallel architecture provides unlimited scale
- SQL functionality: full SQL relational and data consistency (ACID) functionality
- Fault-Tolerant: highly available providing fail-over, recovery, and self-healing
- MySQL Compatible: seamless deployment without application changes.
All these sounded pretty (too) good. And I’ve seen a very similar presentation for Xeround: Elastic, Always-on Storage Engine for MySQL.
So, I’ve continued my reading with the Sierra Clustered Database Engine whitepaper (PDF).
Here are my notes:
- Sierra is composed of:
- database personality module: translates queries into internal representation
- distributed query planner and compiler
- distributed shared-nothing execution engine
- persistent storage
- NVRAM transactional storage for journal changes
- inter-node Infiniband
- queries are decomposed into query fragments which are the unit of work. Query fragments are sent for execution to nodes containing the data.
- query fragments are atomic operations that can:
- insert, read, update data
- execute functions and modify control flow
- perform synchronization
- send data to other nodes
- format output
- query fragments can be executed in parallel
- query fragments can be cached with parameterized constants at the node level
- determining where to sent the query fragments for execution is done using either range-based rules or hash function
- tables are partitioned into slices, each slice having redundancy replicas
- size of slices can be automatically determined or configured
- adding new nodes to the cluster results in rebalancing slices
- slices contained on a failed device are reconstructed using their replicas
- one of the slices is considered primary
- writes go to all replicas and are transactional
- all reads fo the the slice primary
The paper also exemplifies the execution of 4 different queries:
SELECT * FROM T1 WHERE uid=10
SELECT uid, name FROM T1 JOIN T2 on T1.gid = T2.gid WHERE uid=10
SELECT * FROM T1 WHERE uid<100 and gid>10 ORDER BY uid LIMIT 5
INSERT INTO T1 VALUES (10,20)
Questions:
- who is coordinating transactions that may be executed on different nodes?
- who is maintains the topology of the slices? In case of a node failure, you’d need to determine:
- what slices where on the failing node
- where are the replicas for each of these slices
- where new replicas will be created
- when will new replicas become available for writes
- who elects the slice primary?

Original title and link: Clustrix Sierra Clustered Database Engine (NoSQL databases © myNoSQL)
Monday, 17 January 2011
Data Consistency and Synchronization in Scalable NoSQL Solutions
On Xeround blog:
What would be sacrificed if synchronization and transaction mechanisms were added on top of NoSQL foundations? The answer is performance. Distributed and consistent data services will always be slower than pure raw distributed data service.
… and availability.
Original title and link: Data Consistency and Synchronization in Scalable NoSQL Solutions (NoSQL databases © myNoSQL)
via: http://blog.xeround.com/2010/12/bending-the-rules-for-sql
Most Popular Articles
- Translate SQL to MongoDB MapReduce
- Tutorial: Getting Started With Cassandra
- CouchDB vs MongoDB: An attempt for a More Informed Comparison
- Cassandra @ Twitter: An Interview with Ryan King
- A Couple of Nice GUI Tools for MongoDB
- NoSQL benchmarks and performance evaluations
- Ehcache: Distributed Cache or NoSQL Store?
- Document Databases Compared: CouchDB, MongoDB, RavenDB
- Quick Review of Existing Graph Databases
- NoSQL Data Modeling