A reminder to those thinking that networks never fail and automation can solve everything. Christina Ilvento, on behalf of the App Engine team:
The root cause of the outage was a combination of two factors during a scheduled network maintenance in one of our datacenters. As part of the scheduled maintenance, network capacity to and from this datacenter was reduced. This alone was expected, and was not a problem. However, this maintenance exposed a previously existing misconfiguration in the system that manages network bandwidth capacity.
Ordinarily, the bandwidth management system helps isolate and prioritize traffic. When capacity is reduced because of maintenance, network failure, or due to an excess of normal traffic, the bandwidth management system keeps things running smoothly by throttling back the rate of low priority traffic. However, as mentioned, the bandwidth management system had a latent misconfiguration which did not show up until capacity was reduced due to the scheduled maintenance. This misconfiguration under-reported the available network capacity to and from the datacenter, causing the network modeler to believe that there was less overall capacity than actually existed.
The configuration error in the bandwidth management system, when combined with an expected reduction in capacity due to the scheduled maintenance, led the system to conclude that there was insufficient bandwidth available for current traffic demand to and from this datacenter. (In reality, there was more than sufficient excess capacity, as otherwise the maintenance would not have been allowed to go forward.) Because of this combination of misconfiguration and scheduled maintenance, a number of services were automatically blocked from sending network traffic. […]
The outage occurred because two independent systems failed at the same time, which resulted in mistakes in our usual escalation procedures which significantly impacted the duration of the outage.
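To make the failure mode concrete, here is a rough sketch (all names and numbers are hypothetical, not Google's actual system): when the modeler's view of capacity is under-reported, a maintenance-time reduction that is perfectly safe in reality pushes the modeled headroom negative, and the automation starts blocking traffic the network could easily carry.

```python
# Hypothetical sketch of the failure mode described in the postmortem.
# Names and numbers are illustrative, not Google's actual system.

ACTUAL_CAPACITY_GBPS = 100.0   # what the links can really carry
MISCONFIGURED_REPORT = 40.0    # what the modeler *believes* is available
MAINTENANCE_REDUCTION = 0.5    # scheduled maintenance halves capacity
CURRENT_DEMAND_GBPS = 35.0     # normal traffic level

def modeled_headroom(reported_capacity, reduction, demand):
    """Capacity the bandwidth manager thinks is left after maintenance."""
    return reported_capacity * (1 - reduction) - demand

# Reality: 100 * 0.5 - 35 = 15 Gbps of real headroom -> the maintenance is safe.
print(modeled_headroom(ACTUAL_CAPACITY_GBPS, MAINTENANCE_REDUCTION, CURRENT_DEMAND_GBPS))

# Model: 40 * 0.5 - 35 = -15 Gbps -> the system concludes it must block
# low-priority services even though plenty of bandwidth actually exists.
print(modeled_headroom(MISCONFIGURED_REPORT, MAINTENANCE_REDUCTION, CURRENT_DEMAND_GBPS))
```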
Original title and link: Networks Never Fail ( ©myNoSQL)
- Dynamo (key-value)
- Voldemort (key-value)
- Tokyo Cabinet (key-value)
- KAI (key-value)
- Cassandra (column-oriented/tabular)
- CouchDB (document-oriented)
- SimpleDB (document-oriented)
- Riak (document-oriented)
A couple of clarifications to the list above:
- Dynamo has never been available to the public. On the other hand, DynamoDB is not exactly Dynamo.
- Tokyo Cabinet is not a distributed database, so it shouldn’t be in this list.
- CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand, there’s BigCouch.
Original title and link: Which NoSQL Databases Are Robust to Net-Splits? ( ©myNoSQL)
- The cost of strong consistency to Amazon is low, if not zero. To you? 2x.
- If you were to run your own distributed database, you wouldn’t incur this cost (although you’d have to factor in hardware and ops costs).
- Offering a “consistent write” option instead would save you money and latency.
- If Amazon provided SLAs so users knew how well eventual consistency worked, users could make more informed decisions about their app requirements and DynamoDB. However, Amazon probably wouldn’t be able to charge so much for strong consistency.
It is not the first time I’ve heard this discussion, but it is the first time I’ve found it in a detailed form. I have no reasons to defend Amazon’s DynamoDB pricing strategy, but:
- Comparing the cost of operating a self-hosted distributed database with that of a managed, highly available one seems to me out of place and cannot lead to a real conclusion.
- While consistent writes could be a solution for always having consistent reads, it would require Amazon to reposition DynamoDB from a highly available database to something else. Considering Amazon has always explained its rationale for building highly available systems, I find it difficult to believe this would happen.
Getting back to consistent vs. eventually consistent reads, what one needs to account for is a combination of:
- costs for cross data center access
- costs for maintaining the request capacity SLA
- costs for maintaining the request latency promise
- penalty costs for not meeting the service commitment
I agree, though, that it’s almost impossible to estimate each of these and decide whether or not they justify the increased price of consistent reads.
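To put numbers on the 2x, here is how the choice surfaces in the API and in the capacity math. The ConsistentRead flag is part of DynamoDB’s GetItem call, and one read capacity unit covers one strongly consistent read, or two eventually consistent reads, of up to 4 KB per second; the table name and key schema below are made up for the example.

```python
import math
import boto3

# Hypothetical table name and key schema; the calls need real AWS credentials.
table = boto3.resource("dynamodb", region_name="us-east-1").Table("my-table")

# Eventually consistent read (the default): may briefly return stale data
# after a write, but costs half a read capacity unit per 4 KB.
stale_ok = table.get_item(Key={"id": "user-42"})

# Strongly consistent read: reflects all prior successful writes,
# at the price of a full read capacity unit per 4 KB.
fresh = table.get_item(Key={"id": "user-42"}, ConsistentRead=True)

def read_capacity_units(item_size_bytes, consistent):
    """Provisioned-throughput cost of one read under DynamoDB's capacity model."""
    blocks = math.ceil(item_size_bytes / 4096)      # rounded up to 4 KB blocks
    return blocks if consistent else blocks / 2     # eventual reads cost half

print(read_capacity_units(3000, consistent=True))   # 1   -> strongly consistent
print(read_capacity_units(3000, consistent=False))  # 0.5 -> the 2x gap
```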
Original title and link: Why DynamoDB Consistent Reads Cost Twice or What’s Wrong With Amazon’s DynamoDB Pricing? ( ©myNoSQL)
Eventual and Strong Consistency, Sloppy and Strict Quorums, and Other Lessons and Thoughts on Distributed Systems
Anything I’d write would just steal from your time to read and think about the email Joseph Blomstedt posted to the Riak list.
Original title and link: Eventual and Strong Consistency, Sloppy and Strict Quorums, and Other Lessons and Thoughts on Distributed Systems ( ©myNoSQL)
Just to expand on this, the “C” in CAP corresponds (roughly) to the “A” and “I” in ACID. Atomicity across multiple nodes requires consensus. According to the FLP impossibility result (CAP is a very elegant and intuitive re-statement of FLP), consensus is impossible in a network that may drop or arbitrarily delay packets. The serializable isolation level requires that operations are totally ordered, and total ordering across multiple nodes requires solving the “atomic multicast” problem, which is a special case of the general consensus problem.
In practice, you can achieve consensus across multiple nodes with a reasonable amount of fault tolerance if you are willing to accept high (as in, hundreds of milliseconds) latency bounds. That’s a loss of availability that’s not acceptable to many applications.
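A toy sketch of that latency cost (no real consensus protocol here, just the majority-acknowledgement rule): a write that must be acknowledged by a majority of replicas is only as fast as its median replica, so placing replicas in distant datacenters immediately puts a WAN round trip on every consistent write.

```python
# Toy sketch: a "commit" that waits for acks from a majority of replicas.
# Replica latencies are illustrative; no real consensus protocol is modeled.

def majority_commit_latency(replica_latencies_ms):
    """Latency until a majority of replicas have acknowledged the write."""
    acks_needed = len(replica_latencies_ms) // 2 + 1
    return sorted(replica_latencies_ms)[acks_needed - 1]

# Replicas in one datacenter: commit completes in a few milliseconds.
print(majority_commit_latency([2, 3, 4]))        # 3 ms

# Replicas spread across distant datacenters: the majority includes a WAN
# round trip, so every consistent write pays hundreds of milliseconds.
print(majority_commit_latency([2, 140, 160]))    # 140 ms
```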
This means that you can’t build a low-latency multi-master system that achieves the “A” and “I” guarantees. Thus, distributed systems that wish to achieve a stronger form of consistency typically choose master-slave designs (with “floating masters” for fault tolerance), Megastore from Google being a notable exception, at the cost of 140 ms latency. In these systems, availability is lost for a short period of time if the master fails. BigTable (or HBase) is an example of this: (grand simplification follows) when the tablet server (RegionServer in HBase) responsible for a specific key range fails, that range is unavailable until other nodes take over the “master-less” range.
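The “floating master” behavior can be sketched in a few lines (a deliberate oversimplification, not HBase’s actual assignment logic): while a key range has no master, operations on it fail, and that slice of availability comes back only once another node is assigned.

```python
# Simplified "floating master" sketch (not HBase's actual assignment logic):
# each key range has exactly one master; while it has none, that range is down.

range_masters = {"a-m": "node-1", "n-z": "node-2"}   # hypothetical assignment

def write(key_range, value):
    master = range_masters.get(key_range)
    if master is None:
        raise RuntimeError(f"range {key_range} unavailable: no master assigned")
    return f"wrote {value!r} via {master}"

range_masters["a-m"] = None          # node-1 fails: the range loses its master
# write("a-m", "x")                  # -> RuntimeError until failover completes
range_masters["a-m"] = "node-3"      # another node takes over the range
print(write("a-m", "x"))             # availability restored for that range
```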
These are not binary “on/off” switches: see Yahoo’s PNUTS for a great “middle of the road” system. The paper has an intuitive example explaining the various consistency models.
Note: in a partitioned system, the scope of consistency guarantees (that is, any consistency guarantees, eventual or not) is typically limited to, at best, a single partition of a “table group”/“entity group” (in Microsoft Azure Cloud SQL Server and Google Megastore, respectively), a single partition of a table (the usual sharded MySQL setup), or just a single row in a table (BigTable) or a single document in a document-oriented store. Atomic and isolated cross-row transactions are impractical on commodity hardware (and are limited even in systems that mandate InfiniBand interconnects and high-performance SSDs).
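As a concrete illustration of that scoping (a hypothetical single-node store, loosely inspired by single-row conditional updates such as HBase’s checkAndPut): atomicity within one row needs nothing more than the node that owns the row, while the cross-row variant immediately requires distributed coordination.

```python
import threading

# Hypothetical single-node "row store": atomicity is easy within one row
# because a single lock (or a single owning server) covers it.
rows = {"user:1": {"balance": 100}, "user:2": {"balance": 50}}
row_locks = {key: threading.Lock() for key in rows}

def check_and_put(row_key, column, expected, new_value):
    """Atomic compare-and-set on one row, in the spirit of HBase's checkAndPut."""
    with row_locks[row_key]:
        if rows[row_key].get(column) != expected:
            return False
        rows[row_key][column] = new_value
        return True

print(check_and_put("user:1", "balance", 100, 80))   # True: single-row atomicity

# A cross-row transfer (debit user:1, credit user:2) has no such shortcut:
# once the rows live on different nodes it needs distributed coordination
# (e.g. two-phase commit), which is what the note above calls impractical.
```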