


fault tolerance: All content tagged as fault tolerance in NoSQL databases and polyglot persistence

MongoDB Is Still Broken by Design 5-0

My score after the first period was 4-1. But Emin Gün Sirer contested the 1 in a follow-up post to 10gen’s reply:

Until recently, MongoDB did not talk about requestStart() and requestDone() in any context except when talking about how to ensure a very weak consistency requirement. Namely, if you don’t use this pair of operations, then a write to the database followed by a read from the database, by the same client, can return old values. So, I write 42 for key k with a WriteConcern.SAFE, read key k, and get some other number, because the Mongo driver can, by default, very well send the first request to one node over one connection, and the second one to another, over another connection. So requestStart() and requestDone() were billed as a mechanism to avoid that scenario; I saw no mention that they were required for correctness in multithreaded settings. I bet there is plenty of multithreaded code that does not follow that pattern. Such code is broken; if you’re a Mongo user, it’d be a good idea to check if you ever use getLastError without a bracketing requestStart() and Done().
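The per-connection behavior Sirer describes can be sketched with a toy pure-Python model (not the real driver’s pooling logic; all class and method names below are made up). Each connection remembers only its own last operation, so without a requestStart()/requestDone()-style pin, a client’s getLastError can land on a connection that never saw its write:

```python
class Connection:
    """Remembers only the outcome of the last operation it carried,
    which is how getLastError behaves per connection."""
    def __init__(self):
        self.last_error = None

    def write(self, ok, err=None):
        self.last_error = None if ok else err

class Pool:
    """Round-robin connection pool. Without requestStart()-style pinning,
    consecutive calls from one client may use different connections."""
    def __init__(self, size):
        self.conns = [Connection() for _ in range(size)]
        self.counter = 0
        self.pinned = {}   # client -> connection reserved by request_start

    def checkout(self, client):
        if client in self.pinned:
            return self.pinned[client]
        conn = self.conns[self.counter % len(self.conns)]
        self.counter += 1
        return conn

    def request_start(self, client):
        self.pinned[client] = self.checkout(client)

    def request_done(self, client):
        self.pinned.pop(client, None)

# Without pinning: A's failed write lands on conn 0, B's write on conn 1,
# and A's getLastError on conn 2 -- which reports no error at all.
pool = Pool(3)
pool.checkout("A").write(ok=False, err="duplicate key")
pool.checkout("B").write(ok=True)
unpinned = pool.checkout("A").last_error      # None: the failure is invisible

# With requestStart()/requestDone(): all of A's calls share one connection.
pool = Pool(3)
pool.request_start("A")
pool.checkout("A").write(ok=False, err="duplicate key")
pool.checkout("B").write(ok=True)
pinned = pool.checkout("A").last_error        # "duplicate key"
pool.request_done("A")

print(unpinned, pinned)
```

The first client silently loses its error; only the bracketed client sees it, which is exactly why unbracketed multithreaded code is broken.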


Original title and link: MongoDB Is Still Broken by Design 5-0 (NoSQL database © myNoSQL)


10gen: MongoDB’s Fault Tolerance Is Not Broken…

Sitting comfortably? Check. Popcorn? Check. Let’s press play.

In an interview for InfoQ, 10gen’s Jared Rosoff replies to Emin Gün Sirer’s “Broken by Design: MongoDB Fault Tolerance”.

  1. MongoDB lies when it says that a write has succeeded

    JR: “[…] Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.”

    The key word here is today. As clearly explained by Sirer, the behavior he described was present in all versions of MongoDB and all the drivers until the 2.2 release and the corresponding drivers update in November 2012.

    Basically, the “fire-and-forget” behavior (nb: even this description is not accurate; a better one would be “load-and-forget”) has been the default for almost 3 years. With the 2.2 release and the corresponding drivers update, it was changed to “Receipt acknowledged”. But the new default acknowledges only that the data was received by the server, not that it was written anywhere. If you want your data to exist on multiple machines or be on a disk, you need to use different settings.

  2. Using getLastError slows down write operations.

    JR: “GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns. Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.”

    The problem here is not that the operation slows down because it has to wait for the acknowledgement. The real problem is that getLastError requires an extra network roundtrip. Not to mention that your old code was probably polluted by all these extra calls.

  3. getLastError doesn’t work pipelined

    JR: “[…] For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. […]”

    Read the above.

  4. getLastError doesn’t work multi-threaded

    JR: ” Threads do not see getLastError responses for other thread’s operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. […]”

    If I’m reading this correctly, Sirer’s hypothesis was that connections can be shared across threads. I have to agree that many drivers do not provide thread-safe connections, so linking getLastError behavior to the current connection seems OK.

  5. Write Concerns are broken

    JR: “As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.”

    Let’s look at what Sirer wrote:

    […] one could use WriteConcern.SAFE, FSYNC_SAFE or REPLICAS_SAFE for the insert operation [2]. There are 13 different concern levels, 8 of which seem to be distinct and presumably the remaining 5 are just kind of there in case you mistype one of the other ones. WriteConcern is at least well-named: it corresponds to “how concerned would you be if we lost your data?” and the potential answers are “not at all!”, “be my guest”, and “well, look like you made an effort, but it’s ok if you drop it.” Specifically, that’s three different kinds of SAFE, but none of them give you what you want: (1) SAFE means acknowledged by one replica but not written to disk, so a node failure can obliterate that data, (2) FSYNC_SAFE means written to a single disk, so a single disk crash can obliterate that data, and (3) REPLICAS_SAFE means it has been written to two replicas, but there is no guarantee that you will be able to retrieve it later.

    If you want a different explanation: even if there are 13 different WriteConcern types available, there is none that offers the option of having the data written to disk on more replicas.
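Sirer’s durability matrix can be restated as a tiny table. The (replicas, fsync) pairs below are my reading of the old Java driver constants quoted above, illustrative rather than authoritative:

```python
# My reading of the old Java driver constants, as
# (replicas that must acknowledge, forced to disk) pairs.
CONCERNS = {
    "NONE":          (0, False),   # fire-and-forget
    "SAFE":          (1, False),   # acked by the primary, memory only
    "FSYNC_SAFE":    (1, True),    # on the primary's disk, one machine only
    "REPLICAS_SAFE": (2, False),   # acked by two replicas, memory only
}

# Sirer's point: no level demands both multiple replicas *and* disk.
durable_everywhere = [name for name, (replicas, fsync) in CONCERNS.items()
                      if replicas >= 2 and fsync]
print(durable_everywhere)   # []
```

However you slice the available levels, the “on disk on more than one machine” cell of the table stays empty.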

The period is over and in my mind the score is clearly 4-1. But I know this is not a real game, and some will conclude that I’m not getting it. I’m OK living with that, though.

Original title and link: 10gen: MongoDB’s Fault Tolerance Is Not Broken… (NoSQL database © myNoSQL)

When Data Is Worthless - Give MongoDB What Is MongoDB's

Emin Gün Sirer, concluding a follow-up post to “MongoDB fault tolerance is broken by design”:

So let us give onto Mongo what is clearly its: it’s mature software with a large install base. If it loses data, it’ll likely do so because of a deliberate design decision, rather than a bug. It’s easy to find Mongo hosting and it’s relatively easy to find people who are experienced with it. So if all your data is really of equal and low value, and you can afford to lose some of it, and your app’s needs are unlikely to grow, then MongoDB can be a fine pick for your application.

Looks like we mostly agree.

Original title and link: When Data Is Worthless - Give MongoDB What Is MongoDB’s (NoSQL database © myNoSQL)


Hardware Components Relative Failure Rates Chart

A bit old and most probably not statistically significant, but I’d say it looks correct in general.


Credit Alex Gorbachev, Pythian

Do you agree? Any other sources containing statistics about hardware failure rates?

Original title and link: Hardware Components Relative Failure Rates Chart (NoSQL database © myNoSQL)

Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems?

Jared Wray enumerates the following 4 rules for High Availability:

  • No Single Point of failure
  • Self-healing is Required
  • It will go down so plan on it
  • It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.

I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:

High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

This got me thinking: is self-healing actually a requirement? Could I translate this into asking: is it possible to control the MTTF? Control in the sense of planning operations that would push the MTTF into a range that is not considered to break the SLA.
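One way to make the question concrete is the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR): self-healing is essentially a way of driving MTTR toward zero. A back-of-the-envelope sketch (the failure and repair numbers below are made up for illustration):

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Same failure rate (once per 720 h, i.e. roughly monthly); only the
# repair time differs: a human paged at 3am vs. automated failover.
manual       = availability(720, 4)       # ~0.9945 -> "two nines"
self_healing = availability(720, 2 / 60)  # ~0.99995 -> "four nines"
print(f"{manual:.5f} {self_healing:.5f}")
```

With the failure rate held constant, shrinking repair time from hours to minutes is what moves the SLA needle, which is why self-healing keeps showing up in these rule lists even when the classic literature only says “repair”.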

Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:

The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.

Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?

Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? (NoSQL database © myNoSQL)

Hadoop Chaos Monkey: The Fault Injection Framework

Do you remember the 5 lessons Netflix learned while using Amazon Web Services—judging by how much Netflix shared about their experience in the cloud, including Amazon SimpleDB, I’d say these 5 are only the tip of the iceberg—where they talked about the Chaos Monkey?

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.

Hadoop provides a similar framework: the Fault Injection Framework:

The idea of fault injection is fairly simple: it is an infusion of errors and exceptions into an application’s logic to achieve a higher coverage and fault tolerance of the system. Different implementations of this idea are available today. Hadoop’s FI framework is built on top of Aspect Oriented Paradigm (AOP) implemented by AspectJ toolkit.

As a sidenote, this is one of the neatest usages of AspectJ I’ve read about.
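Hadoop’s framework weaves AspectJ advice around existing I/O methods; the same infusion-of-errors idea can be sketched in Python with a decorator. Everything below is a made-up analogue, not Hadoop’s API:

```python
import random

def inject_fault(exc, probability, rng=random.Random(42)):
    """Wrap a function so it sometimes raises `exc` instead of running --
    a crude, deterministically seeded analogue of advice woven around a
    join point."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            if rng.random() < probability:
                raise exc
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@inject_fault(IOError("injected disk failure"), probability=0.3)
def read_block(block_id):
    return f"data-{block_id}"

def read_with_retry(block_id, attempts=5):
    """Client code must tolerate the injected failures, e.g. by retrying."""
    for _ in range(attempts):
        try:
            return read_block(block_id)
        except IOError:
            continue
    raise IOError("all retries failed")

result = read_with_retry(7)
print(result)
```

The point, as in Hadoop’s version, is higher coverage: the fault path of the retry logic gets exercised on every run instead of only during real outages.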

Update: Abhijit Belapurkar says that Fault injection using AOP was part of Recovery Oriented Computing research at Stanford/UCB many years ago: JAGR: An Autonomous Self-Recovering Application Server.

Original title and link: Hadoop Chaos Monkey: The Fault Injection Framework (NoSQL database © myNoSQL)

Distributed Systems: The Phi Accrual Failure Detector Paper


Detecting failures is a fundamental issue for fault-tolerance in distributed systems. […]

We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block to implementing failure detectors in distributed systems. Instead of providing information of a boolean nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale.

The architectural difference between traditional and accrual failure detectors:

Traditional and Accrual Failure Detectors

Credit: Naohiro Hayashibara, Xavier Défago, Rami Yared, and Takuya Katayama
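To make the “suspicion level on a continuous scale” concrete, here is a minimal sketch of the idea: fit a normal distribution to recent heartbeat inter-arrival times and report φ = −log₁₀ P(silence lasts this long if the node were alive). The paper’s actual estimator is more refined; this is only the shape of the mechanism:

```python
import math
import random

class PhiAccrualDetector:
    """Toy accrual failure detector: normal fit over recent heartbeat
    inter-arrival times; phi grows continuously with the silence."""

    def __init__(self, window=100):
        self.intervals = []
        self.last_arrival = None
        self.window = window

    def heartbeat(self, now):
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
            self.intervals = self.intervals[-self.window:]
        self.last_arrival = now

    def phi(self, now):
        mean = sum(self.intervals) / len(self.intervals)
        var = sum((x - mean) ** 2 for x in self.intervals) / len(self.intervals)
        std = max(math.sqrt(var), 1e-9)
        elapsed = now - self.last_arrival
        # Survival function of the normal fit: P(interval > elapsed)
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))

# Heartbeats roughly every second (with jitter), then silence.
detector, rng, now = PhiAccrualDetector(), random.Random(1), 0.0
for _ in range(100):
    now += rng.gauss(1.0, 0.1)
    detector.heartbeat(now)

low = detector.phi(now + 0.5)    # well within normal jitter: near 0
high = detector.phi(now + 2.0)   # twice the usual gap: high suspicion
print(f"{low:.2f} {high:.2f}")
```

Instead of a binary trust/suspect verdict, each consumer picks its own φ threshold, trading detection speed against false positives—which is the flexibility the abstract is pointing at.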

Original title and link: Distributed Systems: The Phi Accrual Failure Detector Paper (NoSQL databases © myNoSQL)