fault tolerance: All content tagged as fault tolerance in NoSQL databases and polyglot persistence
Sitting comfortably? Check. Popcorn? Check. Let’s press play.
MongoDB lies when it says that a write has succeeded
JR: “[…] Today the default behavior of official MongoDB drivers is Receipt Acknowledged, which means that you wait until the server has processed your write before returning to the client.”
The key word here is today. As Sirer clearly explained, the behavior he described was the default in all versions of MongoDB and all the drivers until the 2.2 release and the corresponding driver updates in Nov. 2012.
Basically, the “fire-and-forget” behavior (nb: even this description is not accurate; a better one would be “load-and-forget”) was the default for almost 3 years. With the 2.2 release and the corresponding driver updates it was changed to “receipt acknowledged”. But the new default only acknowledges that the data was received by the server, not that it was written anywhere. If you want your data to exist on multiple machines or to be on a disk, you need to use different settings.
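To make these acknowledgment levels concrete, here is a toy model of where each setting leaves your data. This is not the actual driver API; the class and parameter names are made up for illustration, loosely echoing the `w` and `j` options.

```python
# Toy model (NOT the real MongoDB driver API) of write acknowledgment levels.
# All names here are illustrative.

class ToyPrimary:
    def __init__(self, replicas=0):
        self.memory = []                      # received, in RAM only
        self.journal = []                     # fsynced to the primary's disk
        self.replicas = [[] for _ in range(replicas)]

    def write(self, doc, w=1, j=False):
        """Return once the requested acknowledgment level is reached."""
        if w == 0:                            # "fire-and-forget": ack nothing
            return "unacknowledged"
        self.memory.append(doc)               # w=1: received in the primary's RAM
        if j:                                 # j=True: journaled to disk
            self.journal.append(doc)
        for r in self.replicas[: max(w - 1, 0)]:
            r.append(doc)                     # w>1: sent to w-1 secondaries
        return "acknowledged"

    def crash(self):
        """A node failure wipes RAM; only the journal survives locally."""
        self.memory = list(self.journal)

primary = ToyPrimary(replicas=2)
primary.write({"x": 1})                       # default: received, nowhere durable
primary.crash()
print(primary.memory)                         # [] -- the acknowledged write is gone
```

The point of the sketch: with the default settings the acknowledgment arrives before the data is anywhere durable, so a single crash erases a “successful” write.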
Using getLastError slows down write operations.
JR: “GetLastError is underlying command in the MongoDB protocol that is used to implement write concerns. Intuitively, waiting for an operation to complete on the server is slower than not waiting for it.”
The problem here is not just that the operation slows down because it has to wait for the acknowledgement. The real problem is that getLastError requires an extra network roundtrip. Not to mention that your old code was probably polluted by all these extra calls.
getLastError doesn’t work pipelined
JR: “[…] For many bulk loads, performing multiple inserts with periodic checks of getLastError is the right choice. […]”
Read the above.
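The bulk-load pattern JR describes can be sketched with a toy connection that just counts roundtrips (illustrative names only, not driver code):

```python
# Sketch of the bulk-load pattern: check getLastError once per batch
# instead of once per insert. Toy connection; names are made up.

class ToyConnection:
    def __init__(self):
        self.roundtrips = 0

    def insert(self, doc):
        self.roundtrips += 1                  # one message to the server

    def get_last_error(self):
        self.roundtrips += 1                  # a separate, extra roundtrip
        return {"ok": 1, "err": None}

conn = ToyConnection()
BATCH = 100
for i, doc in enumerate(range(1000), 1):
    conn.insert(doc)
    if i % BATCH == 0:                        # periodic check, once per batch
        assert conn.get_last_error()["err"] is None

print(conn.roundtrips)                        # 1010, vs 2000 with per-write checks
```

Note the trade-off this sketch glosses over: since getLastError reports on the most recent operation only, a per-batch check can miss a failure earlier in the batch; the pattern trades safety for speed.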
getLastError doesn’t work multi-threaded
JR: “Threads do not see getLastError responses for other threads’ operations. MongoDB’s getLastError command applies to the previous operation on a connection to the database, and not simply whatever random operation was performed last. […]”
If I’m reading this correctly, Sirer’s hypothesis was that connections can be shared across threads. I have to agree that many drivers do not provide thread-safe connections, so tying getLastError behavior to the current connection seems OK.
Write Concerns are broken
JR: “As described in the above sections, WriteConcerns provide a flexible toolset for controlling the durability of write operations applied to the database. You can choose the level of durability you want for individual operations, balanced with the performance of those operations. With the power to specify exactly what you want comes the responsibility to understand exactly what it is you want out of the database.”
Let’s look at what Sirer wrote:
[…] one could use REPLICAS_SAFE for the insert operation. There are 13 different concern levels, 8 of which seem to be distinct and presumably the remaining 5 are just kind of there in case you mistype one of the other ones. WriteConcern is at least well-named: it corresponds to “how concerned would you be if we lost your data?” and the potential answers are “not at all!”, “be my guest”, and “well, look like you made an effort, but it’s ok if you drop it.” Specifically, that’s three different kinds of SAFE, but none of them give you what you want: (1) SAFE means acknowledged by one replica but not written to disk, so a node failure can obliterate that data, (2) FSYNC_SAFE means written to a single disk, so a single disk crash can obliterate that data, and (3) REPLICAS_SAFE means it has been written to two replicas, but there is no guarantee that you will be able to retrieve it later.
If you want a different explanation: even though there are 13 different WriteConcern types available, not one of them offers the option of having the data written to disk on multiple replicas.
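Sirer’s complaint can be restated as a small table. Here it is as a Python sketch; the entries are my paraphrase of his three descriptions, not driver constants or driver behavior:

```python
# Each SAFE level summarized as (replicas acknowledging, written to disk).
# A paraphrase of the quoted argument, not MongoDB driver code.

GUARANTEES = {
    "SAFE":          {"replicas": 1, "on_disk": False},  # one node, RAM only
    "FSYNC_SAFE":    {"replicas": 1, "on_disk": True},   # one node's disk
    "REPLICAS_SAFE": {"replicas": 2, "on_disk": False},  # two nodes, RAM only
}

# The gap in the matrix: no level is both replicated AND on disk.
durable_and_replicated = [
    name for name, g in GUARANTEES.items()
    if g["replicas"] > 1 and g["on_disk"]
]
print(durable_and_replicated)  # []
```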
The period is over and in my mind the score is clearly 4-1. But I know that this is not a real game and there will be some concluding that I’m not getting it. I’m OK living with that though.
Original title and link: 10gen: MongoDB’s Fault Tolerance Is Not Broken… ( ©myNoSQL)
A bit old and most probably not statistically significant, but I’d say it looks correct in general.
Do you agree? Any other sources containing statistics about hardware failure rates?
Original title and link: Hardware Components Relative Failure Rates Chart ( ©myNoSQL)
Jared Wray enumerates the following 4 rules for High Availability:
- No Single Point of failure
- Self-healing is Required
- It will go down so plan on it
- It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.
I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
This got me thinking: is self-healing actually a requirement? Could I translate this into asking: is it possible to control the MTTF? Control in the sense of planning operations that would push the MTTF into a range that is not considered to break the SLA.
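Some numbers help frame that question. The textbook formula availability = MTTF / (MTTF + MTTR) shows two levers for hitting the same SLA: shrink MTTR (self-healing) or stretch MTTF (planned operations). The figures below are illustrative only:

```python
# Worked example of the standard availability formula
#   availability = MTTF / (MTTF + MTTR)
# The hour values are made up for illustration.

def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

print(round(availability(1000, 10), 4))   # 0.9901 -- misses "three nines"
print(round(availability(1000, 1), 4))    # 0.999  -- the self-healing lever
print(round(availability(10000, 10), 4))  # 0.999  -- the longer-MTTF lever
```

Either lever reaches 99.9%, which is the sense in which self-healing looks more like one technique among several than a defining requirement.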
Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:
The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.
Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?
Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? ( ©myNoSQL)
Do you remember the 5 lessons Netflix learned while using Amazon Web Services, the post where they talked about the Chaos Monkey? Judging by how much Netflix has shared about their experience in the cloud, including Amazon SimpleDB, I’d say these 5 are only the tip of the iceberg.
One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
Hadoop provides a similar framework, the Fault Injection Framework:
The idea of fault injection is fairly simple: it is an infusion of errors and exceptions into an application’s logic to achieve a higher coverage and fault tolerance of the system. Different implementations of this idea are available today. Hadoop’s FI framework is built on top of Aspect Oriented Paradigm (AOP) implemented by AspectJ toolkit.
As a sidenote, this is one of the neatest usages of AspectJ I’ve read about.
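Without AspectJ at hand, the core idea still translates to a few lines of plain Python. This is an illustration of fault injection in general, not Hadoop’s FI framework; the decorator plays the role of an AOP “advice” woven around existing logic:

```python
import random

# Toy fault injection: wrap a function so it randomly raises,
# forcing callers to exercise their error-handling paths.

def inject_fault(probability, exc=IOError):
    def decorator(fn):
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                raise exc("injected fault")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@inject_fault(probability=0.3)
def read_block(block_id):
    # Stand-in for real I/O; names are made up for the example.
    return b"data-%d" % block_id

def read_with_retries(block_id, attempts=50):
    # The caller logic whose fault tolerance we want to exercise.
    for _ in range(attempts):
        try:
            return read_block(block_id)
        except IOError:
            continue
    raise IOError("gave up")

print(read_with_retries(7))  # b'data-7'
```

The appeal of doing this with AOP instead, as Hadoop does, is that the production code never mentions the faults at all; the injection is woven in at build time.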
Update: Abhijit Belapurkar says that Fault injection using AOP was part of Recovery Oriented Computing research at Stanford/UCB many years ago: JAGR: An Autonomous Self-Recovering Application Server.
Original title and link: Hadoop Chaos Monkey: The Fault Injection Framework ( ©myNoSQL)
Detecting failures is a fundamental issue for fault-tolerance in distributed systems. […]
We present a novel abstraction, called accrual failure detectors, that emphasizes flexibility and expressiveness and can serve as a basic building block to implementing failure detectors in distributed systems. Instead of providing information of a boolean nature (trust vs. suspect), accrual failure detectors output a suspicion level on a continuous scale.
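A minimal sketch of the phi calculation: suspicion is defined as phi(t) = -log10(P(next heartbeat arrives later than t)), here assuming normally distributed heartbeat inter-arrival times (one common modeling choice). The heartbeat numbers are made up:

```python
import math

# Minimal phi accrual sketch: continuous suspicion instead of a
# boolean trust/suspect. Illustrative only, with a normal model
# for heartbeat inter-arrival times.

def phi(time_since_last, intervals):
    mean = sum(intervals) / len(intervals)
    var = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    std = math.sqrt(var) or 1e-9
    # P(interval > t) under the normal model, via the complementary CDF
    p_later = 0.5 * math.erfc((time_since_last - mean) / (std * math.sqrt(2)))
    return -math.log10(max(p_later, 1e-300))

heartbeats = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]  # seconds between beats
print(round(phi(1.0, heartbeats), 2))  # 0.3 -- right on time, low suspicion
print(phi(3.0, heartbeats))            # much larger -- long overdue
```

The caller picks a phi threshold instead of a fixed timeout, which is what lets different components trade detection speed against false positives on the same stream of heartbeats.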
The architectural difference between traditional and accrual failure detectors:
Original title and link: Distributed Systems: The Phi Accrual Failure Detector Paper (NoSQL databases © myNoSQL)