The MongoDB team’s post about MongoDB’s durability made some waves last week. While I’d still recommend reading the original post, here is a summary of its most important points:
- First, there are many scenarios in which that server loses all its data no matter what.
- In the real world, traditional durability often isn’t even done correctly.
- Given all this, we’re not saying durability isn’t important, we just think that single server durability isn’t the best way to get true durability. We think the right path to durability is replication (local and remote) and snapshotting. […] We are currently planning many more enhancements to replication to make it better.
- Now - there are definitely some cases where single server durability is the best option. It is on our road map, it’s just not on the short list right now.
I have no intention of judging the decisions the MongoDB team made in designing their tool. But I do feel that the above arguments are inaccurate, and that MongoDB’s durability should be seen as a tradeoff for the performance you are getting from it.
Firstly, the fact that there are scenarios in which you can lose all your data, despite every precaution, is not a good reason to remove durability features. The same goes for ignoring a feature because others might not implement it correctly. Both of these arguments are kind of childish, and I suppose everyone reading the post has already dismissed them.
So, I’d like to focus on the important part: “We think the right path to durability is replication”. That’s definitely a more relevant and possibly more valid argument. But there are a couple of aspects that I’d like to cover:
- while the probability of data loss is significantly reduced by using 2 machines instead of 1, you also have to take into account the probability of a network partition. As a side note, even if MongoDB supported replica sets instead of the currently supported replica pairs, the argument would remain valid. I don’t have any statistics about hardware failure rates vs network failure rates to compute the impact on the probability of data loss, but the plot below should give you an idea of what I mean:
- with an eventually consistent distributed system there is an uncontrollable window in which data loss may still occur. Remember that, for the time being, MongoDB’s replication is asynchronous, so even replicated data can be lost during the sync window. Keep in mind also that the synchronization speed depends on the reliability of the network.
- there is a cost impact that you should be aware of: I’d argue that a battery-backed RAID controller costs much less than an additional server
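The sync-window point above can be made concrete with a toy model. The sketch below is purely illustrative (it is not MongoDB’s replication code): the primary acknowledges writes immediately, the replica receives each write after a fixed lag, and we count how many acknowledged writes vanish when the primary crashes. Note how a synchronous acknowledgment mode closes the window entirely.

```python
def simulate(num_writes, lag, crash_at, ack_sync=False):
    """Toy model of asynchronous replication with a fixed lag.

    Writes arrive at times 0..num_writes-1; each write reaches the
    replica `lag` time units after it is applied on the primary.
    The primary crashes at time `crash_at`.  Returns the number of
    acknowledged writes that never reached the replica (lost data).
    With ack_sync=True the client blocks until the write is on the
    replica, so no acknowledged write can be lost.
    """
    lost = 0
    for t in range(num_writes):
        if t >= crash_at:
            break  # write arrives after the crash; never accepted
        replicated = t + lag <= crash_at
        if ack_sync:
            continue  # unreplicated writes were never acknowledged
        if not replicated:
            lost += 1  # acknowledged on the primary, gone after the crash
    return lost
```

With a lag of 5 time units and a crash at time 50, the last few acknowledged writes are lost in the async case, while the blocking mode loses none; in reality the lag is variable and, as noted above, depends on the reliability of the network.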
I’d also like to note that, as a consequence of MongoDB’s approach to durability, any benchmark comparing MongoDB to other, more durable stores will produce skewed results.
Summarizing, while the MongoDB team is working on improving the replication mechanisms:
- pseudo real-time replication, with optional blocking of writes until they reach multiple servers
- replica sets instead of replica pairs
- easier creation of new slaves with large data sets
I still think that everyone should be aware of MongoDB’s approach to durability and accept it as a tradeoff for other MongoDB features.
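The first roadmap item, blocking writes until they are on multiple servers, is essentially a write-concern mechanism. Here is a minimal sketch of that semantics, with illustrative names of my own (this is not MongoDB’s API): the write is acknowledged only once at least `w` nodes, primary included, have it.

```python
class Node:
    """A trivially simplified database node holding applied writes."""
    def __init__(self):
        self.data = []

    def apply(self, doc):
        self.data.append(doc)


def replicated_write(primary, replicas, doc, w):
    """Apply doc on the primary, then block until it is on at least
    w nodes in total (primary included) before acknowledging.

    In a real system each replica.apply() would be a network round
    trip, so larger w trades write latency for durability.
    """
    primary.apply(doc)
    acks = 1  # the primary itself
    for r in replicas:
        if acks >= w:
            break
        r.apply(doc)
        acks += 1
    if acks < w:
        raise RuntimeError("could not satisfy write concern w=%d" % w)
    return acks
```

With `w=1` this degenerates to today’s asynchronous behavior; with `w=2` the client only sees an acknowledgment once a replica has the write, which is exactly the guarantee that shrinks the data-loss window discussed above.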