To Lose or Not to Lose Data… There’s No Question
Jeff Darcy (@Obdurodon) has two great posts (☞ here and ☞ here) about what keeps your data safe, or not so safe (NB: we made a similar, but brief, attempt when writing about file system durability). These two posts could easily be distilled into a list of DOs and DON’Ts to keep around:
DON’Ts or How you can lose data
- don’t provide full redundancy at all levels of your system
- be careless about non-battery-backed disk caches
- be careless about data ordering in the kernel
- be careless about your own data ordering
- don’t provide any reasonable way to take a backup
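The "data ordering" items are the easiest ones to get wrong in application code, so here is a minimal sketch of one common way to order your own writes on a POSIX system. It is not taken from Jeff's posts: the function name `durable_write` and the `.tmp` suffix are made up for illustration. The idea is to write the new contents to a temporary file, force them to disk, and only then rename over the old file. Note that `fsync` can only help if the disk honors flush requests, which ties back to the point about non-battery-backed caches.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old or the new contents.

    Hypothetical helper; assumes a POSIX filesystem where rename is atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # illustrative temp-file naming, not a standard

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push it to the device

    # Only after the data is on disk do we make it visible under the real name.
    os.replace(tmp_path, path)

    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping either `fsync` call reintroduces the ordering problem: the rename may reach the disk before the data it points to.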
DOs or How to keep your data safe
Make sure that you have taken care of all the points above. Jeff also presents some approaches for ensuring data protection:
- immutable and/or append-only files, as in a log-structured filesystem
- copy on write
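To make the append-only idea concrete, here is a small sketch of a log file in which records are only ever added at the end, never overwritten. It is an illustration under stated assumptions, not how VoldFS or any particular data store does it: the record layout (a length and a CRC32 before each payload) and the names `append_record` and `read_records` are invented for this example. The checksum lets a reader detect a record that was cut short by a crash and stop there, leaving everything written earlier intact.

```python
import os
import struct
import zlib

# Assumed record layout: 4-byte payload length, 4-byte CRC32, then the payload.
HEADER = struct.Struct("<II")


def append_record(path: str, payload: bytes) -> None:
    """Append one record to the log and force it to disk."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:  # append mode: existing bytes are never rewritten
        f.write(record)
        f.flush()
        os.fsync(f.fileno())


def read_records(path: str) -> list:
    """Return all intact records, stopping at the first torn or corrupt one."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of log, or a header cut short by a crash
            length, crc = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # incomplete or damaged tail: ignore it
            records.append(payload)
    return records
```

Because old records are never modified, a failed write can at worst damage the tail of the log, which is exactly the part the reader knows how to discard.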
“It’s worth noting also that, especially in a distributed environment, these approaches can be combined. For example, VoldFS itself uses a COW approach but most of the actual or candidate data stores from which it allocates its blocks are themselves more log-oriented. As always it’s horses for courses, and different systems – or even different parts of the same system – might be best served by different approaches. That’s why I thought it was worth describing multiple alternatives and the tradeoffs between them.”
You could surely say he’s obsessed with data safety (not a bad thing, but rather something all of us should always keep in mind), as we recently learned from him what is needed to secure data in NoSQL databases.
Original title and link for this post: To Lose or Not to Lose Data… There’s No Question (NoSQL databases © myNoSQL)