


To Lose or Not to Lose Data… There’s No Question

Jeff Darcy (@Obdurodon) has two great posts (☞ here and ☞ here) about what keeps your data safe, or not so safe (NB: we made a similar, but brief, attempt when writing about file system durability). These two posts could easily be turned into a list of DOs and DON’Ts to keep around:

DON’Ts or How you can lose data

  • don’t provide full redundancy at all levels of your system
  • be careless about non-battery-backed disk caches
  • be careless about data ordering in the kernel
  • be careless about your own data ordering
  • don’t provide any reasonable way to take a backup
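To make the "data ordering" points concrete, here is a minimal sketch of one common way to keep your own writes ordered on a POSIX system: write the new contents to a temporary file, force them to disk, and only then rename the file into place. The helper name `durable_write` is hypothetical and this is not taken from Jeff's posts. It also assumes that `fsync` really reaches stable storage, which is exactly what a non-battery-backed disk cache can undermine.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`; a crash should leave the old or the new contents.

    Hypothetical helper for illustration. Assumes a POSIX filesystem where
    rename is atomic and directories can be opened and fsync'ed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push it to the disk
        # Only after the data is on disk do we make it visible under `path`.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the first `fsync` is the classic mistake here: the rename may reach the disk before the data does, and a crash in between can leave an empty or partial file under the final name.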

DOs or How to keep your data safe

Make sure that you have taken care of all the points above. Jeff also presents some approaches for ensuring data protection:

  • immutable and/or append-only files, based on a log-structured filesystem
  • copy on write
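As a rough illustration of the append-only idea, the sketch below never overwrites existing bytes: each record is appended with a length and a CRC32, and a reader stops at the first record that is truncated or fails its checksum. The record layout and the names `append_record` and `read_records` are assumptions made for this example, not the format of any particular data store or of the systems Jeff describes.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length and CRC32, both little-endian uint32.
HEADER = struct.Struct("<II")


def append_record(path: str, payload: bytes) -> None:
    """Append one record to the log and force it to disk before returning."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())


def read_records(path: str) -> list:
    """Return all intact records, stopping at the first torn or corrupt one."""
    records = []
    with open(path, "rb") as log:
        while True:
            header = log.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of log, or a torn header
            length, crc = HEADER.unpack(header)
            payload = log.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn or corrupt tail: ignore it and everything after
            records.append(payload)
    return records
```

Because earlier records are never modified, a crash in the middle of a write can only damage the tail of the file, and the checksum lets recovery detect and discard it.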

It’s worth noting also that, especially in a distributed environment, these approaches can be combined. For example, VoldFS itself uses a COW approach but most of the actual or candidate data stores from which it allocates its blocks are themselves more log-oriented. As always it’s horses for courses, and different systems – or even different parts of the same system – might be best served by different approaches. That’s why I thought it was worth describing multiple alternatives and the tradeoffs between them.

You could certainly say he’s obsessed with data safety (not a bad thing, but rather something all of us should always keep in mind), as we recently learned from him what is needed to secure data in NoSQL databases.

Original title and link for this post: To Lose or Not to Lose Data… There’s No Question (NoSQL databases © myNoSQL)