Why doesn't disk usage immediately decrease when I remove data in Cassandra?

Jonathan Ellis (@spyced) explains the complexity of performing a delete operation in a distributed, eventually consistent system and how Cassandra deals with this operation.

Thus, a delete operation can’t just wipe out all traces of the data being removed immediately […] So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request. […] Cassandra does what distributed systems designers frequently do when confronted with a problem we don’t know how to solve: define some additional constraints that turn it into one that we do. Here, we defined a constant, GCGraceSeconds, and had each node track tombstone age locally.
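The tombstone idea from the quote can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Cell` and `Node` classes, the `compact` method, and the grace-period value are all assumptions made for illustration. It shows a delete writing a marker rather than freeing space, and the marker being dropped only once it is older than the grace period.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative grace period in seconds (an assumption, not a recommendation).
GC_GRACE_SECONDS = 10 * 24 * 3600


@dataclass
class Cell:
    value: Optional[str]  # None marks a tombstone
    timestamp: float      # write time, in seconds

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


class Node:
    """Hypothetical single-node store that keeps tombstones until they expire."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def write(self, key: str, value: str, now: float) -> None:
        self._apply(key, Cell(value, now))

    def delete(self, key: str, now: float) -> None:
        # A delete is just another write: it stores a tombstone.
        self._apply(key, Cell(None, now))

    def _apply(self, key: str, cell: Cell) -> None:
        current = self.cells.get(key)
        # The newest timestamp wins, whether it is a value or a tombstone.
        if current is None or cell.timestamp >= current.timestamp:
            self.cells[key] = cell

    def read(self, key: str) -> Optional[str]:
        cell = self.cells.get(key)
        if cell is None or cell.is_tombstone:
            return None
        return cell.value

    def compact(self, now: float) -> int:
        """Drop tombstones older than the grace period; return how many were dropped."""
        expired = [
            key
            for key, cell in self.cells.items()
            if cell.is_tombstone and now - cell.timestamp > GC_GRACE_SECONDS
        ]
        for key in expired:
            del self.cells[key]
        return len(expired)
```

In this sketch, a deleted key reads as missing right away, but its entry stays in `cells` until `compact` runs after the grace period. That is the reason disk usage does not shrink at the moment of the delete.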

The post also covers how Cassandra copes with eventual consistency: hinted handoff, read repair, and anti-entropy all work to shrink the inconsistency window.
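As a rough illustration of how a read-time repair can carry a tombstone to a replica that missed the delete, here is a minimal Python sketch. It assumes each replica is a plain dictionary mapping a key to a `(timestamp, value)` pair, with `None` standing in for a tombstone. The function name `read_with_repair` and this data layout are hypothetical and far simpler than what Cassandra actually does.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp, value); a value of None stands for a tombstone.
Versioned = Tuple[float, Optional[str]]
Replica = Dict[str, Versioned]


def read_with_repair(replicas: List[Replica], key: str) -> Optional[str]:
    """Return the newest value for key and push it to stale replicas."""
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None

    # The version with the highest timestamp wins, tombstone or not.
    newest = max(answers, key=lambda versioned: versioned[0])

    # Repair: overwrite any replica that is missing the key or holds an older version.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest

    return newest[1]
```

If one replica still holds an old value and another holds a newer tombstone, the read returns `None` and leaves every replica holding the tombstone. This only works while the tombstone still exists, which is why it cannot be discarded the instant the delete arrives.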