


Cassandra Partitions and Token Selection

There’s a script for that:

Since adding new nodes to a Cassandra cluster is an expensive operation, the challenge is to add nodes in the least disruptive manner possible. This means you want to add nodes in the right places, and then move the existing old nodes with the smallest possible change. (This isn’t strictly speaking true, mostly because of how much the bootstrapping process still sucks, but someday, somewhere over the rainbow, it should be true.) In Cassandra 0.6.x, the anti-compaction process degrades the node you are taking data from the most, and thankfully this is changed in 0.7.
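To make the "right places" idea concrete, here is a minimal sketch of balanced token selection. It is not the script referred to above: it assumes the RandomPartitioner, whose token space runs from 0 to 2**127, and the function names `balanced_tokens` and `plan_growth` are made up for illustration.

```python
# Assumption: RandomPartitioner, with tokens in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


def plan_growth(old_count, new_count):
    """Compare the balanced ring for old_count nodes with the one for new_count.

    Returns (kept, added): target tokens that an old node already holds,
    and target tokens that need a new node or a moved old node.
    """
    old = set(balanced_tokens(old_count))
    target = balanced_tokens(new_count)
    kept = [t for t in target if t in old]
    added = [t for t in target if t not in old]
    return kept, added


if __name__ == "__main__":
    kept, added = plan_growth(2, 4)
    print("unchanged tokens:", kept)
    print("tokens for new nodes:", added)
```

Under these assumptions, doubling the ring is the cheapest case: every old token is also a target token, so the new nodes bootstrap at the midpoints and no existing node has to move. Growing from 3 to 4 nodes keeps only token 0, so two old nodes would have to move as well.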

My questions now:

  • is there any advantage to working with these raw token numbers instead of, say, virtual nodes/buckets? Basically, you define the total number of virtual nodes in the cluster and then just assign how many of them each physical node handles
  • is it possible to automate this process?
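The virtual nodes/buckets idea from the first question could look roughly like the sketch below. It is purely hypothetical and not something Cassandra offers at this point: the function name `assign_buckets`, the `capacity` weights, and the node names are all assumptions made for the example.

```python
def assign_buckets(total_buckets, capacity):
    """Split total_buckets among physical nodes in proportion to capacity.

    capacity maps a node name to a positive integer weight. Returns a dict
    mapping each node name to the number of virtual nodes it handles.
    """
    total_weight = sum(capacity.values())
    # Start with the rounded-down proportional share for every node.
    shares = {
        node: total_buckets * weight // total_weight
        for node, weight in capacity.items()
    }
    leftover = total_buckets - sum(shares.values())
    # Give the remaining buckets to the nodes with the largest remainders.
    by_remainder = sorted(
        capacity,
        key=lambda node: (total_buckets * capacity[node]) % total_weight,
        reverse=True,
    )
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical cluster: node "c" has twice the capacity of "a" and "b".
    print(assign_buckets(256, {"a": 1, "b": 1, "c": 2}))
```

With a scheme like this, adding a physical node means recomputing the shares and handing over whole buckets, rather than picking a new raw token by hand.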

On that last note, I’m wondering what this operation involves in the case of HBase and Riak. Oh, and MongoDB and BigCouch too.

Original title and link: Cassandra Partitions and Token Selection (NoSQL databases © myNoSQL)