There’s a script for that:
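The script itself isn't reproduced in this excerpt, but a minimal sketch of the usual initial-token calculation, assuming the RandomPartitioner's token ring of `[0, 2**127)`, might look like this (the function name is mine, not from the original):

```python
# Evenly spaced initial tokens for a Cassandra cluster using the
# RandomPartitioner, whose token space is the ring [0, 2**127).
# A sketch only; the original script is not reproduced here.
def tokens(node_count):
    return [i * (2 ** 127) // node_count for i in range(node_count)]

for t in tokens(4):
    print(t)
```

Each node then gets one of these values as its `initial_token`, which spaces the nodes evenly around the ring.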
Since adding new nodes to a Cassandra cluster is an expensive operation, the challenge is to add them in the least disruptive manner possible. That means placing new nodes at the right spots on the ring and then moving the existing nodes with the smallest possible token changes. (Strictly speaking this isn't true yet, mostly because of how much the bootstrapping process still sucks, but someday, somewhere over the rainbow, it should be.) In Cassandra 0.6.x, the anti-compaction process heavily degrades the node you are streaming data from; thankfully, this changes in 0.7.
My questions now:
- is there any advantage to working with these raw token numbers instead of, say, virtual nodes/buckets? Basically, you define the total number of virtual nodes in the cluster and then just assign how many virtual nodes each physical node handles
- is it possible to automate this process?
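To make the virtual-node idea concrete, here is a hedged sketch (the function names and the round-robin/steal policies are mine, not any particular database's implementation): each physical node owns a set of vnodes, and adding a physical node only moves whole vnodes rather than recomputing every token.

```python
from collections import defaultdict

def assign_vnodes(physical_nodes, total_vnodes):
    # Round-robin vnode IDs across physical nodes; each vnode is a
    # fixed slice of the token ring, so ownership moves in whole
    # vnode units when the cluster changes.
    assignment = defaultdict(list)
    for v in range(total_vnodes):
        owner = physical_nodes[v % len(physical_nodes)]
        assignment[owner].append(v)
    return dict(assignment)

def rebalance(assignment, new_node):
    # Add a new physical node by stealing vnodes, one at a time,
    # from whichever node currently owns the most, until the new
    # node holds roughly an equal share. Only the stolen vnodes'
    # data has to move.
    assignment = {n: list(vs) for n, vs in assignment.items()}
    assignment[new_node] = []
    total = sum(len(vs) for vs in assignment.values())
    target = total // len(assignment)
    while len(assignment[new_node]) < target:
        donor = max(assignment, key=lambda n: len(assignment[n]))
        assignment[new_node].append(assignment[donor].pop())
    return assignment

cluster = assign_vnodes(["n1", "n2", "n3"], 12)
print(rebalance(cluster, "n4"))
```

With 12 vnodes over 3 nodes, adding `n4` moves only 3 vnodes (a quarter of the data), and the step is trivially scriptable, which is exactly the automation the raw-token approach makes awkward.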
On that last note, I’m wondering what this operation implies for HBase and Riak. Oh, and for MongoDB and BigCouch too.
Original title and link: Cassandra Partitions and Token Selection (NoSQL databases © myNoSQL)