Found a reference to these three papers on load balancing in the Cassandra mailing list.
Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems
David R. Karger, Matthias Ruhl:
Load balancing is a critical issue for the efficient oper- ation of peer-to-peer networks. We give two new load- balancing protocols whose provable performance guar- antees are within a constant factor of optimal. Our proto- cols refine the consistent hashing data structure that un- derlies the Chord (and Koorde) P2P network. Both pre- serve Chord’s logarithmic query time and near-optimal data migration cost. Our first protocol balances the distribution of the key address space to nodes, which yields a load-balanced system when the DHT maps items “randomly” into the address space. To our knowledge, this yields the first P2P scheme simultaneously achieving O(logn) degree, O(logn) look-up cost, and constant-factor load balance (previous schemes settled for any two of the three). Our second protocol aims to directly balance the dis- tribution of items among the nodes. This is useful when the distribution of items in the address space cannot be randomized—for example, if we wish to support range- searches on “ordered” keys. We give a simple protocol that balances load by moving nodes to arbitrary locations “where they are needed.” As an application, we use the last protocol to give an optimal implementation of a dis- tributed data structure for range searches on ordered data.
PDF available ☞ here.
Load Balancing in Structured P2P Systems
Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, Ion Stoica:
Most P2P systems that provide a DHT abstraction dis- tribute objects among “peer nodes” by choosing random identifiers for the objects. This could result in an O(log N) imbalance. Besides, P2P systems can be highly heteroge- neous, i.e. they may consist of peers that range from old desktops behind modem lines to powerful servers connected to the Internet through high-bandwidth lines. In this paper, we address the problem of load balancing in such P2P sys- tems. We explore the space of designing load-balancing algo- rithms that uses the notion of “virtual servers”. We present three schemes that differ primarily in the amount of infor- mation used to decide how to re-arrange load. Our simu- lation results show that even the simplest scheme is able to balance the load within 80% of the optimal value, while the most complex scheme is able to balance the load within 95% of the optimal value.
PS document available ☞ here.
Simple Load Balancing for Distributed Hash Tables
John Byers, Jeffrey Considine, Michael Mitzenmacher:
Distributed hash tables have recently become a useful building block for a variety of distributed applications. However, current schemes based upon consistent hashing require both considerable implementation complexity and substantial storage overhead to achieve desired load balancing goals. We argue in this paper that these goals can be achieved more simply and more cost-effectively. First, we suggest the direct application of the “power of two choices” paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies, including load-stealing or load-shedding, as well as providing natural fault-tolerance mechanisms.
PS document available ☞ here.