distributed systems: All content tagged as distributed systems in NoSQL databases and polyglot persistence
A slide deck by Alexander Shraer providing a shorter version of the Dynamic Reconfiguration of Primary/Backup Clusters in Apache ZooKeeper paper.
There are still some scenarios where the proposal algorithm will not work, but I cannot tell how often these will occur:
- Quorum of new ensemble must be in sync
- Another reconfig in progress
- Version condition check fails
One of the most interesting slides in the deck is the one explaining the failure free flow:
A paper by Alexander Shraer, Benjamin Reed, Flavio Junqueira (Yahoo) and Dahlia Malkhi (Microsoft):
Dynamically changing (reconfiguring) the membership of a replicated distributed system while preserving data consistency and system availability is a challenging problem. In this paper, we show that reconfiguration can be simplified by taking advantage of certain properties commonly provided by Primary/Backup systems. We describe a new reconfiguration protocol, recently implemented in Apache Zookeeper. It fully automates configuration changes and minimizes any interruption in service to clients while maintaining data consistency. By leveraging the properties already provided by Zookeeper our protocol is considerably simpler than state of the art.
The corresponding ZooKeeper issue has been created in 2008 and the new protocol should be part of ZooKeeper 3.5.0
Google’s paper about their large-scale distributed systems tracing solution Dapper which inspired Twitter’s Zipkin:
Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie  and X-Trace , but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.
Download or read the paper after the break.