Last night, Mark Phillips sent me the following message:
Yesterday a Riak community member, Joseph Blumstedt released a pair of repos: riak_zab and riak_zab_example. In short, riak_zab is an extension for riak_core that provides totally ordered atomic broadcast capabilities.
If your first reaction was something along the line “so what?”, then you are not alone. Anyway I hope I’ve done my homework and now I understand why this is both interesting and important.
ZooKeeper is a coordination service created by Yahoo! to allow large scale applications to perform coordination tasks such as leader election, status propagation, and rendezvous. Embedded into ZooKeeper is a totally ordered broadcast protocol: Zab.
ZooKeeper makes the following requirements on the Zab broadcast protocol:
- Reliable delivery
- If a message, m, is delivered by one server, then it will be eventually delivered by all correct servers.
- Total order
- If message a is delivered before message b by one server, then every server that delivers a and b delivers a before b.
- Causal order
- If message a causally precedes message b and both messages are delivered, then a must be ordered before b.
- Prefix property:
- If m is the last message delivered for a leader L, any message proposed before m by L must also be delivered
Putting it together:
riak_zab is an extension for riak_core that provides totally ordered atomic broadcast capabilities. This is accomplished through a pure Erlang implementation of Zab, the Zookeeper Atomic Broadcast protocol invented by Yahoo! Research.
Basically this means that many operations that were typically hard to do well using the Dynamo eventual consistency model which Riak implements with riak_core are now theoretically possible. In case you’d like to read more about riak_core I’d suggest these posts Building blocks of Dynamo-like distributed systems, riak_core: building distributed applications without shared state, and Where to start with riak_core .
Indeed there are other Dynamo aspects like synchronization, scaling, ring rebalancing, handoff that riak_zab had to address and its approach is documented on the project page
Technically, riak_zab isn’t focused on message delivery. Rather, riak_zab provides consistent replicated state.
So why is this important? While parts of an application can deal with eventual consistency, there are typically a few aspects of an app where stronger consistency is needed for various reasons. A few such examples:
- Distributed Counters — just ask Digg people how they implemented distributed counters in Cassandra or check how Twitter built a ZooKeeper and Cassandra based solution for realtime analytics .
- Ordered Messaging
- Compare-and-Swap Operations
- Set Operations
- Sub-document updates
- Multi-key transactions
riak_zab will theoretically allow users to reap the availability of the standard eventually consistent model Riak provides and opt in to stronger consistency for the parts of the app that require it. In other words, it expands the number of CAP trade offs that a developer can make beyond the R, W and N model and makes possible functionality that relies on consistency guarantees.
After connecting all these dots, I’ve finally got it (nb: thanks Mark for sending this out!). And it definitely sounds like an exciting addition to riak_core. The caveat here is that even if this code has been tested in an academic environment the project is still an alpha version. Hopefully other members of the Riak community will pick it up and together with Basho guys will make it production ready.