After learning about LinkedIn’s Databus low-latency data transfer system, I had a short chat with Sid Anand to understand what makes Databus unique.
As I mentioned in my post about Databus, at first it looks like a data-oriented ESB. What is innovative about Databus is that it decouples the data source from the consumers/clients. This lets it deliver changes quickly to a large number of up-to-date subscribers, while also helping clients that fall behind or are just bootstrapping, all without adding load on the source database.
Databus clients are smart enough to:
- ask for Consolidated Deltas since time T if they fall behind
- ask for a Consistent Snapshot and then for a Consolidated Delta if they bootstrap
and Databus is built so it can serve both Consolidated Deltas and Consistent Snapshots without any impact on the original data source.
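To make the two client paths above concrete, here is a minimal toy sketch in Python. It is not the Databus API: `ToyChangeStore`, `ToyClient`, `take_snapshot`, the integer sequence numbers, and the in-memory log are all assumptions made for illustration, and deletes are not modelled. It only shows the idea that a consolidated delta keeps the latest value per key since time T, and that a bootstrapping client applies a snapshot first and a delta second.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape of a change: (sequence number, key, value).
Change = Tuple[int, str, str]


class ToyChangeStore:
    """Stands in for the serving side; it never queries the source database."""

    def __init__(self) -> None:
        self.log: List[Change] = []
        self.snapshot_seq = 0
        self.snapshot: Dict[str, str] = {}

    def record(self, key: str, value: str) -> int:
        """Append one change and return its sequence number."""
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        return seq

    def take_snapshot(self) -> None:
        """Materialise the full state as of the latest recorded change."""
        self.snapshot = {key: value for _, key, value in self.log}
        self.snapshot_seq = len(self.log)

    def consistent_snapshot(self) -> Tuple[int, Dict[str, str]]:
        """Return the stored snapshot and the sequence number it reflects."""
        return self.snapshot_seq, dict(self.snapshot)

    def consolidated_delta(self, since: int) -> Tuple[int, Dict[str, str]]:
        """Return only the latest value per key among changes after `since`."""
        latest: Dict[str, str] = {}
        high = since
        for seq, key, value in self.log:
            if seq > since:
                latest[key] = value  # later changes overwrite earlier ones
                high = seq
        return high, latest


class ToyClient:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.last_seq: Optional[int] = None  # None means "never bootstrapped"

    def sync(self, store: ToyChangeStore) -> None:
        if self.last_seq is None:
            # Bootstrapping: start from a consistent snapshot.
            self.last_seq, self.state = store.consistent_snapshot()
        # Bootstrapping or behind: apply the consolidated delta since last_seq.
        self.last_seq, delta = store.consolidated_delta(self.last_seq)
        self.state.update(delta)
```

In this sketch a client that missed several updates to the same key receives only the final value for that key, which is the point of consolidating the delta rather than replaying every intermediate change.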
The “catching-up” and bootstrapping processes are described in much more detail in Sid Anand’s article.
Databus is the single and only way that data is replicated from LinkedIn’s databases to search indexes, the graph, Memcached, Voldemort, etc.
Original title and link: What Is Unique About LinkedIn’s Databus ( ©myNoSQL)