LinkedIn: All content tagged as LinkedIn in NoSQL databases and polyglot persistence
After learning about LinkedIn’s Databus low latency data transfer system, I’ve had a short chat with Sid Anand focused on understanding what makes Databus unique.
As I’ve mentioned in my post about Databus, Databus looks at first as a data-oriented ESB. But what is innovative about Databus comes from decoupling the data source from the consumers/clients thus being able to offer speed to a large number of subscribers that are up-to-date, but also help clients that fall behind or are just bootstrapping without adding load on the source database.
Databus clients are smart enough to:
- ask for Consolidated Deltas since time T if they fall behind
- ask for a Consistent Snapshot and then for a Consolidated Delta if they bootstrap
and Databus is build so it can serve both Consolidate Deltas and Consistent Snapshots without any impact on the original data source.
The “catching-up” and boostrapping processes are described in much more details in Sid Anand’s article.
Databus is the single and only way that data is replicated from LinkedIn’s databases to search indexes, the graph, Memcached, Voldemort, etc.
Original title and link: What Is Unique About LinkedIn’s Databus ( ©myNoSQL)
The abstract of a new paper from a team at LinkedIn (Roshan Sumbaly, Jay Kreps, Lei Gao, Alex Feinberg, Chinmay Soman, Sam Shah):
Current serving systems lack the ability to bulk load massive immutable data sets without affecting serving performance. The performance degradation is largely due to index creation and modification as CPU and memory resources are shared with request serving. We have ex- tended Project Voldemort, a general-purpose distributed storage and serving system inspired by Amazon’s Dy- namo, to support bulk loading terabytes of read-only data. This extension constructs the index offline, by leveraging the fault tolerance and parallelism of Hadoop. Compared to MySQL, our compact storage format and data deploy- ment pipeline scales to twice the request throughput while maintaining sub 5 ms median latency. At LinkedIn, the largest professional social network, this system has been running in production for more than 2 years and serves many of the data-intensive social features on the site.
Read or download the paper after the break.
Michael Stack (StumbleUpon & Hadoop PMC) presents on some of the more interesting HBase deployments, HBase scenario usages, HBase and HDFS, and near-future of HBase: