These slides have generated quite a reaction on Twitter. I'll let you decide the reasons for yourself:
While there have been lots of retweets, here is just a glimpse of the type of reactions I'm referring to:
Martin Schneider (Basho) tries to answer the question in the title:
Riak can be a data store to a purpose-built enterprise app; a caching layer for an Internet app, or part of the distributed fabric and DNA of a Global app. Those are of course highly arbitrary and vague examples, but it shows how flexible Riak is as a platform.
“Can be” is not quite equivalent to being the right solution, and even less so the best solution. Martin's answer to this is:
For super scalable enterprise and global apps — those where the data inside is inherently valuable and dependability of the system to capture, process and store data/writes is imperative — well I see Riak outperforming any perceived competitor in the space in providing value here.
But even for these scenarios, there’s competition from solutions like Cassandra, HBase, and Hypertable — the whole spectrum of scalable storage solutions based on Google BigTable and Amazon Dynamo being covered: HBase (a BigTable implementation), Cassandra (a solution using the BigTable data model and the Dynamo distributed model), and Riak (a solution based mainly on the Amazon Dynamo paper).
While Riak presents itself as the cleanest Dynamo-based solution, I would venture to say that both Cassandra and HBase come to the table with some interesting characteristics that cannot be ignored:
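The Dynamo distributed model mentioned above rests on consistent hashing: keys and nodes are placed on the same hash ring, each key is served by the next node clockwise, and adding or removing a node only remaps the keys adjacent to it. A minimal sketch of the idea (the class name, vnode count, and hash choice are illustrative, not taken from any of the systems discussed):

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Any stable, well-distributed hash works; MD5 is used here for brevity.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    """Map keys to nodes on a hash ring.

    Each physical node is placed at several points ("virtual nodes")
    so keys spread evenly and rebalancing stays incremental.
    """

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((_hash(f"{node}-{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic for a given ring
```

This is only the placement half of Dynamo; the real systems layer replication, hinted handoff, and anti-entropy on top of it.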
- Strong communities and community driven development processes — both HBase and Cassandra are top Apache Foundation projects
- Excellent integration with Hadoop, the leading batch processing solution. DataStax, the company offering services for Cassandra, went the extra mile of creating a custom Hadoop solution, Brisk, making this integration even better.
Bottom line: I don't think we can declare a winner in this space, and I believe all three solutions will be around for a while, competing for every scenario that requires a dependable system to capture, process, and store data.
From Gavin Heavyside’s slides:
- Launch successful service
- Read saturation: add caching
- Write saturation: add hardware
- Queries slow down: denormalize
- Reads still too slow: prematerialise common queries, stop joining
- Writes too slow: drop secondary indexes and triggers
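The first step in that progression, adding a cache in front of the database when reads saturate, can be sketched as a read-through cache. This is a generic illustration (the decorator, function names, and TTL are all hypothetical, not from the slides):

```python
import functools
import time


def read_through_cache(ttl_seconds=60):
    """Decorator: serve repeated reads from an in-process cache,
    falling through to the underlying fetch only on a miss or expiry."""
    def decorator(fetch):
        cache = {}  # key -> (expires_at, value)

        @functools.wraps(fetch)
        def wrapper(key):
            entry = cache.get(key)
            if entry is not None and entry[0] > time.time():
                return entry[1]  # cache hit: no database round trip
            value = fetch(key)   # cache miss: go to the database
            cache[key] = (time.time() + ttl_seconds, value)
            return value

        return wrapper
    return decorator


db_reads = []  # stand-in for actual database traffic


@read_through_cache(ttl_seconds=60)
def load_profile(user_id):
    db_reads.append(user_id)        # only runs on a cache miss
    return {"id": user_id}          # stand-in for a real database read
```

In production this role is typically played by a shared cache like memcached or Redis rather than per-process memory, which is exactly where the later steps in the list (denormalizing, prematerializing) pick up once caching alone stops helping.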
Even if not focused on NoSQL, the videos from the Surge conference cover very interesting aspects of scalability. Here are a couple of examples:
- Theo Schlossnagle: Scalable Design Patterns
- Justin Sheehy: Embracing Concurrency at Scale
- Ronald Bradford: The most common MySQL scalability mistakes, and how to avoid them
- Ruslan Belkin: Going 0 to 60: Scaling LinkedIn
- Robert Treat: Database Scalability Patterns
- Artur Bergman: Scaling and Loadbalancing Wikia Across The World
- Mike Malone: Working with Dimensional Data in a Distributed Hash Table
- Gavin M. Roy: Scaling myYearbook.com - Lessons Learned From Rapid Growth
- Benjamin Black: Go with the flow - Meditations on network infrastructure analysis
- John Allspaw: The “Go or No-Go”: Operability and Contingency at Etsy
- Rod Cope: Top 10 Lessons Learned from Deploying Hadoop in a Private Cloud
Last but not least, there's also a “SQL vs NoSQL” panel featuring Geir Magnusson Jr (moderator), Robert Treat, Baron Schwartz, Mike Malone, and Justin Sheehy.