


scalability: All content tagged as scalability in NoSQL databases and polyglot persistence

Why are you using MySQL?

Mark Callaghan puts out a great explanation of why pitching new databases to large MySQL users will almost always fail:

Leaving out quality of service, a simple definition for scalability is that a given workload requires A people, B hardware units and C lines of automation code. For something to scale better than MySQL it should reduce some of A, B and C. For many web-scale deployments the cost of C has mostly been paid and migrating to something new means a large cost for C. Note that B represents many potential bottlenecks. The value of B might be large to get more IOPs for IO-bound workloads with databases that are much bigger than RAM. It might be large to get more RAM to keep everything cached. Unfortunately, some deployments are not going to fully describe that context (some things are secret). The value of A is influenced by the features in C and the manageability features in the DBMS but most web-scale companies don’t disclose the values of B and A.

I can still see some good reasons why a new database “vendor” should continue to try to get big users to take a look at their product:

  1. it’s the only way to learn:

    1. what’s unique for these users
    2. what’s at the top of their concerns, and
    3. how they address them

    While you won’t be able to make them switch, if your product addresses their issues, it will be prepared for the next big user.

  2. there are still chances for smaller, greenfield, internal projects to start using your product

It’s common sense to build a product using the tools you are already familiar with: it reduces risk and cuts time to market. What happens, though, is that there are always people who don’t follow this rule and start new products using tools they are not that familiar with. There are also products that grow faster than the team’s know-how evolves. These are just two quick examples where a new product that has learned from big users can help.
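
Callaghan’s A/B/C framing above lends itself to a toy cost model. A minimal sketch (all names and numbers below are mine, purely illustrative): migrating only pays off if the reduction in people (A) and hardware (B) outweighs re-paying the automation cost (C), which the incumbent has already sunk.

```python
# Illustrative sketch of Callaghan's point, not from the article:
# a workload costs A (people) + B (hardware units) + C (automation code).
# For the incumbent DBMS, C has already been paid; a migration must repay it.

def migration_worthwhile(incumbent, candidate):
    """Compare the cost of staying vs. migrating.

    Each system is a dict with keys 'A' (people), 'B' (hardware units),
    and 'C' (automation effort, e.g. lines of code as a rough proxy).
    """
    stay = incumbent['A'] + incumbent['B']                   # C is sunk
    move = candidate['A'] + candidate['B'] + candidate['C']  # C must be rebuilt
    return move < stay

# Hypothetical numbers: the new database is better on A and B...
mysql = {'A': 5, 'B': 100, 'C': 50_000}
shiny = {'A': 4, 'B': 80, 'C': 40_000}

# ...but the unpaid C dominates, which is exactly why the pitch fails.
print(migration_worthwhile(mysql, shiny))  # False
```

The units are deliberately mixed and the model crude; the point is only that for a large existing deployment, C starts at zero for the incumbent and at full price for everyone else.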

Original title and link: Why are you using MySQL? (NoSQL database©myNoSQL)


Paper: An Analysis of Linux Scalability to Many Cores

A paper authored by a team from MIT CSAIL whose goal is to identify various scalability issues in the Linux kernel:

This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48-core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques—this paper introduces one new technique, sloppy counters—these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.

Interesting choice of tools. Note that the team used an in-memory file system to eliminate the disk-related bottlenecks.
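
The paper’s one new technique, sloppy counters, is worth a sketch. The idea: each core keeps a private counter and only merges it into the shared total occasionally, trading momentary inexactness for far less contention. The paper implements this per-core inside the kernel; the Python below is only a per-thread illustration of the same trade-off.

```python
import threading

class SloppyCounter:
    """Rough user-space sketch of the paper's 'sloppy counter' idea:
    each thread accumulates increments privately and merges them into
    the shared total only once they exceed a threshold, so the shared
    lock is taken rarely instead of on every increment."""

    def __init__(self, threshold=64):
        self.threshold = threshold
        self.total = 0                  # shared, lock-protected
        self.lock = threading.Lock()
        self.local = threading.local()  # per-thread slack

    def incr(self, n=1):
        pending = getattr(self.local, 'pending', 0) + n
        if pending >= self.threshold:   # merge only occasionally
            with self.lock:
                self.total += pending
            pending = 0
        self.local.pending = pending

    def value(self):
        # Exact only after every thread has flushed; that is the 'sloppy' part.
        with self.lock:
            return self.total

    def flush(self):
        pending = getattr(self.local, 'pending', 0)
        if pending:
            with self.lock:
                self.total += pending
            self.local.pending = 0
```

Reads can lag behind by up to `threshold` per thread, which is acceptable for reference counts and statistics but obviously not for balances.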

Original title and link: Paper: An Analysis of Linux Scalability to Many Cores (NoSQL database©myNoSQL)


How Do I Freaking Scale Oracle?

Andrew Oliver for InfoWorld:

That said, many companies I work with have spent 20 years painting themselves into an Oracle corner. While they may have one eye on a brighter future, they still must ensure their Oracle database is high-performance and highly available — and scales as well as possible. Despite what you may read in NoSQL vendor marketing materials (or even in my blog [2]), it is possible to scale Oracle.

If you can actually find the answer to the question in the title, please teach me or send me links. This article left me in the dark; the only light I’ve seen involves far too many additional products, which most probably don’t come cheap.

Original title and link: How Do I Freaking Scale Oracle? (NoSQL database©myNoSQL)


The Myth of Auto Scaling as a Capacity Planning Approach

A rather old but very instructive post by James Golick dissecting the myth of extra server capacity on demand:

There’s this idea floating around that we can scale out our data services “just in time”. Proponents of cloud computing frequently tout this as an advantage of such a platform. Got a load spike? No problem, just spin up a few new instances to handle the demand. It’s a great sounding story, but sadly, things don’t quite work that way.

This is the Mythical Man-Month of the IT department.

John Allspaw

Original title and link: The Myth of Auto Scaling as a Capacity Planning Approach (NoSQL database©myNoSQL)


Two Sides of the OMGPOP Cloud and Couchbase Scalability Story

On Friday, many media sites published the PR release about OMGPOP’s growth story, citing the use of cloud services and Couchbase as their scaling solution (GigaOm, BusinessInsider).

While reading it, I jotted down:

  1. The good: using a combination of cloud and a NoSQL database (Couchbase) allowed OMGPOP to scale
  2. The bad: OMGPOP had to call in people from Couchbase to help out with scaling

The question is: if you can throw more iron at the problem and hire experts, wouldn’t many other database solutions be able to cope with OMGPOP’s growth?

Original title and link: Two Sides of the OMGPOP Cloud and Couchbase Scalability Story (NoSQL database©myNoSQL)

What other popular paradigms/architectures can handle large scale computational problems?

Interesting answers on Quora mostly expanding on Krishna Sankar’s short answer:

There are two ways one can address large scale computational problems:

  • Task Parallelism: This is where MPI and so forth fit in
  • Data Parallelism: This is the sweet spot for map/reduce
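
The data-parallel side of that split can be sketched in a few lines: the same function is mapped over partitions of the data, and the partial results are reduced into one. This is only an illustration of the style; `multiprocessing` stands in for a real map/reduce framework, and the word-count example is mine.

```python
from functools import reduce
from multiprocessing import Pool

def partial_wordcount(chunk):
    """Map step: count words in one partition of the input."""
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge(a, b):
    """Reduce step: combine two partial counts."""
    for word, n in b.items():
        a[word] = a.get(word, 0) + n
    return a

if __name__ == '__main__':
    chunks = ["to be or not to be", "to be is to do"]
    with Pool(2) as pool:
        # Data parallelism: each worker runs the same map over its partition.
        partials = pool.map(partial_wordcount, chunks)
    print(reduce(merge, partials))
```

Task parallelism (the MPI side) is the opposite shape: different cooperating computations exchanging messages, rather than one computation replicated over partitioned data.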

Original title and link: What other popular paradigms/architectures can handle large scale computational problems? (NoSQL database©myNoSQL)

Auto Scaling in the Amazon Cloud: Netflix's Approach and Lessons Learned

Another great post for today from the engineering team at Netflix:

Auto scaling is a very powerful tool, but it can also be a double-edged sword. Without the proper configuration and testing it can do more harm than good. A number of edge cases may occur when attempting to optimize or make the configuration more complex. As seen above, when configured carefully and correctly, auto scaling can increase availability while simultaneously decreasing overall costs.
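
The “double-edged sword” point is easy to see in code. A hypothetical sketch (not Netflix’s actual policy; thresholds and names are mine) of the threshold-plus-hysteresis logic such configurations boil down to:

```python
def desired_instances(current, cpu_pct, min_n=2, max_n=20,
                      scale_up_at=60, scale_down_at=30):
    """Return the new instance count for one evaluation period.

    The gap between scale_up_at and scale_down_at is deliberate: if the two
    thresholds sit too close together, the group scales up, per-node load
    drops below the down-threshold, and it immediately scales back down --
    the kind of edge case the post warns about.
    """
    if cpu_pct > scale_up_at:
        # Scale up aggressively (~10%, at least one instance).
        return min(max_n, current + max(1, current // 10))
    if cpu_pct < scale_down_at:
        # Drain slowly, one instance at a time.
        return max(min_n, current - 1)
    return current
```

Testing a policy like this against recorded traffic before trusting it with production is exactly the “proper configuration and testing” the quote calls for.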

Original title and link: Auto Scaling in the Amazon Cloud: Netflix’s Approach and Lessons Learned (NoSQL database©myNoSQL)


Asking for Performance and Scalability Advice on StackOverflow

How many times have you gotten an answer that applies to your specific scenario after providing a short list of performance and scalability requirements? MySQL/InnoDB can do 750k qps, Cassandra scales linearly, MongoDB can do 8 million ops/s. Is any of these the answer for your application?


  • How many times did you get all the requirements right at the spec time?

  • How many times did requirements remain the same during the development cycle?

  • How many times did production reality confirm your bullet list requirements?

Original title and link: Asking for Performance and Scalability Advice on StackOverflow (NoSQL database©myNoSQL)

Podcast: MySQL Cluster News: Performance Improvements, New NoSQL Access

Mat Keep and Bernd Ocklin discuss what’s new in the second milestone release of MySQL Cluster 7.2: performance improvements, new NoSQL access (memcached protocol), and cross-data-center scalability. Download the mp3.

Original title and link: Podcast: MySQL Cluster News: Performance Improvements, New NoSQL Access (NoSQL database©myNoSQL)

The Story of Etsy's Architecture

Ars Technica’s Sean Gallagher summarizes a presentation given at the Surge conference covering the evolution of Etsy’s architecture from a centralized PostgreSQL solution built on stored procedures, through a failed attempt at a service-oriented architecture, to sharded MySQL:

And the team started to shift feature by feature away from a semi-monolithic Postgres back-end to sharded MySQL databases. “It’s a battle-tested approach,” Snyder said. “Flickr is using it on an enormous scale. It scales horizontally, basically, to near infinity, and there’s no single point of failure—it’s all master to master replication.”
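
The user-sharding pattern Etsy adopted from Flickr can be sketched minimally: some deterministic rule maps each user to a shard, and every query for that user goes to that shard. The names below are illustrative, not Etsy’s; note also that real deployments (Flickr’s included) typically use an explicit user-to-shard lookup table rather than hashing, so users can be moved between shards.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ['db-shard-01', 'db-shard-02', 'db-shard-03', 'db-shard-04']

def shard_for(user_id):
    """Deterministically pick the shard holding this user's data.

    Hash-based routing is the simplest version of the idea; its weakness
    is that changing len(SHARDS) remaps almost every user, which is why
    a lookup table is the more common production choice.
    """
    h = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```

With master-to-master replication inside each shard, as in the quote, no single node is a point of failure and capacity grows by adding shards.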

Original title and link: The Story of Etsy’s Architecture (NoSQL database©myNoSQL)


Help CouchDB Break the C10K Barrier

Over the weekend, I was experimenting with CouchDB to see if it can pass the C10K barrier. Some of the performance optimizations I made along the way are really OS-level optimizations that affect MochiWeb (the Erlang web server) and are fairly well documented in many blogs. This one by @metabrew in particular is a pretty good read, since it focuses on Erlang and MochiWeb. While I am a performance junkie, I am not an Erlang hacker. So this is a call for help to the CouchDB hackers for recommendations on scaling out CouchDB.

The initial tweaks took CouchDB from under 1,000 concurrent users to around 2,300. There’s still a long way to go to 10k concurrent users, and they’d appreciate your help.
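
One of the OS-level knobs those posts refer to is the per-process file-descriptor limit: a process cannot hold 10k concurrent connections while the limit sits at the usual default of 1024. On the CouchDB side this is raised for the Erlang VM running MochiWeb, but the setting itself is the same; a sketch of checking and raising it from Python (the 10240 target is an arbitrary example):

```python
import resource

# Each connection consumes a file descriptor, so C10K needs >10k of them.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"fd limit: soft={soft} hard={hard}")

# Raise the soft limit toward the hard limit before accepting connections.
# Going past the hard limit would require root (or a sysctl/limits.conf change).
target = min(hard, 10240) if hard != resource.RLIM_INFINITY else 10240
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

The same ceiling exists system-wide (`fs.file-max` on Linux), and similar tuning applies to the network stack (listen backlog, ephemeral port range) once descriptor limits stop being the bottleneck.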

Original title and link: Help CouchDB Break the C10K Barrier (NoSQL database©myNoSQL)