One of the most often heard questions about CouchDB is how do I scale CouchDB? While dealing with Google-size data is not really a problem many are facing, more and more engineers are looking for viable solutions for having less SPOFs in their systems.
But replication is just one part of scaling. One other part which is more related to the size of data is data partitioning or sharding and I think that the question about how to scale CouchDB is mostly referring to it. So, let’s see what options are there for sharding CouchDB.
Right now, probably the best known solution for addressing CouchDB sharding is ☞ Lounge, a solution developed by the Meebo.com team:
The Lounge is a proxy-based partitioning/clustering framework for CouchDB.
There are two main pieces:
- dumbproxy: An NGINX module to handle simple gets/puts for everything that isn’t a view.
- smartproxy: A python/twisted daemon that handles mapping/reducing view requests to all shards in the cluster.
While manually repartitioning a CouchDB database is doable, I’d rather have an automatic way of doing it since I don’t want to make mistakes. […]
Now I’ve released version 0.3 of Pillow. This version supports automatic resharding, routing of requests to the right shard and views. Reducers need to be written in erlang, but a summing reducer is in place and mappers without reducers are supported out of the box. As such, this version of Pillow has all the functionality I set out to develop, but it does not support the full CouchDB API.
A more generic solution for partitioning data can be built using the Twitter’s ☞ Gizzard framework for creating distributed datastores:
As per the above diagram, Gizzard would become the “smart” middleware that would handle data distribution between the different stores. There are quite a few interesting features available in Gizzard that can make it a good candidate for dealing with the data partitioning problem: programmable support for partitioning strategies, support for replication trees, fault-tolerance, winged migrations, just to name a few.
Last, but not least, during the Berlin Buzzwords conference, I have also heard about another solution that is in the works and will be released soon.
So, having quite a few possible solutions already available, scaling CouchDB should not be considered an issue anymore.
Important update: CouchDB has a new scaling solution through BigCouch