NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



What is Membase?

It is kind of difficult to figure out a complete description of what Membase is as the ratio of signal to noise in today’s announcement is still very low[1]. Anyways, here is what I’ve been able to put together:

  • a cache using memcached protocol
  • Apache licensed open source version of NorthScale Membase Server[2]
  • project homepage is and (some) code can be found on GitHub
  • can persist data
  • supports replication (note: source code repository contains a reference to master-slave setup)
  • elastic, allowing addition and removal of new nodes and automatic rebalancing
  • used by Zynga and NHN[3], which are also listed as project contributors

While details are extremely scarce, this sounds a lot like Gear6 Memcached.

Membase Persistency

According to this paper the execution of a write operation involves the following steps

  1. The set arrives into the membase listener-receiver.
  2. Membase immediately replicates the data to replica servers – the number of replica copies is user defined. Upon arrival at replica servers, the data is persisted.
  3. The data is cached in main memory.
  4. The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
  5. Set acknowledgment return to application.

There is also:

In membase 1.6, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while “aging out” colder items; first to SSD (if available) and then to spinning media.

A couple of comments:

  1. it looks like a write operation is blocking until data is completely replicated
  2. it is not completely clear if “hot data” is persisted to disk on a write operation or only once it’s becoming “cold”

Membase Replication

Membase uses the notion of virtual buckets or vBucket (currently it supports up to 4096) which contains or owns a subset of the key space (note this is similar to Riak Vnodes[4]). Each vBucket replication can be configured independently, but at any time there is only 1 master node that coordinates reads and writes.

Membase Rebalancing

Membase runs on each node a couple of “processes” that are dealing with data rebalancing (part of a so called: cluster manager). Once it is determined that a master node (the coordinator for all reads and writes for a particular virtual bucket) becomes unavailable, a Rebalance Orchestrator process will coordinate the migration of the virtual buckets (note: both master and replica data of the virtual bucket will be moved).

When machines are scheduled to join or leave the cluster, these are placed in a pending operation set that is used upon the next rebalancing operation. I’m not sure, but I think it is possible to manually trigger a rebalancing op.

  1. Sources:  ()
  2. NorthScale Membase Server web page  ()
  3. While I read that “Membase is currently serving data for some of the busiest web applications on the planet.”, I couldn’t find any other users besides Zynga and NHN.  ()
  4. Riak is using a similar notion: vnode. While the terms are the same you should not confuse Riak buckets for membase buckets though.  ()