gear6: All content tagged as gear6 in NoSQL databases and polyglot persistence

From No Cache to Membase: The Knot

Jason Sirota tells the story of how The Knot (a media company) went from no cache to Membase, passing through memcached and Gear6 along the way.

In talking to Membase and through our own research, we found that Membase solved all of our original problems, plus our new problems with Gear6.

  1. Membase provides a rich set of both GUI and programmatic tools to manage and monitor the cache.

  2. Membase not only runs on multiple physical nodes but balances keys across those nodes using the vBuckets

  3. Membase runs on Windows and can handle quite a bit more capacity (evidenced by Zynga) than we could possibly use.

  4. Membase uses both HA replication and distributed nodes for different solutions, in our case, it easily supports the 5 node-configuration

  5. Membase provides Buckets that can be configured by Port to allow different teams to have a set amount of space

  6. Hardware can be added both horizontally and vertically to a Membase cluster. However, one limitation is that all nodes have to run the same cache limit so you do need to think carefully about your node size

  7. No company is immune to going under but, in addition to their strong financial state, the risk for Membase is mitigated by two factors:

If you want the simplified version:

  • a typical story where caching had to be used to maintain the quality of the service
  • a typical story where scale brought the need for better administration and monitoring tools
  • a typical story where operational costs had to be kept under control and, if possible, reduced

What made Membase the winning solution for The Knot?

Some would say the feature set, and I’d probably agree, though such features can be found in other NoSQL databases too.

I’d say it’s Membase’s use of a well-established protocol, which meant The Knot didn’t have to rewrite its whole persistence layer. Even if Membase had not had all the required features, its use of the memcached protocol made it the easiest solution to try out, as no application changes were needed.
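To make the protocol point concrete, here is a minimal sketch of how a client frames requests in the standard memcached text protocol. The helper names (`build_set`, `build_get`, `parse_get_response`) are hypothetical and nothing here is Membase-specific, which is the point: the bytes on the wire are the same whichever server sits behind the socket.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a memcached text-protocol 'set': header line, data block, CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a single-key 'get' request."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes):
    """Parse a single-key 'get' reply.

    A hit looks like: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    A miss is just:   END\r\n
    """
    if raw == b"END\r\n":
        return None
    header, rest = raw.split(b"\r\n", 1)
    _, _key, _flags, size = header.split(b" ")
    return rest[: int(size)]
```

An application that already sends these commands to memcached only needs its server address changed to try a protocol-compatible replacement.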

Original title and link: From No Cache to Membase: The Knot (NoSQL databases © myNoSQL)


What is Membase?

It is kind of difficult to figure out a complete description of what Membase is as the ratio of signal to noise in today’s announcement is still very low[1]. Anyways, here is what I’ve been able to put together:

  • a cache using memcached protocol
  • Apache licensed open source version of NorthScale Membase Server[2]
  • there is a project homepage, and (some) code can be found on GitHub
  • can persist data
  • supports replication (note: source code repository contains a reference to master-slave setup)
  • elastic, allowing nodes to be added and removed, with automatic rebalancing
  • used by Zynga and NHN[3], which are also listed as project contributors

While details are extremely scarce, this sounds a lot like Gear6 Memcached.

Membase Persistency

According to this paper, the execution of a write operation involves the following steps:

  1. The set arrives into the membase listener-receiver.
  2. Membase immediately replicates the data to replica servers – the number of replica copies is user defined. Upon arrival at replica servers, the data is persisted.
  3. The data is cached in main memory.
  4. The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
  5. The set acknowledgment is returned to the application.
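The five steps above can be sketched as a toy in-memory model. This is only an illustration of the described flow, not Membase code: the `Node` and `Replica` classes and their methods are hypothetical, replication is a plain synchronous call, and "disk" is a dictionary.

```python
from collections import OrderedDict


class Replica:
    """Hypothetical replica server: persists data as soon as it arrives."""

    def __init__(self):
        self.disk = {}

    def receive(self, key, value):
        self.disk[key] = value


class Node:
    """Hypothetical master node following the five write steps."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.cache = {}
        self.pending = OrderedDict()  # keys waiting to be persisted
        self.disk = {}

    def set(self, key, value):        # step 1: the set arrives
        for replica in self.replicas:  # step 2: replicate to every replica
            replica.receive(key, value)
        self.cache[key] = value        # step 3: cache in main memory
        self.pending[key] = None       # step 4: queue; a key already queued is not added twice
        return "STORED"                # step 5: acknowledge

    def flush_one(self):
        """Pull one pending write and persist the value currently in the cache."""
        if not self.pending:
            return None
        key, _ = self.pending.popitem(last=False)
        self.disk[key] = self.cache[key]
        return key
```

Because the flusher reads the value from the cache at flush time, two quick writes to the same key result in a single disk write carrying the latest value.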

There is also:

In membase 1.6, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while “aging out” colder items; first to SSD (if available) and then to spinning media.
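A rough sketch of what LRU-based aging across tiers could look like, assuming fixed item-count capacities for RAM and SSD and unbounded spinning media. The `TieredStore` class is hypothetical, and promoting an item back to RAM on read is my assumption; the quoted paragraph does not say how reads of cold items are handled.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical three-tier store: RAM -> SSD -> spinning disk."""

    def __init__(self, ram_capacity, ssd_capacity):
        self.ram = OrderedDict()  # least recently used item comes first
        self.ssd = OrderedDict()
        self.disk = {}
        self.ram_capacity = ram_capacity
        self.ssd_capacity = ssd_capacity

    def put(self, key, value):
        # A written item is hot: drop stale copies from the colder tiers.
        self.ssd.pop(key, None)
        self.disk.pop(key, None)
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._age_out()

    def get(self, key):
        for tier in (self.ram, self.ssd, self.disk):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)  # assumption: a read makes the item hot again
                return value
        raise KeyError(key)

    def _age_out(self):
        # Coldest RAM items move to SSD, coldest SSD items move to disk.
        while len(self.ram) > self.ram_capacity:
            key, value = self.ram.popitem(last=False)
            self.ssd[key] = value
        while len(self.ssd) > self.ssd_capacity:
            key, value = self.ssd.popitem(last=False)
            self.disk[key] = value
```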

A couple of comments:

  1. it looks like a write operation blocks until the data is completely replicated
  2. it is not completely clear whether “hot data” is persisted to disk on a write operation or only once it becomes “cold”

Membase Replication

Membase uses the notion of virtual buckets, or vBuckets (it currently supports up to 4096), each of which contains or owns a subset of the key space (note that this is similar to Riak vnodes[4]). Replication can be configured independently for each vBucket, but at any time only one master node coordinates its reads and writes.
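As an illustration of the idea, here is a minimal sketch of routing a key through a vBucket to its master node. The hash function (CRC32 modulo the vBucket count) and the round-robin placement are assumptions made for the example, not a description of what Membase actually does; all function names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 4096  # the upper limit mentioned above


def vbucket_for(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a key to a vBucket id (assumed hashing scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, num_vbuckets=NUM_VBUCKETS, replicas=1):
    """Assign each vBucket a chain [master, replica, ...] in round-robin order."""
    return [
        [nodes[(vb + i) % len(nodes)] for i in range(replicas + 1)]
        for vb in range(num_vbuckets)
    ]


def master_for(key: str, vbucket_map) -> str:
    """Return the single node that coordinates reads and writes for this key."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))][0]
```

The indirection is what makes the scheme attractive: keys never change vBucket, so moving data between machines only requires updating the vBucket-to-node table.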

Membase Rebalancing

On each node, Membase runs a couple of “processes” that deal with data rebalancing (they are part of the so-called cluster manager). Once it is determined that a master node (the coordinator for all reads and writes for a particular virtual bucket) has become unavailable, a Rebalance Orchestrator process coordinates the migration of the virtual buckets (note: both the master and the replica data of a virtual bucket are moved).

When machines are scheduled to join or leave the cluster, they are placed in a pending operation set that is applied at the next rebalancing operation. I’m not sure, but I think it is possible to trigger a rebalancing operation manually.
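The pending-set behavior can be sketched as follows. This is a toy model under stated assumptions: the `Cluster` class is hypothetical, only master ownership is tracked (no replicas), and vBuckets are reassigned with a naive modulo rule that moves more data than a real orchestrator would want to.

```python
class Cluster:
    """Hypothetical cluster that defers membership changes until rebalance()."""

    def __init__(self, nodes, num_vbuckets=16):
        self.nodes = list(nodes)
        self.num_vbuckets = num_vbuckets
        self.pending_add = set()
        self.pending_remove = set()
        self.vbucket_map = self._assign()

    def _assign(self):
        # Naive placement: vBucket i is owned by node i modulo cluster size.
        return [self.nodes[vb % len(self.nodes)] for vb in range(self.num_vbuckets)]

    def schedule_join(self, node):
        self.pending_add.add(node)      # no data moves yet

    def schedule_leave(self, node):
        self.pending_remove.add(node)   # no data moves yet

    def rebalance(self):
        """Apply all pending changes; return the (vbucket, source, target) moves."""
        members = [n for n in self.nodes if n not in self.pending_remove]
        members += sorted(self.pending_add - set(members))
        if not members:
            raise ValueError("cannot rebalance an empty cluster")
        self.nodes = members
        self.pending_add.clear()
        self.pending_remove.clear()
        old_map = self.vbucket_map
        self.vbucket_map = self._assign()
        return [
            (vb, src, dst)
            for vb, (src, dst) in enumerate(zip(old_map, self.vbucket_map))
            if src != dst
        ]
```

Scheduling a join or a leave leaves the vBucket map untouched; only the explicit `rebalance()` call computes and applies the moves.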

  1. Sources:
  2. NorthScale Membase Server web page
  3. While I read that “Membase is currently serving data for some of the busiest web applications on the planet.”, I couldn’t find any other users besides Zynga and NHN.
  4. Riak uses a similar notion: the vnode. While the terms are similar, you should not confuse Riak buckets with Membase buckets.