What is Membase?
It is kind of difficult to piece together a complete description of what Membase is, as the signal-to-noise ratio in today’s announcement is still very low[1]. Anyway, here is what I’ve been able to put together:
- a cache using the memcached protocol
- an Apache-licensed open source version of NorthScale Membase Server[2]
- the project homepage is membase.org and (some) code can be found on GitHub
- can persist data
- supports replication (note: the source code repository contains a reference to a master-slave setup)
- elastic, allowing nodes to be added and removed, with automatic rebalancing
- used by Zynga and NHN[3], which are also listed as project contributors
While details are extremely scarce, this sounds a lot like Gear6 Memcached.
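Because Membase speaks the memcached protocol, any existing memcached client should be able to talk to it. To make that concrete, here is a minimal sketch of how requests and replies are framed in memcached's text protocol (memcached also defines a binary protocol, and I'm not claiming which one a given Membase client uses). It only builds and parses bytes, with no networking, and the helper names are my own, not part of any Membase or memcached library.

```python
def encode_set(key, value, flags=0, exptime=0):
    """Build a memcached text-protocol 'set' request (hypothetical helper)."""
    # Format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key):
    """Build a memcached text-protocol 'get' request (hypothetical helper)."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(resp):
    """Parse 'VALUE <key> <flags> <bytes>' blocks terminated by 'END' into a dict."""
    items = {}
    pos = 0
    while True:
        end = resp.index(b"\r\n", pos)
        line = resp[pos:end].decode("ascii")
        pos = end + 2
        if line == "END":
            return items
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, nbytes = parts[1], int(parts[3])
        items[key] = resp[pos:pos + nbytes]
        pos += nbytes + 2  # skip the data block and its trailing \r\n
```

For example, `encode_set("user:42", b"alice")` produces `b"set user:42 0 0 5\r\nalice\r\n"`, and a server that accepts it answers `STORED\r\n`.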
Membase Persistency
According to this paper, the execution of a write operation involves the following steps:
- The set arrives at the membase listener-receiver.
- Membase immediately replicates the data to replica servers – the number of replica copies is user defined. Upon arrival at replica servers, the data is persisted.
- The data is cached in main memory.
- The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
- The set acknowledgment is returned to the application.
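To check my reading of these steps, here is a minimal single-process sketch of the write path as the paper describes it. Everything in it is an assumption for illustration: `Node`, `handle_set` and `flush_one` are hypothetical names, "replication" is a direct method call instead of a network hop, and dicts stand in for memory and disk. The interesting part is the de-duplicating queue: a key that is already pending is not queued twice, and the flusher reads the *current* value from cache when it finally writes.

```python
from collections import OrderedDict


class Node:
    """Hypothetical in-process stand-in for one Membase server (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}                   # main-memory copy of the data
        self.disk = {}                    # stand-in for disk or SSD
        self.write_queue = OrderedDict()  # keys waiting to be persisted, oldest first

    def persist(self, key, value):
        self.disk[key] = value

    def handle_set(self, key, value, replicas):
        # Replicate first; each replica persists the data on arrival.
        for replica in replicas:
            replica.persist(key, value)
        # Cache the value in main memory.
        self.cache[key] = value
        # Queue the key for persistence. If a write for this key is already
        # pending, assigning again keeps a single entry in its original position.
        self.write_queue[key] = None
        # Acknowledge the set to the application.
        return "STORED"

    def flush_one(self):
        """Pull the oldest pending key and write its current cached value to disk."""
        if not self.write_queue:
            return False
        key, _ = self.write_queue.popitem(last=False)
        self.persist(key, self.cache[key])
        return True
```

With this reading, two quick sets to the same key (`v1` then `v2`) leave a single pending entry, and one `flush_one()` call writes only `v2` to the master's disk.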
There is also this passage:
In membase 1.6, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while “aging out” colder items; first to SSD (if available) and then to spinning media.
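The aging described in this quote can be sketched as an LRU cascade across three tiers. This is my own illustration, not Membase code: `TieredLRU` is a hypothetical class, capacities are counted in items rather than bytes, a zero SSD capacity models "SSD (if available)", and promoting an item back to RAM on read is an assumption the quote doesn't spell out.

```python
from collections import OrderedDict


class TieredLRU:
    """Hypothetical sketch: LRU-based aging from RAM to SSD to spinning disk."""

    def __init__(self, ram_capacity, ssd_capacity):
        self.ram = OrderedDict()  # least recently used first, most recent last
        self.ssd = OrderedDict()
        self.hdd = {}             # coldest tier, treated as unbounded
        self.ram_capacity = ram_capacity
        self.ssd_capacity = ssd_capacity

    def put(self, key, value):
        # A fresh write always lands in RAM and is the most recently used item.
        self.ssd.pop(key, None)
        self.hdd.pop(key, None)
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._age_out()

    def get(self, key):
        for tier in (self.ram, self.ssd, self.hdd):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)  # assumption: reads promote back to RAM
                return value
        raise KeyError(key)

    def tier_of(self, key):
        for name, tier in (("ram", self.ram), ("ssd", self.ssd), ("hdd", self.hdd)):
            if key in tier:
                return name
        return None

    def _age_out(self):
        # Evict the least recently used items one tier down until each tier fits.
        while len(self.ram) > self.ram_capacity:
            key, value = self.ram.popitem(last=False)
            self.ssd[key] = value
        while len(self.ssd) > self.ssd_capacity:
            key, value = self.ssd.popitem(last=False)
            self.hdd[key] = value
```

Note that this sketch only models *where* data lives. It says nothing about when hot data is first persisted, which is exactly the open question below.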
A couple of comments:
- it looks like a write operation blocks until the data has been completely replicated
- it is not completely clear whether “hot data” is persisted to disk on a write operation or only once it becomes “cold”
Membase Replication
Membase uses the notion of virtual buckets, or vBuckets (currently up to 4096 are supported), each of which owns a subset of the key space (note: this is similar to Riak vnodes[4]). Replication can be configured independently for each vBucket, but at any time there is only one master node that coordinates its reads and writes.
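Here is how I picture the lookup: hash the key to a vBucket id, then consult a vBucket map whose entries list the master first, followed by its replicas. This is a minimal sketch under stated assumptions: CRC32 modulo the vBucket count is my choice of hash, not something the announcement confirms, and the round-robin layout in the hypothetical `build_vbucket_map` is just the simplest placement I could think of. Because each map entry is its own list, different vBuckets could carry different replica counts, which matches the per-vBucket replication setting.

```python
import zlib

NUM_VBUCKETS = 4096  # the maximum mentioned above, used here as the fixed count


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    # Assumption: CRC32 of the key bytes, reduced modulo the vBucket count.
    return zlib.crc32(key) % num_vbuckets


def build_vbucket_map(nodes, num_replicas=1, num_vbuckets=NUM_VBUCKETS):
    """Hypothetical round-robin layout: vbucket id -> [master, replica1, ...]."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas")
    return [
        [nodes[(vb + i) % len(nodes)] for i in range(num_replicas + 1)]
        for vb in range(num_vbuckets)
    ]


def master_for_key(key, vbmap):
    # The first node in a vBucket's chain coordinates all its reads and writes.
    return vbmap[vbucket_for_key(key, len(vbmap))][0]
```

With three nodes and one replica, vBucket 0 maps to `["a", "b"]` and vBucket 1 to `["b", "c"]`, so every key has exactly one master and one replica on a different node.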
Membase Rebalancing
On each node, Membase runs a couple of “processes” that deal with data rebalancing (part of the so-called cluster manager). Once it is determined that a master node (the coordinator for all reads and writes for a particular virtual bucket) has become unavailable, a Rebalance Orchestrator process coordinates the migration of the virtual buckets (note: both the master and the replica data of the virtual bucket will be moved).
When machines are scheduled to join or leave the cluster, they are placed in a pending operation set that is used during the next rebalancing operation. I’m not sure, but I think it is also possible to trigger a rebalancing operation manually.
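Combining the pending operation set with the orchestrator's job gives roughly the following shape. It is a deliberately naive sketch: `rebalance` is a hypothetical function, an unavailable master is simply treated like a scheduled leave (a simplification, since a real system might promote a replica first), and the new layout is recomputed round-robin with no attempt to minimize how much data moves. Each move reassigns the whole chain, master and replicas, in line with the note above.

```python
def rebalance(vbmap, current_nodes, pending_join, pending_leave):
    """Hypothetical sketch: apply pending joins/leaves, then recompute the vBucket map.

    vbmap is a list indexed by vBucket id; each entry is [master, replica, ...].
    Returns (new_vbmap, moves), where each move is (vbucket, old_chain, new_chain).
    """
    nodes = [n for n in current_nodes if n not in pending_leave]
    nodes += [n for n in sorted(pending_join) if n not in nodes]
    if not nodes:
        raise ValueError("rebalance would leave the cluster empty")

    new_vbmap, moves = [], []
    for vb, chain in enumerate(vbmap):
        # Keep each vBucket's replica count, capped by the number of nodes left.
        width = min(len(chain), len(nodes))
        new_chain = [nodes[(vb + i) % len(nodes)] for i in range(width)]
        new_vbmap.append(new_chain)
        if new_chain != chain:
            # The whole chain (master and replicas) is reassigned.
            moves.append((vb, chain, new_chain))

    # The pending operations have been consumed by this rebalance.
    pending_join.clear()
    pending_leave.clear()
    return new_vbmap, moves
```

Running it a second time with no pending operations yields no moves, which is what a manually triggered rebalance of an already balanced cluster should produce.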
[1] Sources:
- NorthScale Unleashes Membase Server (NorthScale blog)
- NorthScale, Zynga team up on NoSQL (CNET)
- Open Sourced Membase Joins NoSQL Party (GigaOm)
- NorthScale Releases High-Performance NoSQL Database (marketwire.com)
[2] NorthScale Membase Server web page
[3] While I read that “Membase is currently serving data for some of the busiest web applications on the planet.”, I couldn’t find any other users besides Zynga and NHN.
[4] Riak uses a similar notion: the vnode. While both projects use the term “bucket”, you should not confuse Riak buckets with Membase buckets.