memcached: All content tagged as memcached in NoSQL databases and polyglot persistence
A long and interesting discussion on comparing Redis and Memcached performance. It all started ☞ here:
After crunching all of these numbers and screwing around with the annoying intricacies of OpenOffice, I’m giving Redis a big thumbs down. My initial sexual arousal from the feature list is long gone. Granted, Redis might have its place in a large architecture, but certainly not a replacement to memcache. When your site is hammering 20,000 keys per second and memcache latency is heavily dependent on delivery times, it makes no business sense to transparently drop in Redis. The features are neat, and the extra data structures could be used to offload more RDBMS activity… but 20% is just too much to gamble on the heart of your architecture.
Salvatore Sanfilippo ☞ followed up:
[…] this is why the sys/toilet benchmark is ill conceived.
- All the tests are run using a single client into a busy loop.
- when you run single clients benchmarks what you are metering actually is, also: the round trip time between the client and the server, and all the other kind of latencies involved, and of course, the speed of the client library implementation.
- The test was performed with very different client libraries
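Salvatore's second point can be made concrete with a back-of-the-envelope model. If a single client issues one request at a time in a busy loop, every operation pays a full network round trip plus client-side and server-side processing, so the client can never go faster than the inverse of their sum. The timings below are hypothetical, picked only to show the shape of the argument, not measured on either Redis or memcached:

$$
\begin{aligned}
\text{ops/s}_{\text{single client}} &= \frac{1}{t_{\mathrm{RTT}} + t_{\mathrm{client}} + t_{\mathrm{server}}} \\
\frac{1}{(0.20 + 0.04 + 0.01)\ \text{ms}} &= \frac{1}{0.25\ \text{ms}} = 4000\ \text{ops/s} \\
\frac{1}{(0.20 + 0.04 + 0.02)\ \text{ms}} &= \frac{1}{0.26\ \text{ms}} \approx 3846\ \text{ops/s}
\end{aligned}
$$

Under these assumed numbers, a server that is twice as slow shows up as only a ~4% drop in measured throughput, because the round trip and the client library dominate.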
But he also published a new benchmark. Then Dormando ☞ published an update critiquing both previous benchmarks:
The “toilet” bench and antirez’s benches both share a common issue; they’re busy-looping a single client process against a single daemon server. The antirez benchmark is written much better than the original one; it tries to be asyncronous and is much more efficient.
And it didn’t stop here, as Salvatore felt ☞ something was still missing:
The test performed by @dormando was missing an interesting benchmark, that is, given that Redis is single threaded, what happens if I run an instance of Redis per core?
I assume everyone is asking by now: which of Redis and Memcached performed better? And the answer is: it depends (even if some would like to believe otherwise).
But why is this the “answer”? Firstly, because creating good benchmarks is really difficult. Most benchmarks focus on the wrong thing or cover problems that are not representative of real-life usage.
This would be my very simple advice:
- basic benchmarks will not give you real answers
- you are better off testing your very specific scenario (data size, concurrency level, …)
Unfortunately, both of them are just new examples of useless benchmarks:
- only 1000 keys
- the benchmark doesn’t vary the size of keys and values
- no concurrency
- no mixed reads/writes
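The four shortcomings above suggest what a more useful harness would vary. The following is a minimal sketch, not a replacement for a real tool: `DictClient` is a hypothetical in-process stand-in (swap in any client object exposing `get(key)` and `set(key, value)`), and the parameter values in the sweep are arbitrary illustrations.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DictClient:
    """Hypothetical stand-in for a real memcached/Redis client (get/set only)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value


def run_benchmark(client, num_keys, value_size, concurrency, read_ratio,
                  ops_per_worker, seed=0):
    """Run a mixed read/write workload; return (ops per second, total ops)."""
    keys = [f"key:{i}" for i in range(num_keys)]
    value = "x" * value_size
    for k in keys:  # preload so that reads hit existing keys
        client.set(k, value)

    def worker(worker_id):
        rng = random.Random(seed + worker_id)  # per-worker RNG, reproducible mix
        for _ in range(ops_per_worker):
            key = rng.choice(keys)
            if rng.random() < read_ratio:
                client.get(key)
            else:
                client.set(key, value)
        return ops_per_worker

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        total = sum(pool.map(worker, range(concurrency)))
    elapsed = time.perf_counter() - start
    return total / elapsed, total


# Sweep exactly the dimensions the criticized benchmarks keep fixed.
results = []
for num_keys in (1_000, 50_000):
    for value_size in (10, 1_000):
        for concurrency in (1, 8):
            for read_ratio in (0.9, 0.5):
                ops_per_sec, total = run_benchmark(
                    DictClient(), num_keys, value_size, concurrency,
                    read_ratio, ops_per_worker=1_000)
                results.append((num_keys, value_size, concurrency,
                                read_ratio, total, ops_per_sec))

for row in results:
    print("keys=%d value=%dB clients=%d reads=%.0f%% ops=%d -> %.0f ops/s"
          % (row[0], row[1], row[2], row[3] * 100, row[4], row[5]))
```

Note that with an in-process client, Python threads mostly measure lock contention; against a real server each worker would typically get its own connection.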
It is rather difficult to put together a complete description of what Membase is, as the signal-to-noise ratio in today’s announcement is still very low. Anyway, here is what I’ve been able to put together:
- a cache using memcached protocol
- Apache licensed open source version of NorthScale Membase Server
- project homepage is membase.org and (some) code can be found on GitHub
- can persist data
- supports replication (note: source code repository contains a reference to master-slave setup)
- elastic, allowing nodes to be added and removed, with automatic rebalancing
- used by Zynga and NHN, which are also listed as project contributors
While details are extremely scarce, this sounds a lot like Gear6 Memcached.
According to this paper, the execution of a write operation involves the following steps:
- The set arrives into the membase listener-receiver.
- Membase immediately replicates the data to replica servers – the number of replica copies is user defined. Upon arrival at replica servers, the data is persisted.
- The data is cached in main memory.
- The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).
- The set acknowledgment is returned to the application.
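The five steps above can be sketched as a toy, single-process model. Everything here is hypothetical (`Node`, `MembaseLikeServer`, `flush_one` are illustrative names, not Membase's code); the de-duplicating queue is modeled as an ordered set of keys, so a key that already has a pending write is not queued twice, and the flusher writes whatever value is in cache when it gets to the key. Returning `"STORED"` mirrors the memcached text protocol's reply to a successful `set`.

```python
from collections import OrderedDict


class Node:
    """Hypothetical replica server: persists data as soon as it arrives."""

    def __init__(self, name):
        self.name = name
        self.disk = {}

    def replicate(self, key, value):
        self.disk[key] = value


class MembaseLikeServer:
    def __init__(self, replicas):
        self.replicas = replicas
        self.cache = {}                     # main memory
        self.persist_queue = OrderedDict()  # pending keys, FIFO, at most one entry per key
        self.disk = {}

    def set(self, key, value):
        # 1. the set arrives at the listener-receiver (this call)
        # 2. replicate immediately; replicas persist on arrival
        for replica in self.replicas:
            replica.replicate(key, value)
        # 3. cache the value in main memory
        self.cache[key] = value
        # 4. queue for persistence, de-duplicating an already pending write
        if key not in self.persist_queue:
            self.persist_queue[key] = None
        # 5. acknowledge to the application (before the local disk write)
        return "STORED"

    def flush_one(self):
        """Background flusher step: persist the oldest pending key's current value."""
        if not self.persist_queue:
            return False
        key, _ = self.persist_queue.popitem(last=False)
        self.disk[key] = self.cache[key]  # value is read from cache, not from the queue
        return True


replica = Node("replica-1")
server = MembaseLikeServer([replica])
print(server.set("user:1", "alice"))   # STORED
print(server.set("user:1", "bob"))     # STORED (pending write de-duplicated)
print(len(server.persist_queue))       # 1
server.flush_one()
print(server.disk["user:1"], replica.disk["user:1"])  # bob bob
```

In this sketch `set` returns only after every replica has the data, which is the blocking behavior questioned in the comments that follow.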
There is also:
In membase 1.6, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while “aging out” colder items; first to SSD (if available) and then to spinning media.
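The LRU-based aging described in that quote can be illustrated with a three-tier model: memory, then SSD, then spinning disk. This is a hypothetical sketch; the tier capacities are counted in items for simplicity, and promoting an item back to memory when it is read again is an assumption, since the quote only describes items aging out.

```python
from collections import OrderedDict


class TieredLRU:
    """Hypothetical memory -> SSD -> disk store; least recently used items age out."""

    def __init__(self, memory_capacity, ssd_capacity):
        self.memory = OrderedDict()  # most recently used item at the end
        self.ssd = OrderedDict()
        self.disk = {}               # last tier, unbounded in this sketch
        self.memory_capacity = memory_capacity
        self.ssd_capacity = ssd_capacity

    def _age_out(self):
        while len(self.memory) > self.memory_capacity:
            key, value = self.memory.popitem(last=False)  # coldest item in memory
            self.ssd[key] = value
        while len(self.ssd) > self.ssd_capacity:
            key, value = self.ssd.popitem(last=False)     # coldest item on SSD
            self.disk[key] = value

    def set(self, key, value):
        self.ssd.pop(key, None)
        self.disk.pop(key, None)
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._age_out()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
            return self.memory[key]
        for tier in (self.ssd, self.disk):
            if key in tier:
                value = tier.pop(key)
                self.memory[key] = value  # assumption: hot again, promote to memory
                self._age_out()
                return value
        return None

    def tier_of(self, key):
        for name, tier in (("memory", self.memory), ("ssd", self.ssd), ("disk", self.disk)):
            if key in tier:
                return name
        return None


cache = TieredLRU(memory_capacity=2, ssd_capacity=2)
for k in "abcde":
    cache.set(k, k.upper())
print([cache.tier_of(k) for k in "abcde"])  # ['disk', 'ssd', 'ssd', 'memory', 'memory']
print(cache.get("a"))                       # A
print(cache.tier_of("a"), cache.tier_of("b"), cache.tier_of("d"))  # memory disk ssd
```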
A couple of comments:
- it looks like a write operation is blocking until data is completely replicated
- it is not completely clear whether “hot data” is persisted to disk on a write operation or only once it becomes “cold”
Membase uses the notion of virtual buckets, or vBuckets (currently up to 4096 are supported), each of which owns a subset of the key space (note: this is similar to Riak vnodes). Replication can be configured independently for each vBucket, but at any time there is only one master node that coordinates the vBucket’s reads and writes.
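Here is a sketch of how a key could be routed through a vBucket map. The source doesn't say which hash Membase uses, so CRC32 of the key modulo the vBucket count is an assumption, as are the round-robin placement and the names `build_vbucket_map` and `route`. Per-vBucket replica counts are modeled through a hypothetical `replica_overrides` mapping.

```python
import zlib

NUM_VBUCKETS = 4096  # the upper limit mentioned in the text


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Map a key to a vBucket id (assumption: CRC32 of the key modulo the count)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, default_replicas=1, replica_overrides=None,
                      num_vbuckets=NUM_VBUCKETS):
    """Give every vBucket one master plus N replicas, N configurable per vBucket."""
    replica_overrides = replica_overrides or {}
    vmap = []
    for vb in range(num_vbuckets):
        n_replicas = replica_overrides.get(vb, default_replicas)
        if n_replicas + 1 > len(nodes):
            raise ValueError(f"vBucket {vb}: not enough nodes for {n_replicas} replicas")
        # Round-robin placement: master first, replicas on the following nodes.
        owners = [nodes[(vb + i) % len(nodes)] for i in range(n_replicas + 1)]
        vmap.append({"master": owners[0], "replicas": owners[1:]})
    return vmap


def route(key, vmap):
    """All reads and writes for a key go to the single master of its vBucket."""
    vb = vbucket_for(key, len(vmap))
    return vb, vmap[vb]["master"]


nodes = ["node-a", "node-b", "node-c"]
vmap = build_vbucket_map(nodes, default_replicas=1, replica_overrides={0: 2})
vb, master = route("user:42", vmap)
print(vb, master, vmap[vb]["replicas"])
```

The indirection means moving data between machines only requires reassigning whole vBuckets in the map; the key-to-vBucket function never changes.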
On each node, Membase runs a couple of “processes” that deal with data rebalancing (part of the so-called cluster manager). Once it is determined that a master node (the coordinator for all reads and writes for a particular virtual bucket) has become unavailable, a Rebalance Orchestrator process coordinates the migration of its virtual buckets (note: both the master and the replica data of each virtual bucket will be moved).
When machines are scheduled to join or leave the cluster, they are placed in a pending operation set that is applied during the next rebalancing operation. I’m not sure, but I think it is also possible to trigger a rebalancing manually.
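The pending-set behavior and the vBucket migration described in the last two paragraphs can be sketched as follows. `ClusterManager`, `mark_unavailable` and the quota-based placement are hypothetical: treating an unavailable master like a scheduled leave, and moving as few vBuckets as possible, are assumptions about how an orchestrator might work, not a description of Membase's actual Rebalance Orchestrator.

```python
from collections import Counter


class ClusterManager:
    """Hypothetical cluster manager: membership changes are staged, applied on rebalance."""

    def __init__(self, nodes, num_vbuckets=16):
        self.nodes = list(nodes)
        self.pending_join = set()
        self.pending_leave = set()
        # Initial placement: spread vBuckets round-robin over the nodes.
        self.vbucket_owner = {vb: self.nodes[vb % len(self.nodes)]
                              for vb in range(num_vbuckets)}

    def schedule_join(self, node):
        self.pending_join.add(node)

    def schedule_leave(self, node):
        self.pending_leave.add(node)

    def mark_unavailable(self, node):
        # Assumption: a failed master is handled like a leaving node.
        self.schedule_leave(node)

    def rebalance(self):
        """Apply pending joins/leaves, moving as few vBuckets as possible."""
        new_nodes = [n for n in self.nodes if n not in self.pending_leave]
        new_nodes += sorted(self.pending_join - set(new_nodes) - self.pending_leave)
        if not new_nodes:
            raise RuntimeError("rebalance would leave the cluster empty")

        base, extra = divmod(len(self.vbucket_owner), len(new_nodes))
        quota = {n: base + (1 if i < extra else 0) for i, n in enumerate(new_nodes)}

        owned = {n: [] for n in new_nodes}
        orphans = []  # vBuckets of leaving nodes, or beyond a node's quota
        for vb, owner in sorted(self.vbucket_owner.items()):
            (owned[owner] if owner in owned else orphans).append(vb)
        for n in new_nodes:
            while len(owned[n]) > quota[n]:
                orphans.append(owned[n].pop())

        moves = []
        for n in new_nodes:
            while len(owned[n]) < quota[n]:
                vb = orphans.pop()
                # Per the text, master and replica data of the vBucket move together.
                moves.append((vb, self.vbucket_owner[vb], n))
                self.vbucket_owner[vb] = n
                owned[n].append(vb)

        self.nodes = new_nodes
        self.pending_join.clear()
        self.pending_leave.clear()
        return moves


cm = ClusterManager(["a", "b"], num_vbuckets=16)
cm.schedule_join("c")
print(Counter(cm.vbucket_owner.values()))  # unchanged until the next rebalance
join_moves = cm.rebalance()
print(len(join_moves), Counter(cm.vbucket_owner.values()))
cm.mark_unavailable("a")
leave_moves = cm.rebalance()
print(len(leave_moves), Counter(cm.vbucket_owner.values()))
```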
- NorthScale Unleashes Membase Server (NorthScale blog)
- NorthScale, Zynga team up on NoSQL (CNET)
- Open Sourced Membase Joins NoSQL Party (GigaOm)
- NorthScale Releases High-Performance NoSQL Database (marketwire.com)
- NorthScale Membase Server web page
- While I read that “Membase is currently serving data for some of the busiest web applications on the planet.”, I couldn’t find any other users besides Zynga and NHN.
- Riak uses a similar notion: vnodes. Although both systems talk about buckets, you should not confuse Riak buckets with Membase buckets.
- Gear6 memcached provides an enhanced API that allows querying the key/value space
- Gear6 memcached is looking to support more data types by using Redis support for types like lists, sets, ordered sets, hashes
- or Gear6 is looking to provide commercial support for Redis
This left me with the question: why would you use memcached on top of Redis?
- if the integration preserved the memcached API (nb: I am not sure this would be possible), then:
  - such a product might be useful for projects needing both an RDBMS and Redis (note: in the end, the project would still need to be aware of both storage APIs)
  - such a product might be useful for transitioning towards Redis alone
- the integration would simply add features missing from the current version of Redis (e.g. elastic scaling, sharding, etc.)
Do you see any other reasons for using memcached on top of Redis?
- ☞ NoSQL player questions big data (nb the title has pretty much nothing to do with the article)
- ☞ Gear6 Enhances Memcached to Include Native Query Support and Redis Integration
The only documentation I’ve found about cache queries is ☞ here, and the only mention of Redis integration I found (☞ here) talks only about support for Redis:
Gear6 currently offers commercial support for Memcached. If you are interested in purchasing support for Redis please contact us.
Gear6 will soon contribute a number of enhancements to the Redis community.
- You can read more about Redis data types ☞ here
A couple of days before 2009 ended, Salvatore Sanfilippo (@antirez) announced his intention to implement virtual memory in Redis. In his message to the Redis user group, he also mentioned some of the goals, or advantages, of virtual memory in Redis:
- If the dataset access pattern is not random, but there is a bias towards a subset of keys (let’s call this subset of keys the “hot spot”), with VM Redis can deliver performance similar to the case where you have in memory only the hot spot, using only the memory required to hold the hot spot.
- Your hotspot is much bigger than your RAM, but you are willing to pay a performance penalty because you want to use Redis.
Today, Salvatore reported that the first phase of implementing virtual memory in Redis has been completed and that the Redis Twitter-clone app is already running on this new version.
According to the initial plan, the first phase is a blocking VM implementation.
This means that Redis will work as usually, but will have a new layer to access keys that will be able to understand if a key is in memory or swapped out on disk: when Redis tries to access an on-disk key, it will block to load the key from disk to memory (this includes not only I/O, but also CPU time needed to convert the serialized object into the memory representation).
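The blocking lookup just described can be modeled in a few lines. This is a hypothetical sketch (`BlockingVM`, `swap_out` and the one-file-per-key layout are illustrative, not Redis's on-disk format), and `pickle` stands in for Redis's own serialization. The point it shows is that a read of a swapped-out key does the disk I/O and the deserialization inline, so the caller waits for both.

```python
import os
import pickle
import tempfile


class BlockingVM:
    """Hypothetical blocking VM layer: keys live in memory or are swapped out to disk."""

    def __init__(self, swap_dir):
        self.memory = {}      # key -> live object
        self.swapped = set()  # keys whose value currently lives only on disk
        self.swap_dir = swap_dir

    def _path(self, key):
        # One file per key, named by the hex-encoded key (illustrative layout only).
        return os.path.join(self.swap_dir, key.encode("utf-8").hex() + ".swap")

    def set(self, key, value):
        self.memory[key] = value
        if key in self.swapped:  # the on-disk copy is now stale
            os.remove(self._path(key))
            self.swapped.discard(key)

    def swap_out(self, key):
        """Serialize a (cold) value to disk and free its memory."""
        value = self.memory.pop(key)
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)
        self.swapped.add(key)

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        if key in self.swapped:
            # Blocking path: disk I/O plus CPU time to rebuild the in-memory object.
            with open(self._path(key), "rb") as f:
                value = pickle.load(f)
            self.set(key, value)  # bring it back into memory
            return value
        return None


with tempfile.TemporaryDirectory() as swap_dir:
    vm = BlockingVM(swap_dir)
    vm.set("timeline:antirez", ["hello", "world"])
    vm.swap_out("timeline:antirez")
    print("timeline:antirez" in vm.memory)  # False
    print(vm.get("timeline:antirez"))       # ['hello', 'world']
```

A non-blocking design would instead park the client waiting on the swapped key and keep serving others while the value is loaded in the background.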
It has not yet been decided whether this is just an intermediate step before implementing a non-blocking VM or whether it will become part of a release.
While I am neither a concurrency nor a Redis expert, I must confess that my previous experience with a solution similar to Redis’s single-threaded approach was disappointing. I am referring to Jackrabbit, the Apache JCR implementation, where we had to work around the serialized, single-threaded access for read-only clients. On the other hand, there are other well-known systems (e.g. memcached) that use the same approach (some will point out that, as opposed to Redis, memcached never touches the disk, while Jackrabbit behaves much more like Redis).
Anyway, we will always have these Redis benchmarks around for sanity checks.