


Memcached: All content tagged as Memcached in NoSQL databases and polyglot persistence

RethinkDB Launches 1.0 Version With Memcached Compatibility Only

Just as I speculated, RethinkDB has finally launched version 1.0 with Memcached compatibility only. Jason Kincaid (TechCrunch) writes:

RethinkDB has just launched its 1.0 release to the public, and it’s offering a product geared toward NoSQL installations — and it will work on SSDs, traditional drives, and cloud-based services like AWS. The startup has also moved away from MySQL and now fully supports Memcached.

But RethinkDB is not the first product to offer a Memcached-compatible storage engine with persistence to disk. A year ago Membase was launched, promising not only a persistent, Memcached-compatible solution but also elastic scalability.

RethinkDB has also published a performance report (PDF) comparing RethinkDB’s speed with Membase and MySQL. But if I’m reading those numbers correctly, while RethinkDB leads most of the queries-per-second (QPS) benchmarks, MySQL consistently shows better latency (which is kind of weird). In a strong-durability scenario, the benchmark shows MySQL delivering 2x the QPS of RethinkDB.

Another interesting aspect of the RethinkDB 1.0 release is the licensing model, which I don’t fully get:

RethinkDB Basic is currently identical in feature-set to RethinkDB Premium and Enterprise. However, the paid versions of RethinkDB include phone and email support, access to all future updates, and volume licensing options.

Or, as spelled out in the TechCrunch post:

Akhmechet says that the free version will get security updates, but that it won’t necessarily receive new features in the future, whereas the premium version will.

Original title and link: RethinkDB Launches 1.0 Version With Memcached Compatibility Only (NoSQL databases © myNoSQL)

Optimizing Memcached Performance on a Rapidly Growing Site

Predicting operational growth by monitoring the correct metrics:

Such simple data can reveal a wealth of insights. Most important is the cache’s miss rate: how frequently do we need to regenerate data? It is the miss rate that ultimately impacts site performance. Using such data, we were shocked to discover that we were caching a lot less than we thought, and that our cache actually behaved quite erratically, with a greater than 2x difference between peak and trough miss rates.
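The miss rate mentioned in the quote can be derived from memcached’s own counters: the `stats` command reports cumulative `get_hits` and `get_misses`, so the rate for an interval comes from the difference between two samples. A minimal sketch, assuming you already collect the raw `stats` text at each sample; the function names are illustrative and not from the original post.

```python
def parse_stats(raw: str) -> dict[str, str]:
    """Turn memcached 'STAT <name> <value>' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats  # the trailing "END" line is ignored


def miss_rate(prev: dict[str, str], curr: dict[str, str]) -> float:
    """Miss rate over the interval between two samples.

    get_hits/get_misses are cumulative since server start, so use deltas.
    """
    hits = int(curr["get_hits"]) - int(prev["get_hits"])
    misses = int(curr["get_misses"]) - int(prev["get_misses"])
    lookups = hits + misses
    return misses / lookups if lookups else 0.0


def peak_to_trough(rates: list[float]) -> float:
    """Ratio between the highest and the lowest observed miss rate."""
    low = min(rates)
    return max(rates) / low if low else float("inf")
```

Tracking `peak_to_trough` over a day of samples is one way to spot the kind of erratic behavior the quote describes, where the ratio exceeded 2.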

The story reminded me of the Foursquare accident.

Original title and link: Optimizing Memcached Performance on a Rapidly Growing Site (NoSQL databases © myNoSQL)


How Digg is Built? Using a Bunch of NoSQL technologies

The picture says a lot about Digg’s polyglot persistence approach:

[Figure: Digg Data Storage Architecture]

Here is also a description of the data stores in use:

Digg stores data in multiple types of systems depending on the type of data and the access patterns, and also, in some cases, for historical reasons :)

  • Cassandra: The primary store for “Object-like” access patterns for such things as Items (stories), Users, Diggs and the indexes that surround them. Since the Cassandra 0.6 version we use does not support secondary indexes, these are computed by application logic and stored here. […]

  • HDFS: Logs from site and API events, user activity. Data source and destination for batch jobs run with Map-Reduce and Hive in Hadoop. Big Data and Big Compute!

  • MySQL: This is mainly the current store for the story promotion algorithm and calculations, because it requires lots of JOIN-heavy operations, which are not a natural fit for the other data stores at this time. However… HBase looks interesting.

  • Redis: The primary store for the personalized news data because it needs to be different for every user and quick to access and update. We use Redis to provide the Digg Streaming API and also for the real time view and click counts since it provides super low latency as a memory-based data storage system.

  • Scribe: the log collecting service. Although this is a primary store, the logs are rotated out of this system regularly and summaries written to HDFS.
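Computing indexes in application logic, as Digg describes for Cassandra 0.6, means every write must update both the row and the index entries that point to it. A minimal sketch of that bookkeeping, with plain dicts standing in for the two column families; `ItemStore`, `by_user`, and the `user` column are hypothetical. Against a real cluster the row write and the index write are separate operations, so a failure between them can leave a stale index entry that readers must tolerate or repair.

```python
from collections import defaultdict


class ItemStore:
    """Hypothetical stand-in for an items column family plus an index.

    The dicts replace real Cassandra writes; they only show the
    bookkeeping the application has to do on its own.
    """

    def __init__(self):
        self.items = {}                   # item_id -> {column: value}
        self.by_user = defaultdict(set)   # user_id -> {item_id, ...}

    def put(self, item_id, item):
        old = self.items.get(item_id)
        if old is not None:
            # The indexed column may have changed: drop the stale entry.
            self.by_user[old["user"]].discard(item_id)
        self.items[item_id] = item
        self.by_user[item["user"]].add(item_id)

    def delete(self, item_id):
        old = self.items.pop(item_id, None)
        if old is not None:
            self.by_user[old["user"]].discard(item_id)

    def items_for_user(self, user_id):
        # Look up ids in the index, then fetch each row by key.
        return [self.items[i] for i in sorted(self.by_user.get(user_id, ()))]
```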

I know this will sound strange, but isn’t that a bit too much?


Original title and link: How Digg is Built? Using a Bunch of NoSQL technologies (NoSQL databases © myNoSQL)


MongoDB, memcached, EHCache: Compared as Distributed L2 Caches

As can be seen, whether the off-host process that manages the cache-data is MongoD or MemcacheD or Terracotta-Server, architecturally they all look equivalent - i.e. a pure-L2 with no-L1 - so that all data needs to be retrieved from over the network and then massaged into a POJO for consumption by the application.

[Figure: MongoDB, memcached, EHCache compared]
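A pure-L2 setup pays a network round trip on every read. The usual counterpart is a small in-process L1 in front of the shared L2, traded against possibly stale reads. A minimal sketch, assuming a time-based expiry for the L1; `RemoteCache` is a hypothetical stand-in for MongoD, MemcacheD, or a Terracotta server that simply counts round trips.

```python
import time
from typing import Any, Callable, Optional


class RemoteCache:
    """Hypothetical off-host L2; counts round trips instead of using a network."""

    def __init__(self):
        self._data: dict[str, Any] = {}
        self.round_trips = 0

    def get(self, key: str) -> Optional[Any]:
        self.round_trips += 1
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.round_trips += 1
        self._data[key] = value


class TwoLevelCache:
    """In-process L1 with a short TTL in front of a shared L2."""

    def __init__(self, l2: RemoteCache, l1_ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.l2 = l2
        self.l1_ttl = l1_ttl
        self.clock = clock
        self._l1: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._l1.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                # L1 hit: no network hop
        value = self.l2.get(key)           # L1 miss or expired: one round trip
        if value is not None:
            self._l1[key] = (self.clock() + self.l1_ttl, value)
        return value

    def set(self, key: str, value: Any) -> None:
        self.l2.set(key, value)            # write through to the shared L2
        self._l1[key] = (self.clock() + self.l1_ttl, value)
```

With a one-second TTL, a host may serve a value up to one second older than what another host just wrote to L2; that staleness window is the price of skipping the network hop.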

When speaking about caching systems, I’d also include criteria like:

  • warm up strategy
  • locking strategy
  • single-machine memory spill strategy

Original title and link: MongoDB, memcached, EHCache: Compared as Distributed L2 Caches (NoSQL databases © myNoSQL)


Redis: Le système de cache parfait (Redis: The Perfect Caching System)

I love how this sounds in French:

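Après 3 ans d’une histoire d’amour fidèle avec Memcached; le serveur de cache notamment utilisé par Facebook, Youtube ou Twitter; je suis au bord de la rupture après avoir rencontré redis.

(After three years of a faithful love affair with Memcached, the cache server used by Facebook, YouTube, and Twitter among others, I am on the verge of a breakup after meeting redis.)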

The author, Julien Crouzet, mentions three key features of Redis:

  • non-volatile data
  • performance
  • support for data types

The first two points could be debated.

But Redis’ support for data types (lists, sets, sorted sets, and hashes) is not up for debate.

Original title and link: redis : Le système de cache parfait (NoSQL databases © myNoSQL)


How to Maintain a Set in Memcached

Can you imagine a solution for storing a set in memcached that satisfies these requirements:

  • must minimize round trips to the servers
  • O(1) add (for both current size and new items coming in)
  • O(1) remove (for both current size and items being removed)
  • O(1) fetch
  • lock and wait free
  • easy to use
  • easy to understand
  • no required explicit maintenance

Dustin Sallings describes how to achieve it using just three memcached operations and some clever but artificial data encoding.
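Dustin’s post is the reference; what follows is only a reconstruction of the general idea under stated assumptions: the set lives under one key as a string of `+item ` and `-item ` tokens, adds and removes are single `append` calls (falling back to `add` the first time), a read replays the tokens, and a reader compacts the value with `cas` once too many dead tokens pile up. `FakeMemcached` is a hypothetical in-memory stand-in that mimics the semantics of memcached’s `add`, `append`, `gets`, and `cas` commands; items are assumed not to contain spaces.

```python
class FakeMemcached:
    """In-memory stand-in for the memcached commands used below."""

    def __init__(self):
        self._data = {}        # key -> (value, cas id)
        self._next_cas = 1

    def _store(self, key, value):
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        if key in self._data:
            return False
        self._store(key, value)
        return True

    def append(self, key, value):
        """Append to an existing value; fails if the key is missing."""
        if key not in self._data:
            return False
        self._store(key, self._data[key][0] + value)
        return True

    def gets(self, key):
        """Return (value, cas id), or (None, None) if the key is missing."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, cas_id):
        """Store only if nobody modified the key since `gets`."""
        if key not in self._data or self._data[key][1] != cas_id:
            return False
        self._store(key, value)
        return True


def _encode(op, items):
    # Space is the token separator, so items must not contain spaces.
    return "".join(f"{op}{item} " for item in items)


def _modify(mc, key, op, items):
    data = _encode(op, items)
    # Common case: one append, no read and no lock.
    if not mc.append(key, data):
        if not mc.add(key, data):   # first write creates the key
            mc.append(key, data)    # another client created it meanwhile


def add_to_set(mc, key, *items):
    _modify(mc, key, "+", items)


def remove_from_set(mc, key, *items):
    _modify(mc, key, "-", items)


def decode(data):
    """Replay tokens in order; return (members, number of dead tokens)."""
    members = set()
    tokens = data.split()
    for token in tokens:
        op, item = token[0], token[1:]
        if op == "+":
            members.add(item)
        elif op == "-":
            members.discard(item)
    return members, len(tokens) - len(members)


def get_set(mc, key, max_garbage=100):
    data, cas_id = mc.gets(key)
    if data is None:
        return set()
    members, garbage = decode(data)
    if garbage > max_garbage:
        # Best-effort compaction: if another client wrote in between,
        # cas fails and a later read will try again.
        mc.cas(key, _encode("+", sorted(members)), cas_id)
    return members
```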

Nuno Job suggests the right answer is using Redis.

Original title and link: How to Maintain a Set in Memcached (NoSQL databases © myNoSQL)


From No Cache to Membase: The Knot

Jason Sirota tells the story of how The Knot (a media company) went from no cache to Membase, passing through memcached and Gear6.

In talking to Membase and through our own research, we found that Membase solved all of our original problems, plus our new problems with Gear6.

  1. Membase provides a rich set of both GUI and programmatic tools to manage and monitor the cache.

  2. Membase not only runs on multiple physical nodes but balances keys across those nodes using the vBuckets

  3. Membase runs on Windows and can handle quite a bit more capacity (evidenced by Zynga) than we could possibly use.

  4. Membase uses both HA replication and distributed nodes for different solutions; in our case, it easily supports the 5-node configuration

  5. Membase provides Buckets that can be configured by Port to allow different teams to have a set amount of space

  6. Hardware can be added both horizontally and vertically to a Membase cluster. However, one limitation is that all nodes have to run the same cache limit so you do need to think carefully about your node size

  7. No company is immune to going under but, in addition to their strong financial state, the risk for Membase is mitigated by two factors:

If you want the simplified version:

  • a typical story where, to maintain the quality of the service, caching had to be used
  • a typical story where scale also brought the need for better administration and monitoring tools
  • a typical story where operational costs had to be kept under control and, if possible, reduced

What made Membase the winning solution for The Knot?

Some would say the feature set, and I’d probably agree, while pointing out that such features can be found in other NoSQL databases too.

I’d say it’s Membase’s use of a well-established protocol, which didn’t require The Knot to rewrite its whole persistence layer. Even if Membase had not had all the required features, using the memcached protocol made it the easiest solution to try out, as no application changes were needed.
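The protocol argument is easy to see at the wire level: a client that frames memcached’s text-protocol `set` and `get` commands like this does not care whether memcached or Membase answers, so trying Membase is a matter of pointing it at a different host and port. A minimal sketch of the framing only (no sockets, no error replies); the helper names are hypothetical.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame 'set <key> <flags> <exptime> <bytes>' followed by the data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"


def encode_get(*keys: str) -> bytes:
    """Frame a (multi-)get request."""
    return ("get " + " ".join(keys) + "\r\n").encode()


def parse_get_response(resp: bytes) -> dict[str, bytes]:
    """Parse 'VALUE <key> <flags> <bytes>' blocks up to the final 'END'."""
    values = {}
    pos = 0
    while True:
        eol = resp.index(b"\r\n", pos)
        line = resp[pos:eol].decode()
        if line == "END":
            return values
        # An optional trailing cas value (from 'gets') is ignored.
        _, key, _flags, length = line.split()[:4]
        start = eol + 2
        end = start + int(length)
        values[key] = resp[start:end]  # data may itself contain \r\n
        pos = end + 2                  # skip the data block's trailing \r\n
```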

Original title and link: From No Cache to Membase: The Knot (NoSQL databases © myNoSQL)


Tarantool/Silverbox: Another In-Memory Key-Value Store from Mail.Ru

Mail.Ru, one of the most popular Russian web sites, has open sourced Tarantool, which, among other components, also includes (another) in-memory key-value store.

From the project home:

  • The system is optimized for work with large volumes of data;
  • Tarantool uses snapshot files, which contain the state of the database at the time of copy to disk;
  • Transaction logging in binary log files preserves all changes to database state, allowing automatic restoration of information after system reboot;
  • The system provides high availability, automatic switchover to an available replica in case of crash of any part of the system;
  • The system is fully compatible with the memcached protocol;
  • Local replicas allow system update without interruption to client services;
  • The system provides data replication over the network;
  • Tarantool supplies a simple binary protocol for replication, supporting the creation of additional logic.
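The snapshot-plus-binary-log design in the list is a standard recovery scheme: on restart, load the most recent snapshot, then replay every change logged after it. A minimal sketch of that scheme, assuming JSON files and a hypothetical `LoggedStore` class; Tarantool’s actual file formats and replication are not modeled here.

```python
import json
import os


class LoggedStore:
    """In-memory key-value store with a snapshot and an append-only log.

    Hypothetical layout: <dir>/snapshot.json plus <dir>/wal.log with one
    JSON record per line. Not Tarantool's real format.
    """

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "wal.log")
        self.data = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # 1. Load the last snapshot, if any.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # 2. Replay every change logged after that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, rec):
        if rec["op"] == "set":
            self.data[rec["key"]] = rec["value"]
        else:
            self.data.pop(rec["key"], None)

    def _write(self, rec):
        # Log first, then apply: the change survives a crash once fsync returns.
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(rec)

    def set(self, key, value):
        self._write({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._write({"op": "delete", "key": key})

    def snapshot(self):
        # Write the full state to a temp file, swap it in atomically,
        # then start a fresh log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self._log.close()
```

If the process dies between swapping in the snapshot and truncating the log, replaying the old log over the new snapshot yields the same state, since each key ends up with the value from its last logged operation.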

It sounds like an improved, HA memcached, which would place it close to products like Membase[1].


  1. Details about Tarantool are still scarce, so I’m not 100% sure about it.

Original title and link: Tarantool/Silverbox: Another In-Memory Key-Value Store from Mail.Ru (NoSQL databases © myNoSQL)

Microsoft coaches NoSQL options for Azure cloud

The Register writes about Microsoft’s initiative to bring NoSQL databases to the Azure cloud, mentioning Membase and MongoDB[1]:

The addition of NoSQL suits Microsoft - by bringing more people to Azure - and it suits the NoSQLers, because they get more Windows devs to support.

You can run NoSQL options like Mongo and Memcached on Azure after some fiddling and configuring. The goal now is to deliver a development, deployment, and management experience already familiar to those on Windows, SQL Server, and Visual Basic.

Is VMware/Spring making the same bet for the Java world? Judging by the Spring Data initiative, plus Grails support for Redis and MongoDB, I’d say they are.

A question I’d like to answer for myself: how popular is memcached in the Java world? My impression is that Java developers have so far stayed away from memcached, preferring Java-based solutions like EHCache or Terracotta, but I might be completely wrong.

Original title and link: Microsoft coaches NoSQL options for Azure cloud (NoSQL databases © myNoSQL)


Why Redis? And Memcached, Cassandra, Lucene, ElasticSearch

Why do we keep jumping from one storage engine to another? Can’t we make up our minds already and settle with the “best” storage engine that meets our needs?

In short, No.

A common misconception is the belief that all storage engines are created equal, all designed to simply “store stuff” and provide fast access to your data. Unless your application performs one clearly defined simple task, it is a dire mistake to expect a single storage engine will effectively fulfill all of your data warehousing and processing needs.

I don’t think I need to say that I’m a proponent of polyglot persistence, or that I believe in the Unix tools philosophy. But as you add more components to your system, you should realize that its complexity “explodes” and operational costs grow with it (nb: do you remember why Twitter started looking into using Cassandra?). Not to mention that the more components your system has, the more attention and care must go into figuring out critical aspects like overall system availability, latency, throughput, and consistency.

Original title and link: Why Redis? And Memcached, Cassandra, Lucene, ElasticSearch (NoSQL databases © myNoSQL)


Memcached/Membase: Writing Your Own Storage Engine

I’m not sure how many people will need to implement their own storage engine, but given that a couple of projects support pluggable engines (Project Voldemort, Riak), special engines may well perform better in special scenarios. Now you can learn how to write one for Membase:

Right now we’ve got an engine capable of running get and set load, but it is doing synchronous filesystem IO. We can’t serve our client faster than we can read the item from disk, but we might serve other connections while we’re reading the item off disk.
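The quote describes the key constraint for an engine behind a memcached-style server: one request can’t be answered faster than its disk read, but other connections shouldn’t stall while that read is in flight. Membase’s engine interface is a C API; this Python sketch only mirrors the idea, with a hypothetical `StorageEngine` get/set interface and a `FileEngine` that moves blocking file IO to worker threads via `asyncio.to_thread`.

```python
import asyncio
import os
from abc import ABC, abstractmethod
from typing import Optional


class StorageEngine(ABC):
    """Hypothetical pluggable-engine interface: the server only calls get/set."""

    @abstractmethod
    async def get(self, key: str) -> Optional[bytes]:
        ...

    @abstractmethod
    async def set(self, key: str, value: bytes) -> None:
        ...


class FileEngine(StorageEngine):
    """Stores one file per key under `directory`.

    Blocking open/read/write calls run in a worker thread through
    asyncio.to_thread, so the event loop can keep serving other
    connections while a read waits on the disk.
    """

    def __init__(self, directory: str):
        self.directory = directory

    def _path(self, key: str) -> str:
        # Hex-encode the key so any key maps to a safe file name.
        return os.path.join(self.directory, key.encode().hex())

    def _read(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def _write(self, key: str, value: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(value)

    async def get(self, key: str) -> Optional[bytes]:
        return await asyncio.to_thread(self._read, key)

    async def set(self, key: str, value: bytes) -> None:
        await asyncio.to_thread(self._write, key, value)
```

Because the server only depends on `StorageEngine`, an in-memory engine and a disk engine can be swapped without touching connection handling, which is the point of a pluggable design.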

Original title and link: Memcached/Membase: Writing Your Own Storage Engine (NoSQL databases © myNoSQL)