


Memcached: All content tagged as Memcached in NoSQL databases and polyglot persistence

DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose

A great new article from Todd Hoff dissecting the DataSift architecture:

DataSift architecture


In terms of data store, DataSift architecture includes:

  • MySQL (Percona server) on SSD drives
  • HBase cluster (currently, ~30 hadoop nodes, 400TB of storage)
  • Memcached (cache)
  • Redis (still used for some internal queues, but probably going to be dismissed soon)

Leave whatever you were doing and go read it now.

Original title and link: DataSift Using MySQL, HBase, Memcached to Deal With Twitter Firehose (NoSQL database©myNoSQL)

Memcached Internals: Memory Allocation, Eviction Policy, Consistent Hashing

Earlier today, when writing about a benchmark of MongoDB, Redis, and Memcached-based Rails caches, I referred to eviction policies and their possible impact on cache performance. But as I wasn’t sure how Memcached works in a couple of areas, I did a bit of research and found a tweet from Tim de Pater pointing to two great articles about Memcached.

Joshua Thijssen’s post covers four topics: the Big-O complexity of Memcached operations, the LRU eviction policy, memory allocation, and consistent hashing.

Now, in order to combat this “malloc()” problem, memcache does its own memory management by default (you can let memcache use the standard malloc() function, but that would not be advisable). Memcache’s memory manager will allocate the maximum amount of memory from the operating system that you have set (for instance, 64Mb, but probably more) through one malloc() call. From that point on, it will use its own memory manager system called the slab allocator.
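To make the slab allocator idea more concrete, here is a minimal sketch of how chunk sizes for slab classes could be derived and how an item is matched to a class. The base chunk size of 96 bytes, the growth factor of 1.25, and the 1 MB page size are assumptions chosen for illustration; they are not taken from the quoted post, and the function names are hypothetical.

```python
PAGE_SIZE = 1024 * 1024  # assumed slab page size (1 MB)


def slab_class_sizes(base=96, factor=1.25, page_size=PAGE_SIZE):
    """Return increasing chunk sizes, one per slab class."""
    sizes = []
    size = base
    while size <= page_size // 2:
        sizes.append(size)
        # Grow by the factor, then round up to a multiple of 8 bytes.
        size = -(-int(size * factor) // 8) * 8
    sizes.append(page_size)  # the last class holds one item per page
    return sizes


def pick_class(item_size, sizes):
    """Index of the smallest slab class whose chunk fits the item."""
    for index, chunk in enumerate(sizes):
        if item_size <= chunk:
            return index
    raise ValueError("item larger than the largest chunk")


sizes = slab_class_sizes()
index = pick_class(100, sizes)
# A 100-byte item lands in the first class that fits it; the unused
# remainder of the chunk is the price paid for avoiding fragmentation.
wasted = sizes[index] - 100
```

Under these assumptions a 100-byte item does not fit the 96-byte class and is stored in the next one, leaving part of the chunk unused.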

The second post describes in great detail how the Memcached eviction policy works:

The other day I was chatting with a colleague about Memcached.  Eviction policy came up, and I casually mentioned that Memcache isn’t strictly LRU.  But a quick Bing search said Memcache is LRU, like this Wikipedia entry.  Hmm, I was 99.9% sure Memcache is not LRU, something to do with how it manages memory, but maybe I was wrong all these years.  After reading through some Danga mailing lists and documentation, the answer is, Memcached is LRU per slab class, but not globally LRU.  

Memcached Eviction Policy
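The "LRU per slab class, but not globally LRU" behavior can be illustrated with a small sketch. This is a toy model and not Memcached's implementation: the class name `SlabClassLRU`, the two chunk sizes, and the fixed number of chunks per class are all assumptions made for the example.

```python
from collections import OrderedDict


class SlabClassLRU:
    """Toy cache with one LRU list per slab class."""

    def __init__(self, chunk_sizes, chunks_per_class):
        self.chunk_sizes = chunk_sizes
        self.capacity = chunks_per_class
        self.classes = [OrderedDict() for _ in chunk_sizes]

    def _class_for(self, value):
        for index, chunk in enumerate(self.chunk_sizes):
            if len(value) <= chunk:
                return index
        raise ValueError("value too large")

    def delete(self, key):
        for lru in self.classes:
            lru.pop(key, None)

    def set(self, key, value):
        """Store a value; return the key evicted to make room, if any."""
        self.delete(key)
        lru = self.classes[self._class_for(value)]
        evicted = None
        if len(lru) >= self.capacity:
            # Evict the least recently used item of THIS class only.
            evicted, _ = lru.popitem(last=False)
        lru[key] = value
        return evicted

    def get(self, key):
        for lru in self.classes:
            if key in lru:
                lru.move_to_end(key)  # mark as most recently used
                return lru[key]
        return None


cache = SlabClassLRU(chunk_sizes=[64, 1024], chunks_per_class=2)
cache.set("small-old", b"x" * 10)      # the oldest item overall
cache.set("big-1", b"x" * 500)
cache.set("big-2", b"x" * 500)
evicted = cache.set("big-3", b"x" * 500)
```

Here `big-1` is evicted even though `small-old` is older, because eviction only looks at the slab class that needs room.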

Original title and link: Memcached Internals: Memory Allocation, Eviction Policy, Consistent Hashing (NoSQL database©myNoSQL)

Rails Caching Benchmarked: MongoDB, Redis, Memcached

A couple of Rails caching solutions (file, memcached, MongoDB, and Redis) were benchmarked first here by Steph Skardal and then here by Thomas W. Devol. Thomas W. Devol concludes:

Though it looks like mongo-store demonstrates the best overall performance, it should be noted that a mongo server is unlikely to be used solely for caching (the same applies to redis), it is likely that non-caching related queries will be running concurrently on a mongo/redis server which could affect the suitability of these benchmarks.

I’m not a Rails user, so please take these with a grain of salt:

  • without knowing the size of the cached objects, at 20000 iterations most probably neither MongoDB nor Redis had to persist to disk.

    This means that all three (memcached, MongoDB, and Redis) stored data in memory only[1]

  • if no custom object serialization is used by any of the memcached, MongoDB, or Redis caches, then the performance difference is mostly caused by the performance of the driver

  • it should not be a surprise to anyone that the size of the cached objects can and will influence the results of such benchmarks

  • there doesn’t seem to be any concurrent access to caches. Concurrent access and concurrent updates of caches are real-life scenarios and not including them in a benchmark greatly reduces the value of the results

  • none of these benchmarks seems to contain code that measures the performance of cache eviction

  1. Except in the case where any of these forces a disk write

Original title and link: Rails Caching Benchmarked: MongoDB, Redis, Memcached (NoSQL database©myNoSQL)

Griffon and NoSQL Databases

Andres Almiray:

The following list enumerates all NoSQL options currently supported by Griffon via plugins:

  • BerkeleyDB
  • CouchDB
  • Memcached
  • Riak
  • Redis
  • Terrastore
  • Voldemort
  • Neo4j
  • Db4o
  • Neodatis

The first 7 are Key/Value stores. Neo4j is a Graph based database. The last two are object stores. All of them support multiple datasources, data bootstrap and a Java friendly API similar to the one shown earlier.

Griffon is a Groovy-based framework for developing desktop applications. While the coolness factor of Java-based desktop apps is close to zero, having some multi-platform management utilities for these NoSQL databases might be interesting.

Original title and link: Griffon and NoSQL Databases (NoSQL database©myNoSQL)


Twitter's Real-Time URL Fetcher Using Cassandra and Memcached

Twitter’s real-time URL fetcher, code named SpiderDuck, is an excellent example of how NoSQL databases fit in the architecture of today’s systems:

Metadata Store: This is a Cassandra-based distributed hash table that stores page metadata and resolution information keyed by URL, as well as fetch status for every URL recently encountered by the system. This store serves clients across Twitter that need real-time access to URL metadata.

SpiderDuck is also using memcached:

Memcached: This is a distributed cache used by the fetchers to temporarily store robots.txt files.
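Temporarily storing robots.txt files is a classic cache-aside pattern with an expiry time. The following is a minimal sketch of that pattern only: `TTLCache` is an in-process stand-in for a memcached client, and `robots_for`, the `fetch` callback, the key prefix, and the one-hour TTL are hypothetical names and values, not details of SpiderDuck.

```python
import time


class TTLCache:
    """In-process stand-in for a cache whose entries expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value


def robots_for(host, cache, fetch, ttl=3600):
    """Return robots.txt for a host, fetching it only on a cache miss."""
    key = "robots:" + host
    body = cache.get(key)
    if body is None:
        body = fetch(host)
        cache.set(key, body, ttl)
    return body
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.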

SpiderDuck Architecture Cassandra Memcached

Original title and link: Twitter’s Real-Time URL Fetcher Using Cassandra and Memcached (NoSQL database©myNoSQL)


How to Cache PHP Sessions in Membase

Why Membase is the next step after Memcached:

Memcache is great, but once you start running low on memory (as you cache more info) lesser-used items in the cache will be destroyed to free up more space for new items. This can result in users getting logged out.  Also, if one of the servers in the pool fails or gets rebooted, all the data it was holding is lost, and then the cache must get “warmed up” again.

Membase is memcache with data persistence. The improvement of having data persistence is that if you need to bring down a server, you don’t have to worry about all that dainty, floaty data in memory that is gonna get burned. Since membase has replication built-in, you can feel free to restart a troublesome server without fear of your database getting pounded as the caches need to refill, or that a set of unlucky users will get logged out.  I’ll let you read about all the many other advantages of membase here.  It’s much more than I’ve mentioned here.

Original title and link: How to Cache PHP Sessions in Membase (NoSQL database©myNoSQL)


Memcached and Sherpa for Yahoo! News Activity Data Service

Mixer, Yahoo!’s recently announced data service for news activities, uses Memcached and Sherpa for its data backend, plus a combination of asynchronous libraries and task execution tools:

Mixer - Memcached Sherpa Yahoo News Activity

The data processing model and the clear separation between read and write data solutions are not only compelling, but essential for maintaining the SLA (max. 250ms/response):

Memcache maintains two types of materialized views: 1) Consumer-pivoted, and 2) Producer-pivoted. Consumer-pivoted views (e.g. user’s friends’ latest read activity) are refreshed at query time by refresh tasks. Producer-pivoted views (e.g. user’s latest read activity) are refreshed at update time (i.e. when “read” event is posted). And producer-pivoted views are used to refresh consumer-pivoted views.
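The two kinds of views can be sketched in a few lines. This is only an illustration of the refresh rules described in the quote: the class `ActivityViews`, its method names, the in-memory dictionaries standing in for Memcached, and the limit of 10 entries are assumptions.

```python
from collections import defaultdict


class ActivityViews:
    """Toy model of producer-pivoted and consumer-pivoted views."""

    def __init__(self, friends, limit=10):
        self.friends = friends             # user -> list of friends
        self.limit = limit
        self.producer = defaultdict(list)  # user -> own latest reads
        self.consumer = {}                 # user -> friends' latest reads

    def post_read(self, user, article, timestamp):
        """Update time: refresh the producer-pivoted view."""
        events = self.producer[user]
        events.append((timestamp, user, article))
        events.sort(reverse=True)          # newest first
        del events[self.limit:]

    def friends_activity(self, user):
        """Query time: rebuild the consumer-pivoted view from the
        producer-pivoted views of the user's friends."""
        merged = []
        for friend in self.friends.get(user, []):
            merged.extend(self.producer.get(friend, []))
        merged.sort(reverse=True)
        self.consumer[user] = merged[:self.limit]
        return self.consumer[user]


views = ActivityViews({"alice": ["bob", "carol"]})
views.post_read("bob", "article-1", timestamp=1)
views.post_read("carol", "article-2", timestamp=2)
views.post_read("bob", "article-3", timestamp=3)
feed = views.friends_activity("alice")
```

Writes stay cheap because they touch only one producer view, while the merge cost is paid at query time.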

Sherpa is Yahoo!’s cloud-based NoSQL data store that provides low-latency reads and writes of key-value records and short range scans. Efficient range scans are particularly important for the Mixer use cases. The “read” event is stored in the Updates table. The Updates table is a Sherpa Distributed Ordered Table that is ordered by “user,timestamp desc”. This provides efficient scans through a user’s latest read activity. A reference to the “read” record is stored in the UpdatesIndex table to support efficient point lookups. UpdatesIndex is a Sherpa Distributed Hash Table.
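The combination of an ordered table for scans and a hash table for point lookups can be mimicked in memory. The sketch below is not Sherpa's API: `UpdatesStore` and its methods are hypothetical, a sorted Python list stands in for the ordered table, a dictionary stands in for the hash table, and it assumes at most one event per user and timestamp.

```python
import bisect


class UpdatesStore:
    """Toy model of an ordered Updates table plus a hash index."""

    def __init__(self):
        self.updates = []        # keys sorted by (user, -timestamp)
        self.records = {}        # (user, timestamp) -> record
        self.updates_index = {}  # event_id -> (user, timestamp)

    def put(self, event_id, user, timestamp, record):
        # Negating the timestamp gives "timestamp desc" within a user.
        bisect.insort(self.updates, (user, -timestamp))
        self.records[(user, timestamp)] = record
        self.updates_index[event_id] = (user, timestamp)

    def latest(self, user, count):
        """Short range scan: the user's newest `count` records."""
        start = bisect.bisect_left(self.updates, (user, float("-inf")))
        result = []
        for owner, neg_ts in self.updates[start:start + count]:
            if owner != user:
                break
            result.append(self.records[(owner, -neg_ts)])
        return result

    def lookup(self, event_id):
        """Point lookup through the hash index."""
        return self.records[self.updates_index[event_id]]


store = UpdatesStore()
store.put("e1", "bob", 10, "bob read A")
store.put("e2", "alice", 5, "alice read B")
store.put("e3", "bob", 20, "bob read C")
```

With this layout `store.latest("bob", 2)` reads two adjacent entries, and `store.lookup("e2")` needs no scan at all.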

Original title and link: Memcached and Sherpa for Yahoo! News Activity Data Service (NoSQL database©myNoSQL)


From Memcached to Membase Memcached Buckets


But this post isn’t about switching from a volatile cache to a persistent solution. It is about removing the dumb part from the memcached setup.

So I thought I’d read about the advantages of virtual nodes/buckets and elastic clusters, cold vs. warm caches, cluster recoverability, the widely used memcached protocol and the possibility of using extensions in future versions, etc. Instead I learned about Moxi-based cluster configuration discoverability and how stupid the memcached PHP libraries are.

But I really enjoyed Matt Ingenthron’s quote:

at Membase Inc they view Memcached as a rabbit. It is fast, but it is pretty dumb and procreates quickly. Before you know it, it will be running wild all over your system.

Original title and link: From Memcached to Membase Memcached Buckets (NoSQL database©myNoSQL)


Use Membase and You'll Never Want to Mess With Memcached Servers Again

All I can say is WOW. I’ll never use stand alone memcached server(s) again.

  • crazy easy to install and make a cluster.

  • 0 changes to your app code. Operates seamlessly with memcached protocol. If you want to take advantage of advanced features, you need to modify app code.

  • you can dynamically add and remove nodes without losing all your keys/data.

  • 2 bucket types:

    1. Membase: supports data persistence (writes them ionicely to disk) and replication (one node dies, you don’t lose your key/value pairs). It sends data to disk as fast as it can (while giving priority to getting data back from disk). This is done asynchronously (with an option for synchronous), so clients shouldn’t be able to perceive a difference between Membase and memcached data buckets.
    2. Memcached: no persistence or replication. All in memory. I would highly recommend going membase bucket unless you have some I/O concerns (like you get charged for I/O in the cloud).
  • Awesome admin web UI.

  • lots of documentation

  • helpful community

The only concern I can think of about replacing memcached with Membase is the maturity of the cluster solution. But on this front, things will only get better, probably before memcached gets an auto-scaling solution.

Original title and link: Use Membase and You’ll Never Want to Mess With Memcached Servers Again (NoSQL database©myNoSQL)


Powered by Redis: and other related web properties are using Redis’ hashes, lists, and sets (sorted and unsorted) for fragment caching and third party responses caching:

We used Redis as our cache store for two reasons. First, we were already using it for other purposes, so reusing it kept the technology stack simpler. But more importantly, Redis’ wildcard key matching makes cache expiration a snap. It’s well known that cache expiration is one of two hard things in computer science, but using wildcard key searching, it’s dirt simple to pull back all keys that begin with “views” and contain the word “articles” and expire them everytime an article is changed. Memcached has no such ability.
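The wildcard-based expiration described in the quote can be sketched without a Redis server. In the example below a plain dictionary stands in for the key space and `fnmatchcase` stands in for Redis' glob-style key matching; `expire_matching` and the key names are hypothetical, and the pattern `views*articles*` is one reading of "begin with views and contain the word articles".

```python
from fnmatch import fnmatchcase


def expire_matching(cache, pattern):
    """Delete every key matching a glob pattern; return the removed keys."""
    doomed = sorted(key for key in cache if fnmatchcase(key, pattern))
    for key in doomed:
        del cache[key]
    return doomed


fragments = {
    "views/articles/42": "<article>cached fragment</article>",
    "views/home/articles-list": "<ul>cached fragment</ul>",
    "views/users/7": "<div>cached fragment</div>",
    "api/articles/42": "{}",
}

# Called whenever an article changes.
removed = expire_matching(fragments, "views*articles*")
```

Against a real server, matching keys this way means scanning the key space, so it is worth measuring on large caches before relying on it.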

Original title and link: Powered by Redis: (NoSQL database©myNoSQL)


Memcached in the Cloud: Amazon ElastiCache

Amazon announced today a new service: Amazon ElastiCache, or Memcached in the cloud. The new service is still in beta and available only in the US East (Virginia) Region.

While many will find this new service useful, it is a bit of a disappointment that Amazon took the safe route and went with pure Memcached. The only notable feature of Amazon ElastiCache is automatic failure detection and recovery. But compared with Membase (and the soon to be released Couchbase 2.0) it is missing clustering, replication, support for virtual nodes, etc. Even though it advertises push-button scaling, ElastiCache will lose cached data when instances are added or removed.
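How much cached data becomes unreachable when a node is added depends on how clients map keys to nodes. The sketch below compares naive modulo hashing with a simple consistent-hash ring when going from four to five nodes. It is a client-side illustration, not a description of how ElastiCache or any particular client library behaves; the node names, the 100 virtual points per node, and the use of MD5 are assumptions.

```python
import bisect
import hashlib


def _hash(text):
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def modulo_node(key, nodes):
    """Naive placement: hash the key modulo the number of nodes."""
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(keys, before, after):
    """Share of keys whose node changes between two placements."""
    moved = sum(1 for key in keys if before(key) != after(key))
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10000)]
old_nodes = ["cache-1", "cache-2", "cache-3", "cache-4"]
new_nodes = old_nodes + ["cache-5"]

modulo_moved = moved_fraction(
    keys,
    lambda k: modulo_node(k, old_nodes),
    lambda k: modulo_node(k, new_nodes),
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(keys, old_ring.node, new_ring.node)
print(f"modulo: {modulo_moved:.0%} moved, ring: {ring_moved:.0%} moved")
```

With modulo placement about four fifths of the keys are expected to change nodes, while the ring is expected to move only about the one fifth that the new node takes over. In both cases the moved keys start cold on their new node.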

The pace at which Amazon is launching new services is indeed impressive. I’m wondering what will be the first NoSQL database that will get official Amazon support.

Original title and link: Memcached in the Cloud: Amazon ElastiCache (NoSQL database©myNoSQL)