NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Memcached: All content tagged as Memcached in NoSQL databases and polyglot persistence

You want NoSQL? I’ll give you memcached

Tony Darnell in Use MySQL to store NoSQL and SQL data in the same database using memcached and InnoDB | Scripting MySQL:

With MySQL version 5.6 (and above), you have the ability to store and retrieve NoSQL data, using NoSQL commands, while keeping the data inside a MySQL InnoDB database. So, you can use NoSQL and SQL at the same time, on the same data, stored in the same database. And the beauty is that it takes just a few minutes to setup. This post will provide you with a quick lesson on how to setup NoSQL on a MySQL InnoDb database.

I see this trivialization of the term NoSQL quite frequently in the communications signed by Oracle: “Oh, you want NoSQL? Take memcached. Now shut up!” This is quite disrespectful to their customers and the developer community in general.

Original title and link: You want NoSQL? I’ll give you memcached (NoSQL database©myNoSQL)

InnoDB Memcached plugin benchmark: 1mil QPS with in MySQL 5.7.3

On the InnoDB team blog:

As you probably already know, in MySQL 5.7.3 release, InnoDB Memcached reached a record of over 1 million QPS on a read only load. The overview of the benchmark and testing results can be seen in an earlier blog by Dimitri. In this blog, I will spend sometime on the detail changes we have made to achieve this number.

There’s another post detailing the benchmark:

The test was executed in “standalone” mode (both server and client are running on the same server). So, we used our biggest HW box we have in the LAB - a 48cores machine.

That’s a_good_ number. But if you think about it, the per-core QPS is not that high; if I remember correctly Redis can go up to 70k/s.

Original title and link: InnoDB Memcached plugin benchmark: 1mil QPS with in MySQL 5.7.3 (NoSQL database©myNoSQL)

memcached turns 10 years old

Ars Technica:

This week, memcached, a piece of software that prevents much of the Internet from melting down, turns 10 years old. Despite its age, memcached is still the go-to solution for many programmers and sysadmins managing heavy workloads. Without memcached, Ars Technica would likely be unable to serve this article to you at all.

According to a commenter on HN, memcached’s 10th birthday would actually be tomorrow (May 22nd). I’m pretty sure it will outlive today.

✚ I’ve left a comment myself on HN in which I express my admiration for obvious tools that have changed the face of software development (e.g. memcached, JUnit, etc.)

Original title and link: memcached turns 10 years old (NoSQL database©myNoSQL)


Memcached vs InnoDB Memcached in MySQL 5.6

Some numbers from comparing Memcached with InnoDB Memcached in MySQL 5.6:

Keep in mind that the entire data set fits into the buffer pool, so there are no reads from disk. However, there is write activity stemming from the fact that this is using InnoDB under the hood (redo logs, etc).

There is a significant impact on the speed so deciding which solution to use gets down to analysing the costs and complexity of maintaining another tool, the cost of Memcached warmup and the performance drop of using InnoDB Memcached.

Original title and link: Memcached vs InnoDB Memcached in MySQL 5.6 (NoSQL database©myNoSQL)


A Key-Value Cache for Flash Storage: Facebook's McDipper and What Preceded It

A post on Facebook Engineering’s blog:

The outgrowth of this was McDipper, a highly performant flash-based cache server that is Memcache protocol compatible. The main design goals of McDipper are to make efficient use of flash storage (i.e. to deliver performance as close to that of the underlying device as possible) and to be a drop-in replacement for Memcached. McDipper has been in active use in production at Facebook for nearly a year.

I know at least 3 companies that have attacked this problem with different approaches and different results:

  1. Couchbase (ex-Membase, ex-NorthScale) started as a persistent clustered Memcached implementation. It was not optimized for Flash storage though. Today’s Couchbase product is still based on the memcache protocol, but it adding new features inspired by CouchDB.
  2. RethinkDB, a YC company and the company that I work for, has worked and released in 2011 a Memcache compatible storage engine optimized for SSDs. Since then, RethinkDB has been building and released an enhanced product, a distributed JSON store with advanced data manipulation support.
  3. Aerospike (ex Citrusleaf) sells a storage engine for flash drives. Its API is not Memcache compatible though.

People interested in this market segment have something to learn from this.

Original title and link: A Key-Value Cache for Flash Storage: Facebook’s McDipper and What Preceded It (NoSQL database©myNoSQL)


Migrating Live Memcached Servers

Ian Winter describes the possible approaches to migrate their memcached server farm:

  1. Big Bang. Update all the configuration files and simply push that code live. This has a huge hit on performance as all the new instances are cold, it is however the quickest option.
  2. Migration instance by instance. Each instance would be moved and relevant configuration files pushed live. This way only 1 or X instances is “cold”. This limits impact, but, also takes time as only 1 instance can be done at any one time.
  3. Write to both. The application code could be altered to make all writes to both the existing set of instances and the new. This isn’t great as we’d need a full QA cycle to validate the code works. It’s also something we’d have to implement then pull back out down the line.
  4. Mirror traffic. Similar to the above, but, this time lower down the stack. Essentially duplicate all the TCP level traffic so it warms in parallel and keeps both instances in sync meaning new writes, deletes occur on both existing and new sets.

Can you tell which approach they used?

PS: For the first two approaches I’d use different names: Stop the World and Rolling upgrade respectively.

Original title and link: Migrating Live Memcached Servers (NoSQL database©myNoSQL)


Using Riak as Cache Layer

Sean Cribbs explains how to use Riak as a caching solution:

  1. Bitcask or Memory backends
  2. The possibility of configuring the cluster for lower guarantees of per-key availability

Then benchmark the system for your scenario.

Original title and link: Using Riak as Cache Layer (NoSQL database©myNoSQL)


Redis and Memcached Benchmark on Amazon Cloud

Garantia Data, providers of Redis and Memcached as-a-Service-in-the-Amazon-Cloud, published the results of a throughput and latency benchmark for different AWS deployment models:

The first thing we looked at when putting together our benchmark was the various architectural alternatives we wanted to compare. Users typically choose the most economical AWS instance based on the initial size estimate of their dataset, however, it’s crucial to also keep in mind that other AWS users might share the same physical server that runs your data (as nicely explained by Adrian Cockcroft here). This is especially true if you have a small-to-medium dataset, because instances between m1.small and m1.large are much more likely to be shared on a physical server than large instances like m2.2xlarge and m2.4.xlarge, which typically run on a dedicated physical server. Your “neighbours” may become “noisy” once they start consuming excess I/O and CPU resources from your physical server. In addition, small-to-medium instances are by nature weaker in processing power than large instances.

Only two comments:

  1. it’s not clear if there were multiple instances of Redis used per machine when the chosen instances had multi-cores
  2. I would have really liked to also have a pricing comparison in the conclusion section

Original title and link: Redis and Memcached Benchmark on Amazon Cloud (NoSQL database©myNoSQL)


A Quick Test of the New MySQL Memcached Plugin With (J)Ruby

Gabor Vitez:

With a new post hitting Hacker News again on MySQL’s memcached plugin, I really wanted to do a quick-and-dirty benchmark on it, just to see what good it is – does this interface offer any extra speed when compared to SQL+ActiveRecord? Does it have it’s place in the software stack? How much work is needed to get this combination off the ground?

When running into micro-benchmarks, I really try my best to figure out if they hide any value. But it’s hard to find any in one that uses different libraries, runs both the benchmark and the server on the same machine and uses no concurrency.

Original title and link: A Quick Test of the New MySQL Memcached Plugin With (J)Ruby (NoSQL database©myNoSQL)


New JRuby Memcached Gem

Richard Huang about a new memcached gem built on top of spymemcached for JRuby that is compatible with Evan Weaver’s memcached gem :

Quickly I replaced xmemcached to spymemcached, and memcached get time decreased to 40+ ms and it only generates 2 threads, awesome. And its hash and distribution algorithms are 100% compatible to libmemcached 0.32.

I just went through a similar experience trying to find a memcached library that works with both Ruby MRI and JRuby. I’ve ended up not finding it.

Original title and link: New JRuby Memcached Gem (NoSQL database©myNoSQL)


Pinterest Architecture Numbers

Todd Hoff caught some new numbers about Pinterest architecture and from those the ones interesting from the data point of view:

  • 125 EC2 memcached instances, from which 90 for production and 35 for internal usage:

    Another 90 EC2 instances are dedicated towards caching, through memcache. “This allows us to keep a lot of data in memory that is accessed very often, so we can keep load off of our database system,” Park said. Another 35 instances are used for internal purposes.

  • 70 master MySQL databases on EC2

    • sharded at 50% capacity
    • backup databases in different regions

    Behind the application, Pinterest runs about 70 master databases on EC2, as well as another set of backup databases located in different regions around the world for redundancy.

    In order to serve its users in a timely fashion, Pinterest sharded its database tables across multiple servers. When a database server gets more than 50% filled, Pinterest engineers move half its contents to another server, a process called sharding. Last November, the company had eight master-slave database pairs. Now it has 64 pairs of databases. “The sharded architecture has let us grow and get the I/O capacity we need,” Park said.

  • 80 million/410TB objects stored in S3

  • no details about Redis

Original title and link: Pinterest Architecture Numbers (NoSQL database©myNoSQL)

Algorithm for Automatic Cache Invalidation

Jakub Łopuszański describes in much detail and with examples an algorithm for cache invalidation:

Imagine a bipartite graph which on the left hand side has one vertex per each possible subspace of a write query, and on the right side has vertices corresponding to subspaces of read queries. Actually both sets are equal, but we will focus on edges.

Edge goes from left to right, if a query on the left side affects results of a query on the right side. As said before, both sets are infinite, but that’s not the problem. There are infinitely many edges, but it’s also not bad. What’s bad is that there are nodes on the left side with the infinite degree, which means, we need to invalidate infinitely many queries. What the above tricky algorithm does, is adding a third layer to the graph, in the middle between the two, such that the transitive closure of the resulting graph is still the same (in other words: you can still get by using two edges anywhere you could by one edge in the original graph), yet each node on the left, and each node on the right, have finite (actually constant) degree. This middle layer corresponds to the artificial subspaces with “?” marks, and serves as a connecting hub for all the mess. Now, when a query on the left executes, it needs to inform only its (small number of) neighbours about the change, moving the burden of reading this information to the right. That is, a query on the right side needs to check if there is a message in the “inbox” in the middle layer. So you can think about it as a cooperation where the left query makes one step forward, and the right query does a one step back, to meet at the central place, and pass the important information about the invalidation of cache.

I’m still in front of a piece of paper understanding how it works.

Original title and link: Algorithm for Automatic Cache Invalidation (NoSQL database©myNoSQL)