


benchmark: All content tagged as benchmark in NoSQL databases and polyglot persistence

Riak in the Cloud with Joyent SmartMachines

I usually don’t trust vendor benchmarks, but these Riak benchmarks look pretty much in line with Mozilla’s Riak benchmark. What is even more impressive is that these results come from running Riak on virtualized machines (the Joyent SmartMachines[1]).

Watch it for yourself. Slides can be downloaded from ☞ here.

  1. You can read more about Joyent SmartMachines ☞ here.

Original title and link: Riak in the Cloud with Joyent SmartMachines (NoSQL databases © myNoSQL)

Redis Benchmark Ported to Memcached

Salvatore Sanfilippo:

This is a straightforward port of redis-benchmark to memcache protocol.

This way it is possible to test Redis and Memcache with not just an apples-to-apples comparison, but also using the exact same mouth… :)

Does it mean that Redis is going after Memcached? I guess Membase is after the same users, so we will have some interesting competition.

Original title and link: Redis Benchmark Ported to Memcached (NoSQL databases © myNoSQL)


MongoDB and MySQL Benchmarks

Mark Callaghan has posted a 4-article series — ☞ part 1, ☞ part 2, ☞ part 3, and ☞ part 4 — comparing the performance (queries per second) of MySQL with InnoDB and MongoDB:

I ran three micro-benchmarks: get by primary key, get by secondary key and update by primary key. MySQL had a higher peak QPS for all of them. Alas, the results for get by primary key were skewed because pymongo, the Python driver for MongoDB, uses more CPU than MySQLdb, the Python driver for MySQL. The client host was saturated during the test and this limited peak QPS to 80,000 for MongoDB versus 110,000 for MySQL.

I repeated one test using two 16-core client hosts with 40 processes per host. For that test the peak QPS on MongoDB improved to 155,000 while the peak for MySQL remained at 110,000. That is an impressive result. The results for get by secondary key and update by primary key are still valid as the server host saturated on those tests.

A couple of lessons I’ve learned:

  • if you are able to use the latest MySQL version and track the different patches, you can get a lot out of MySQL
  • while benchmarking, make sure you also check very carefully the drivers you are using
  • while benchmarking, pay attention to how concurrency influences the results
  • peak QPS is an interesting metric, but it might not be the one you are most interested in. Check out this extensive Riak benchmark run by Mozilla, which covers other metrics that might be important for your app
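The driver lesson above can be checked for directly. As a minimal sketch (using `json.dumps` as a stand-in for a driver's serialization step — pymongo would be encoding BSON here), comparing wall-clock time against client CPU time reveals whether the client host, rather than the server, is the bottleneck:

```python
import json
import time

def profile_client(encode, payload, seconds=0.5):
    """Run the driver's encode step in a tight loop and report how much
    of the elapsed wall-clock time was spent burning client CPU."""
    ops = 0
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    while time.perf_counter() - wall_start < seconds:
        encode(payload)
        ops += 1
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return ops / wall, cpu / wall  # (encodes/sec, fraction of wall time on CPU)

# A CPU fraction near 1.0 means the client is saturated and the measured
# QPS reflects the driver, not the database.
rate, cpu_fraction = profile_client(json.dumps, {"_id": 42, "name": "x" * 64})
print(f"{rate:,.0f} encodes/sec, {cpu_fraction:.0%} of wall time on client CPU")
```

In a real benchmark you would profile the actual driver call, but the principle is the same: if the client burns a full core per process, add client hosts before trusting the peak QPS number.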

Original title and link: MongoDB and MySQL Benchmarks (NoSQL databases © myNoSQL)

Redis: A Concurrency Benchmark

It looks like it is one of those days dedicated to NoSQL benchmarks: Jak Sprats shared his concurrency benchmark on the Redis group. Even if the thread doesn’t give details about the hardware or the size of keys and values, the results are impressive.

This blew my mind, there is minimal performance degradation starting at 4000 concurrent requests and at 26000 concurrent requests the performance is 87.6K/s .. unbelievably good

Redis concurrency benchmark

As far as I know, Redis serializes all operations, so this is even more impressive. The part I’d be interested to see included in this benchmark is Redis virtual memory.
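To make the shape of such a test concrete, here is a toy harness, not Jak Sprats’s setup: an in-memory dict behind a single lock stands in for Redis’s serialized command loop, and throughput is measured as client concurrency grows:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

store = {f"key:{i}": b"x" * 32 for i in range(10_000)}
lock = threading.Lock()  # serialize all ops, mimicking a single-threaded server

def get(key):
    with lock:
        return store[key]

def run(concurrency, requests=20_000):
    """Fire `requests` GETs through `concurrency` workers; return ops/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(get, (f"key:{i % 10_000}" for i in range(requests))))
    return requests / (time.perf_counter() - start)

results = {c: run(c) for c in (1, 16, 256)}
for c, qps in results.items():
    print(f"{c:>4} workers: {qps:,.0f} ops/sec")
```

The interesting output of a real run is the curve, not any single number: how far concurrency can climb before ops/sec starts to degrade.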

Original title and link for this post: Redis: A Concurrency Benchmark (published on the NoSQL blog: myNoSQL)


Extensive Riak Benchmarking at Mozilla Test Pilot

Mozilla has previously published their detailed plan and their extensive investigation of Cassandra, HBase, and Riak that led to choosing Riak. This time they are publishing extensive Riak benchmark results (against both Riak 0.10 and Riak 0.11 running Bitcask), obtained with the Riak benchmarking code included in the list of correct NoSQL benchmarks and performance evaluations. The results, the analysis, and the interpretation are all fascinating.

Our goal in running these studies was, simply put, no surprises. That meant we needed to run studies that profiled:

  1. Latency
  2. Stability, especially for long running tests
  3. Performance when we introduced variable object sizes
  4. Performance when we introduced pre-commit hooks to evaluate incoming data

I guess Mozilla Test Pilot is one of Riak’s most interesting case studies.

Original title and link for this post: Extensive Riak Benchmarking at Mozilla Test Pilot (published on the NoSQL blog: myNoSQL)

MongoDB and SQL Server Basic Speed Tests

Having used MongoDB almost exclusively with the NoRM C# driver for several months now, this is something that I have always wanted to do, just to satisfy my own curiosity.

Unfortunately “basic” is the wrong word. Just another useless benchmark.

Original title and link for this post: MongoDB and SQL Server Basic Speed Tests (published on the NoSQL blog: myNoSQL)


CouchDB: 5.5k inserts/sec with fire-and-forget and bulk ops

After saying that MongoDB’s default fire-and-forget behavior is wrong, the CouchDB community welcomed this sample Clojure code showing 5,500 inserts/second implemented with fire-and-forget behavior and bulk inserts:

So I contemplated the problem some and wondered whether Clojure’s STM (Software Transactional Memory) could be leveraged. As requests come in, instead of connecting immediately to the database, why not queue them up until we have an optimal number and then do a bulk insert?
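The Clojure version leans on STM, but the queue-until-optimal-batch idea is language-agnostic. A minimal sketch in Python with a plain lock (the `BulkWriter` name and the `flush` callback are illustrative, not CouchDB API — in practice `flush` would POST to `_bulk_docs`):

```python
import threading

class BulkWriter:
    """Queue incoming writes and hand them to `flush` as one bulk insert
    once the batch reaches `batch_size`."""
    def __init__(self, flush, batch_size=500):
        self._flush = flush
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._pending = []

    def insert(self, doc):
        with self._lock:
            self._pending.append(doc)
            if len(self._pending) < self._batch_size:
                return
            batch, self._pending = self._pending, []
        self._flush(batch)  # the bulk insert happens outside the lock

    def close(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._flush(batch)

batches = []                      # collect batches instead of hitting a server
w = BulkWriter(batches.append, batch_size=3)
for i in range(7):
    w.insert({"doc": i})
w.close()
print([len(b) for b in batches])  # → [3, 3, 1]
```

A production version would also flush on a timer so a half-full batch doesn’t sit in the queue indefinitely, which is where the fire-and-forget trade-off (acknowledged vs. possibly-lost writes) reappears.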

Measuring Redis Storage Overhead

Jeremy Zawodny has published two articles, ☞ here and ☞ here, sharing his experiments measuring the storage overhead of the recently released Redis 2.0.0 RC3 for two scenarios:

  • simple key-values
  • hashes, a new data type that will be available with the upcoming Redis 2.0

The experiment is interesting as it shares the code used and so you’ll be able to run it for your particular scenarios. Do keep in mind that the results will vary as they depend heavily on the size of the stored values.

This tells me that on a 32GB box, it’s not unreasonable to host 200,000,000 keys (if their values are sufficiently small). […] The resulting dump file (dump-0.rdb) was 1.8GB in size.


If you do the math, that yields 1.25 billion (1,250,000,000) key/value pairs stored. […] So it took about 2 hours and 40 minutes to complete. The resulting dump file (.rdb file) was 13GB in size (compared to the previous 1.8GB) and the memory usage was roughly 17GB.
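The quoted figures translate into a rough per-entry memory cost, which makes the hash savings concrete (treating GB as 2^30 bytes is an assumption; the exact unit isn’t stated):

```python
# Back-of-the-envelope per-entry overhead from the figures quoted above.
GB = 2 ** 30

plain_keys = 200_000_000           # simple key/value scenario
plain_memory = 32 * GB             # "a 32GB box"
plain_cost = plain_memory / plain_keys
print(f"plain keys:  ~{plain_cost:.0f} bytes/entry")

hash_pairs = 1_250_000_000         # hash scenario: 1.25 billion pairs
hash_memory = 17 * GB              # "memory usage was roughly 17GB"
hash_cost = hash_memory / hash_pairs
print(f"hash fields: ~{hash_cost:.1f} bytes/entry")
```

Roughly an order of magnitude less memory per entry when small values are packed into hashes, which is exactly the trade-off Salvatore describes below.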

Salvatore Sanfilippo (@antirez), Redis creator and main developer, has a good explanation about the storage overhead:

If you turn a txt file with a list of “common surnames -> percentage of population” into a binary tree it will get more or less an order of magnitude bigger in memory compared to the raw txt file.

This is a common pattern: when you add a lot of metadata, for fast access, memory management, “zero-copy” transmission of this information, expires, …, the size is not going to be the one of concatenating all this data like in a unique string.


But for now our reasoning is: it’s not bad to be able to store 1 million keys in less than 200 MB of memory (100 MB on 32-bit systems) if an entry-level box is able to serve this data at a rate of 100k requests/second, including the networking overhead. And with hashes we have much better memory performance compared to top-level keys. So… with a few GB our users can store tens or hundreds of millions of things in a Redis server.

☞ Hacker News

Japanese Blogs Post Benchmarks on Membase, Memcached, Tokyo Tyrant and Redis

Two Japanese blogs[1] have published benchmarks comparing the newly released Membase with Memcached, Tokyo Tyrant, and Redis.

Unfortunately both of them are just new examples of useless benchmarks:

  • only 1000 keys
  • the benchmark doesn’t vary the size of keys and values
  • no concurrency
  • no mixed reads/writes

I’d strongly suggest that anyone planning to build a solid benchmark take a look at these NoSQL benchmarks and performance evaluations to learn how to build useful and correct ones[2].
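The shortcomings listed above suggest what a more useful harness must vary. A minimal sketch of such a benchmark matrix, with an in-memory dict standing in for the real client under test (swap `store.get`/`store.__setitem__` for actual client calls):

```python
import random
import time

def bench(store_get, store_set, value_size, read_ratio, keys=5_000, ops=50_000):
    """One cell of the matrix: a fixed value size and read/write mix."""
    value = b"x" * value_size
    for i in range(keys):                      # pre-load, so reads always hit
        store_set(f"k{i}", value)
    rng = random.Random(42)                    # fixed seed for repeatability
    start = time.perf_counter()
    for _ in range(ops):
        key = f"k{rng.randrange(keys)}"
        if rng.random() < read_ratio:
            store_get(key)
        else:
            store_set(key, value)
    return ops / (time.perf_counter() - start)

store = {}
results = {(size, ratio): bench(store.get, store.__setitem__, size, ratio)
           for size in (64, 1024, 16 * 1024)   # vary value sizes ...
           for ratio in (1.0, 0.5)}            # ... and the read/write mix
for (size, ratio), qps in sorted(results.items()):
    print(f"{size:>6}B values, {ratio:.0%} reads: {qps:,.0f} ops/sec")
```

Add a concurrency dimension (as in the Redis concurrency post above) and far more than 1,000 keys, and the numbers start becoming comparable across systems.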

  1. The two benchmarks are published ☞ here and ☞ here. Unfortunately I don’t read Japanese, and I’ve used Google Translate (which pretty much didn’t work).
  2. Another useful resource about building correct benchmarks is Jan Lehnardt’s ☞ Benchmarks: You are Doing it Wrong.

NoSQL benchmarks and performance evaluations

Some say it is the right time to start having these around. Others say it’s way too early to start the “battle”. Users do want to see them, and when they’re lacking, they create their own, most of the time using incomplete or wrong approaches.

But what am I talking about? As some of you might have guessed already:

NoSQL benchmarks and performance evaluations!

With their recent release of Riak 0.11.0, the Basho guys have also published their internal ☞ benchmarking code. Similar internal benchmark code is ☞ available for MongoDB.

But users are more interested in seeing cross product benchmarks, even if most of the time constructing these is extremely complicated and they end up comparing apples with oranges.

All this being said, and accepting that most of the time someone will figure out a way to invalidate the results, let’s see what cross-product benchmarks we have in the NoSQL space.

Yahoo! Cloud Serving Benchmark

The Yahoo! Cloud Serving Benchmark’s goal is to facilitate performance comparisons of the new generation of cloud data serving systems. The source code is available on ☞ GitHub and Yahoo! has also published ☞ the results of running this benchmark against Cassandra, HBase, Yahoo!’s PNUTS, and a simple sharded MySQL implementation.

VoltDB Benchmark

VoltDB, a new storage solution that calls itself the next-generation SQL RDBMS with ACID for fast-scaling OLTP applications, has recently ☞ published the results of a benchmark comparing VoltDB and Cassandra.

It is worth noting that, while this is one of those apples-to-oranges comparisons (nb: the authors are well aware of it), there are still a couple of interesting and useful things to be learned from it (i.e. the benchmarking procedure, the tested scenarios, etc.).

Unfortunately at this time the source code is not yet available, but hopefully we will see it soon:

Going forward, we’re planning to release the code we used to do these benchmarks. We’d also like to try a few other storage layers

Hypertable and HBase Performance Evaluation

The guys behind Hypertable ☞ have published the results of comparing Hypertable with HBase using a benchmark based on the Google BigTable paper[1], from which both HBase and Hypertable inherit their architecture. Unfortunately, the benchmark code was not available at the time of writing.

Update: thanks to Stu Hood, I now know the code for this benchmark is included in the Hypertable distribution available ☞ here (tar.gz), and the configuration files are also available ☞ here (tar.gz).

So, as far as I could gather, we have:

  • the Yahoo! Cloud Serving Benchmark
  • the VoltDB vs. Cassandra benchmark
  • the Hypertable vs. HBase performance evaluation

Did I miss any?

  1. The BigTable paper is available ☞ here.

Release: Production Ready MongoDB 1.4 Released

Judging by the number of posts I’ve seen around, I’d guess you’ve already heard about the MongoDB 1.4 release[1]. Anyway, I definitely had to include it here, as myNoSQL covers all major NoSQL projects and follows closely all things related to the NoSQL ecosystem.

While some MongoDB users seemed quite excited about the addition of ☞ geospatial indexing, others about some ☞ query language improvements, the things that caught my attention were:

  • background indexing and indexing improvements
  • concurrency improvements
  • the lack of autosharding (still alpha, still pushing, still…)
  • the lack of improvements or alternatives for the MongoDB durability tradeoff

Speaking of performance, the 10gen people[2] have run some benchmarks comparing MongoDB 1.2 with MongoDB 1.4. With a couple of exceptions, the performance hasn’t improved radically, so I’d speculate that there is still a lot of locking involved. The benchmark source code was made available[3] so you can dig deeper into it.

All in all, good and exciting news for the NoSQL world!

Riak and Voldemort: A (Non-Scientific) Benchmark

I’m relatively OK with benchmarks detailing the scenario, limitations and not making any additional claims.

Tests ran for several minutes, and hundreds of thousands of requests were made to both systems. Nevertheless, the test is not at all scientific, for multiple reasons: neither system was primed with billions of entries (which would make a great test on its own), different clients (native Erlang for Riak and a Java one shipped with Voldemort) were used to access the systems, etc.

The only thing missing is access to the source code used for the tests, so that others would have a chance to push it further.