benchmark: All content tagged as benchmark in NoSQL databases and polyglot persistence
At first I thought that Innostore, the embedded InnoDB from Basho, was just another cool project they had made available to the community. It was only after a couple of days that I realized that Innostore is in fact one of the pluggable storage backends for Riak. That definitely made me think more about this decision.
Luckily, David Smith from Basho has already taken the time to explain ☞ the reasons that led Riak to use InnoDB as one of its storage engines:
1. predictability and 2. stability. […] we need something that is going to have predictable latency under significant loads. After evaluating TokyoCabinet (TC), BerkeleyDB-C (BDB) and Embedded Inno, it was quite clear that Inno won this aspect hands down.
You’ll notice pretty much the same arguments in this post about ☞ MySQL usage at Flickr:
- it is a very well known component. When you’re scaling a complex app everything that can go wrong, will. Anything which cuts down on your debugging time is gold. All of MySQL’s flags and stats can be a bit overwhelming at times, but they’ve accumulated over time to solve real problems.
- it’s pretty darn fast and stable. Speed is usually one of the key appeals of the new NoSQL architectures, but MySQL isn’t exactly slow (if you’re doing it right). I’ve seen two large, commercial “NoSQL” services flounder, stall and eventually get rewritten on top of MySQL. (and you’ve used services backed by both of them)
As a side note, that last sentence reminded me of the migration the Hashrocket team completed for a pharma company.
Last, but not least, you can also take a look at this ☞ Yahoo! benchmark that includes MySQL and, if I’m not misinterpreting those results, you’ll notice that for some of them MySQL performed quite well.
I guess what we can learn from all of this is:
- not all traditional storage engines are as bad as we sometimes like to think
- it is probably the complete feature set of an RDBMS that makes it overkill for some projects
- there are still a lot of scenarios in which an RDBMS makes sense
Strange post for a NoSQL centric blog, isn’t it?
- Create the initial append only file from your dataset just issuing:
- When the rewrite is done (you can see it from the INFO command output) stop the server
- Edit redis.conf in order to enable append only
- Restart the server
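The configuration edit in the steps above can be sketched in Python. This is a minimal illustration, not part of the original instructions: the helper name `enable_append_only` is hypothetical, and it assumes the file uses the `appendonly yes` / `appendonly no` directive form. It works on the text of redis.conf rather than on the file itself, so reading and writing the file is left to the caller.

```python
import re

# Matches an active (non-commented) "appendonly <value>" line.
_APPENDONLY_LINE = re.compile(r"^[ \t]*appendonly[ \t]+\S+[ \t]*$", re.MULTILINE)


def enable_append_only(conf_text: str) -> str:
    """Return the config text with the appendonly directive set to 'yes'.

    Hypothetical helper: an existing active directive is rewritten,
    otherwise a new line is appended at the end of the text.
    """
    if _APPENDONLY_LINE.search(conf_text):
        return _APPENDONLY_LINE.sub("appendonly yes", conf_text)
    # Make sure the appended directive starts on its own line.
    separator = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
    return conf_text + separator + "appendonly yes\n"


if __name__ == "__main__":
    sample = "port 6379\nappendonly no\nsave 900 1\n"
    print(enable_append_only(sample))
```

Commented-out lines such as `# appendonly no` are left untouched, so the original defaults stay visible in the file.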
Just a few days after Redis 1.2.0 was released, Rediska announced its 0.3.0 release. Rediska is a PHP client for Redis that provides full integration with Zend, the popular PHP framework that is also looking to integrate with CouchDB and MongoDB. The release features:
- Full support for the Redis 1.2.0 API
- Operate with keys on specified (by alias) server
- Specify DB index in server config
- Easy extension of Rediska by adding your own commands or overriding standard ones
- Lazy loading
- Full documentation
Credit Chris Streeter
I think the conclusion is wrong as it is based on comparing the real-time figures (wall time elapsed between invocation and termination). I’d say comparing total times (user + sys) would be more correct.
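To make the distinction concrete, here is a small Python sketch (not taken from the benchmark in question) that measures both figures for the same call. The function names are made up for this illustration. It only accounts for CPU time spent in the calling process, so a database server running as a separate process would need to be measured separately.

```python
import time


def measure(fn):
    """Run fn once and return (real_seconds, cpu_seconds).

    real_seconds is wall-clock time; cpu_seconds is user + sys CPU time
    of the current process, as reported by time.process_time().
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu_seconds = time.process_time() - cpu_start
    real_seconds = time.perf_counter() - wall_start
    return real_seconds, cpu_seconds


def waiting_workload():
    """Spends its time waiting (sleeping), as a stand-in for I/O waits."""
    time.sleep(0.2)


def busy_workload():
    """Spends its time computing."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    for workload in (waiting_workload, busy_workload):
        real, cpu = measure(workload)
        print(f"{workload.__name__}: real={real:.3f}s user+sys={cpu:.3f}s")
```

For the waiting workload the real time is dominated by the sleep while user + sys stays close to zero, which is why the two figures can lead to very different conclusions.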
Update: @codemonkeyism has pointed out yet another reason for this benchmark being wrong:
“As far as I know CouchDB data is durable, but MongoDB is primarily memory and then stored and corruptable - are those comparable?”.
After posting about Scott Motte’s comparison of MongoDB and CouchDB, I thought there should be some more informative sources out there, so I’ve started to dig.
The first one I came upon (thanks to Debasish Ghosh @debasishg) is an article about ☞ Raindrop requirements, the issues faced while addressing them with CouchDB, and the pros and cons of possibly replacing CouchDB with MongoDB:
- Uses update-in-place, so the file system impact/need for compaction is less, and storing our schemas in one document is likely to work better.
- Queries are done at runtime. Some indexes are still helpful to set up ahead of time though.
- Has a binary format for passing data around. One of the issues we have seen is the JSON encode/decode times as data passes around through couch and to our API layer. This may be improving though.
- Uses language-specific drivers. While the simplicity of REST with CouchDB sounds nice, due to our data model, the megaview and now needing a server API layer means that querying the raw couch with REST calls is actually not that useful. The harder issue is trying to figure out the right queries to do and how to do the “joins” effectively in our API app code.
- easy master-master replication. However, for me personally, this is not so important. […] So while we need backups, we probably are fine with master-slave. To support the sometimes-offline case, I think it is more likely that using HTML5 local storage is the path there. But again, that is just my opinion.
Anyway, while some of the points above are generic, you should consider them from the perspective of the Raindrop requirements, about which you can read more here.
I’d also mention this ☞ benchmark, which compares the performance of MongoDB, CouchDB and Tokyo Cabinet/Tyrant, using MySQL results as a reference (note: the author of the benchmark categorizes Tokyo Cabinet as a document database, while it is a key-value store).
In case you have other resources that you think would be worth including do not hesitate to send them over.
Update: Just found a nice comparison matrix.
As a teaser, very soon I will introduce you to a new solution available in this space, so make sure to check MyNoSQL regularly.
Update: The main article about this new document store has been published: Terrastore: A Consistent, Partitioned and Elastic Document Database. I would strongly encourage you to check it out, as Terrastore looks quite promising.
Back when I was writing the ☞ Quick Reference to Alternative data storages, I searched the internet for benchmark results probably more deeply than Google does, and I couldn’t find much.
Things seem to be changing lately, and I have started to gather quite a few results (see NoSQL benchmark articles).
Redis Benchmarking on Amazon EC2, Flexiscale, and Slicehost ☞
The author of the article has managed to run the Redis benchmarks on a set of different cloud hosting providers:
- small-remote (Amazon EC2, 32b)
- small (Amazon EC2, 32b)
- slicehost-256 (Slicehost, 64b)
- quadruple-extra-large (Amazon EC2, 64b)
- large (Amazon EC2, 64b)
- high-cpu-medium (Amazon EC2, 64b)
- high-cpu-extra-large-32b-os (Amazon EC2, 32b)
- high-cpu-extra-large (Amazon EC2, 64b)
- flexiscale-2gb-4core (Flexiscale, 64b)
- flexiscale-2gb-2core (Flexiscale, 64b)
- extra-large (Amazon EC2, 64b)
- double-extra-large (Amazon EC2, 64b)
Redis Benchmarks on FusionIO ☞
It looks like the “MySQL Performance guys” are growing their passion for NoSQL systems. Now they have published the results of benchmarking Redis on FusionIO in 5 modes:
- In-Memory (save 900000000 900000000)
- Semi-Persistent Mode 1 (save 1 1)
- Fully persistent (appendonly yes, appendfsync always)
- Semi-Persistent Mode 2 (appendonly yes, appendfsync no)
- Semi-Persistent Mode 3 (appendonly yes, appendfsync everysec)
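As a rough sketch, the five modes listed above can be written down as redis.conf fragments. The directives are the ones quoted in the list; the `MODES` table and the `render_conf` helper are names made up for this illustration, and the rest of each benchmark configuration is not known from this summary.

```python
# Mode name -> redis.conf directives, as quoted in the list above.
MODES = {
    "In-Memory": ["save 900000000 900000000"],
    "Semi-Persistent Mode 1": ["save 1 1"],
    "Fully persistent": ["appendonly yes", "appendfsync always"],
    "Semi-Persistent Mode 2": ["appendonly yes", "appendfsync no"],
    "Semi-Persistent Mode 3": ["appendonly yes", "appendfsync everysec"],
}


def render_conf(mode: str) -> str:
    """Return a redis.conf fragment for the given mode (hypothetical helper)."""
    lines = [f"# {mode}"] + MODES[mode]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name in MODES:
        print(render_conf(name))
```

Laying the modes side by side makes the difference visible: the first two rely on snapshotting via `save`, while the other three use the append only file and differ only in how often it is synced to disk.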
You might find it useful to read the ☞ RAID vs SSD vs FusionIO setup to better understand the environment.
Update: there is an update of these Redis benchmarks
Igal Koshevoy has made available through Github his presentation on “Non-relational data stores for OpenSQL Camp: Overview, coding and assessment: MongoDB, Tokyo Tyrant & CouchDB”.
After a short presentation of the relational and non-relational worlds, Igal presents the pros and cons of MongoDB, Tokyo Tyrant and CouchDB, includes code snippets for all basic operations, and closes with some benchmarking results. You can read the presentation embedded below (update: it looks like the Google embed doesn’t work with this document, or GitHub is not allowing access to it, so for the moment you can access it in PDF format ☞ here).
Sooner or later every piece of software or programming language gets benchmarked. Some benchmarks are interesting, while others tend to be created to prove that a particular solution is better than all others (vendor benchmarks). Coming up with a fair benchmark is a hard job, and trying to analyze a set of heterogeneous systems is even more difficult.
K.S. Bhaskar has published a benchmark proposal called ‘3n+1 NoSQL/Key-Value/Schema-Free/Schema-Less Database Benchmark’ that, in his words,
is designed to allow for apples to apples comparisons of NoSQL databases using features that should allow many if not most NoSQL engines to be benchmarked
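The proposal itself is not reproduced here. Purely as an illustration of the arithmetic the name refers to, and assuming (this is not confirmed by the text above) that the workload stores 3n+1 sequence lengths in the key-value store under test, a sketch with a plain Python dict standing in for the database might look like this. The function names are hypothetical.

```python
def collatz_steps(n, store):
    """Number of 3n+1 steps needed to reach 1 from n.

    `store` is any dict-like key-value store; here it is a stand-in for
    the database being benchmarked (an assumption of this sketch).
    """
    path = []
    # Walk the sequence until we hit 1 or a value already in the store.
    while n != 1 and n not in store:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    steps = 0 if n == 1 else store[n]
    # Write back the step count for every value visited on the way.
    for value in reversed(path):
        steps += 1
        store[value] = steps
    return steps


def longest_in_range(lo, hi, store):
    """Return (n, steps) for the longest sequence with lo <= n <= hi."""
    best_n, best_steps = lo, collatz_steps(lo, store)
    for n in range(lo + 1, hi + 1):
        steps = collatz_steps(n, store)
        if steps > best_steps:
            best_n, best_steps = n, steps
    return best_n, best_steps


if __name__ == "__main__":
    kv = {}
    print(longest_in_range(1, 10, kv))  # (9, 19)
    print(len(kv), "entries written to the store")
```

Every lookup and write-back goes through the store, so swapping the dict for a real client would turn the same loop into a mix of reads and writes against the database.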
There have been such proposals before and most probably there will be many more to come.
While writing a quick reference to alternative storages I tried to put together as many performance results as I could find. Unfortunately that attempt was far from a success, so seeing this proposal and ☞ people starting to publish their results is a major step forward.
What are your thoughts about NoSQL benchmarks?