


InnoDB: All content tagged as InnoDB in NoSQL databases and polyglot persistence

You want NoSQL? I’ll give you memcached

Tony Darnell in Use MySQL to store NoSQL and SQL data in the same database using memcached and InnoDB | Scripting MySQL:

With MySQL version 5.6 (and above), you have the ability to store and retrieve NoSQL data, using NoSQL commands, while keeping the data inside a MySQL InnoDB database. So, you can use NoSQL and SQL at the same time, on the same data, stored in the same database. And the beauty is that it takes just a few minutes to setup. This post will provide you with a quick lesson on how to setup NoSQL on a MySQL InnoDb database.
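The "NoSQL commands" in question are the standard memcached text protocol, which the InnoDB memcached plugin serves while persisting the data in an InnoDB table. As a minimal sketch (not the plugin's own code), the two core requests can be framed like this; key and value names are illustrative:

```python
# Sketch of the memcached text protocol the InnoDB memcached plugin
# speaks (by default on port 11211). The same bytes a plain memcached
# server accepts will, with the plugin enabled, read and write rows
# in the configured InnoDB container table.

def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached 'set' request: store value under key."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"

def encode_get(key: str) -> bytes:
    """Build a memcached 'get' request: fetch the value for key."""
    return f"get {key}\r\n".encode()
```

In practice you would send these bytes over a socket to the MySQL host (or just use any existing memcached client library); the point is that no SQL is involved on the client side, while the data remains queryable with SQL on the server side.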

I see this trivialization of the term NoSQL quite frequently in communications signed by Oracle: “Oh, you want NoSQL? Take memcached. Now shut up!” This is quite disrespectful to their customers and to the developer community in general.

Original title and link: You want NoSQL? I’ll give you memcached (NoSQL database©myNoSQL)

RW locks are hard

Mark Callaghan continues his research and benchmarking of MongoDB, TokuMX, and InnoDB. This post focuses on the impact of locks in MongoDB and the different solutions that were implemented over time in InnoDB. Fantastic read.

MongoDB and TokuMX saturated at a lower QPS rate than MySQL when running read-only workloads on a cached database with high concurrency. Many of the stalls were on the per-database RW-lock and I was curious about the benefit from removing that lock. I hacked MongoDB to not use the RW-lock per query (not safe for production) and repeated the test. I got less than 5% more QPS at 32 concurrent clients. I expected more, looked at performance with PMP and quickly realized there were several other sources of mutex contention that are largely hidden by contention on the per-database RW-lock. So this problem won’t be easy to fix but I think it can be fixed.
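The per-database RW-lock Mark refers to is a classic readers-writer lock: any number of concurrent readers, or one exclusive writer. A minimal sketch of the idea (not MongoDB's or InnoDB's actual implementation, which are far more elaborate precisely because of the contention problems described above):

```python
import threading

class RWLock:
    """Toy readers-writer lock: many readers OR one writer.
    Illustrative only; production engines use more sophisticated
    schemes (lock striping, optimistic reads, etc.) to reduce the
    contention this naive design creates."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers wait out any active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Note that even this "read-friendly" design funnels every reader through one shared mutex (`self._cond`), which is exactly the kind of hidden serialization Mark found lurking behind the RW-lock once he removed it.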



InnoDB Memcached plugin benchmark: 1mil QPS in MySQL 5.7.3

On the InnoDB team blog:

As you probably already know, in the MySQL 5.7.3 release, InnoDB Memcached reached a record of over 1 million QPS on a read-only load. The overview of the benchmark and testing results can be seen in an earlier blog by Dimitri. In this blog, I will spend some time on the detailed changes we have made to achieve this number.

There’s another post detailing the benchmark:

The test was executed in “standalone” mode (both server and client are running on the same server). So, we used our biggest HW box we have in the LAB - a 48cores machine.

That’s a _good_ number. But if you think about it, the per-core QPS is not that high; if I remember correctly, Redis can go up to 70k/s.


A Story of MySQL and InnoDB at Facebook Told by Mark Callaghan

Just whetting your appetite for this interview with Mark Callaghan about MySQL, InnoDB, and his work at Facebook:

Q: How do you make MySQL both “less slow” and “faster” at the same time?

A: I ask questions like, “If I can make it do 10 things per second today, can I make it do 20 things per second tomorrow?” For example, we used to use an algorithm that is very CPU intensive to check database pages. Another person on my team, Ryan Mack, modified it to use hardware support on x86 processors, so we could profile the servers in production to see how much time they were spending computing these checksums. We then realized that the newest CPUs had a faster way to do that, so we modified MySQL to use CRC32 for checksums. The hard part there was upgrading the servers on the fly from the old checksums to the new checksums without taking the site down.
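The "upgrade on the fly" trick boils down to making readers accept either checksum format while writers switch to the new one. A hedged sketch of that idea (this is not Facebook's or MySQL's code; `zlib.crc32` stands in for the hardware-accelerated CRC32, and the "old" algorithm here is a made-up slow one):

```python
import zlib

def old_checksum(page: bytes) -> int:
    """Stand-in for the original CPU-intensive software checksum."""
    s = 0
    for b in page:                      # byte-at-a-time: slow on purpose
        s = (s * 31 + b) & 0xFFFFFFFF
    return s

def new_checksum(page: bytes) -> int:
    """Stand-in for the hardware-accelerated CRC32 path."""
    return zlib.crc32(page) & 0xFFFFFFFF

def page_is_valid(page: bytes, stored: int) -> bool:
    """During the rolling migration, a page last written by an old
    server still carries the old checksum, so readers must accept
    either format until every page has been rewritten."""
    return stored in (old_checksum(page), new_checksum(page))
```

New writes store `new_checksum(page)`; once every page in the fleet has been rewritten, the old validation path can be dropped, and the site never goes down in between.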

Exciting and scary.



WhySQL: MySQL/InnoDB ACID Guarantees for Evernote

Dave Engberg has published on the Evernote Techblog a post explaining why the Atomicity, Consistency, and Durability characteristics of a single replicated MySQL/InnoDB deployment are essential to the way Evernote operates.

While it’s difficult to argue about a technical decision with so few details available, I still wanted to point out a couple of things:

  1. Atomicity: most of the NoSQL databases offer atomic operations at the level of a single record. For distributed systems that do not want to rely on 2PC, it is the multi-row atomic operations that are not supported.

    The example presented in the post does not require multi-row transactions, but rather guaranteed client operation ordering. This is achievable in most NoSQL databases.

  2. Consistency: the post talks about data consistency from the perspective of data integrity guarantees through usage of foreign keys.

    In the world of NoSQL similar behavior could be achieved by different data modeling solutions. Using Cassandra as an example for the notebook deletion scenario, one could store all the notes of a notebook in a single Cassandra row, thus making the delete operation safe.

    It’s also worth mentioning that many of the eventually consistent NoSQL databases offer different consistent read and write operations.

  3. Durability: with just a few known exceptions, most NoSQL databases offer strong durability guarantees.
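The wide-row modeling idea from point 2 can be sketched in a few lines. This is a plain in-memory stand-in, not Cassandra's API: the point is only that keying every note under its notebook makes "delete notebook" a single-partition, and therefore safe, operation. All names here are illustrative:

```python
# In a real Cassandra schema the notebook id would be the partition
# (row) key and each note a column in that row; here a dict plays
# that role.

store = {}  # notebook_id -> {note_id: note_body}

def add_note(notebook_id, note_id, body):
    """Every note lives inside its notebook's single row."""
    store.setdefault(notebook_id, {})[note_id] = body

def delete_notebook(notebook_id):
    """One operation removes the notebook and every note in it,
    leaving no dangling notes to clean up afterwards."""
    store.pop(notebook_id, None)
```

Contrast this with a relational schema, where the same guarantee comes from a foreign key with cascading delete rather than from co-locating the data.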

In conclusion, based only on the few details in the post, one could easily argue that a NoSQL database would fit the bill. But most of the time the reality behind such decisions is quite different, making them a tad more complicated.



Riak Getting LevelDB as Storage Engine

After Innostore and Bitcask, the Basho team is currently experimenting with integrating Google’s LevelDB as a storage engine for Riak. Preliminary results look promising:

For most Riak users, Bitcask is the obvious right storage engine to use. It provides low latency, solid predictability, is robust in the face of crashes, and is friendly from a filesystem backup point of view. However, it has one notable limitation: total RAM use depends linearly (though via a small constant) on the total number of objects stored. For this reason, Riak users that need to store billions of entries per machine sometimes use Innostore (our wrapper around embedded InnoDB) as their storage engine instead. InnoDB is a robust and well-known storage engine, and uses a more traditional design than Bitcask which allows it to tolerate a higher maximum number of items stored on a given host.

It appears that LevelDB may become a preferred choice for Riak users whose data set has massive numbers of keys and therefore is a poor match with Bitcask’s model. Performance aside, it compares favorably to InnoDB on other issues such as permissive license and operational usability.
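Bitcask's "RAM use depends linearly on the number of objects" limitation comes from its design: values live in an append-only log on disk, while an in-memory index (the "keydir") maps every key to the offset of its latest value. One RAM entry per key, regardless of value size. A toy sketch of the idea (not Basho's implementation):

```python
class BitcaskSketch:
    """Toy append-only store in the style of Bitcask. The keydir is
    why RAM grows linearly with the number of keys: every key needs
    an in-memory entry, however small the value. Illustrative only."""

    def __init__(self):
        self.log = bytearray()   # stands in for the on-disk data file
        self.keydir = {}         # key -> (offset, length); lives in RAM

    def put(self, key: str, value: bytes):
        offset = len(self.log)
        self.log += value                        # append, never overwrite
        self.keydir[key] = (offset, len(value))  # one RAM entry per key

    def get(self, key: str) -> bytes:
        offset, length = self.keydir[key]        # at most one disk seek
        return bytes(self.log[offset:offset + length])
```

An LSM-tree engine like LevelDB keeps its index on disk in sorted tables instead, which is why it tolerates far more keys per host, at the cost of Bitcask's one-seek-per-read predictability.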



Oracle Drops InnoDB from MySQL Classical Edition, But Not From Community Edition

I have heard many people mention that Oracle removed InnoDB from the MySQL Classic Edition. Now, I don’t know much about the various versions and licenses of MySQL (there appear to be at least five: Enterprise, Classic, Standard, Cluster Carrier Grade, and Community), but InnoDB does not seem to have been dropped from the Community edition. So, I’m not really sure this is such a big deal.[1]

What are your thoughts on this story?

Update: Basho, creator of Riak, which offers a pluggable storage engine based on InnoDB, ☞ clarifies the status of InnoDB:

InnoDB is available under the GPL. Innostore, as a derivative work of Embedded InnoDB, is also available under the GPL. Neither Oracle nor Basho can take that away from you.

  1. If everyone were actually forced to go back to using MyISAM, that would be a bit more interesting, as it would mean MySQL would be less durable and consistent.


Details About Riak Innostore and Bitcask Backends

With the recent 0.11.0 release, Riak switched its default storage backend from embedded Innostore to Bitcask.

Andy Gross and johne had a very interesting conversation about the differences between Innostore and Bitcask Riak backend stores:

innostore currently creates a file per bucket/partition combo, but all other backends use one file per partition. Unless you really want innostore, we recommend you use bitcask. One other thing with buckets: buckets don’t consume any resources as long as they use the bucket defaults, either the stock riak defaults or ones you set in your app.config. Buckets that change some of those defaults take up a small amount of space in the ring data structure that’s gossiped around.



NoSQL and RDBMS: Learn from Others’ Experience

At first, I thought that Innostore[1], the embedded InnoDB from Basho, was just another cool project they’d made available to the community. It was only after a couple of days that I realized Innostore is in fact one option among Riak’s pluggable backend storage engines. That definitely made me think more about this decision.

Luckily, David Smith from Basho has already taken the time to explain ☞ the reasons that led Riak to use InnoDB as one of its storage engines:

1. predictability and 2. stability. […] we need something that is going to have predictable latency under significant loads. After evaluating TokyoCabinent (TC), BerkeleyDB-C (BDB) and Embedded Inno, it was quite clear that Inno won this aspect hands down.

You’ll notice pretty much the same arguments in this post about ☞ MySQL usage at Flickr:

  • it is a very well known component. When you’re scaling a complex app everything that can go wrong, will. Anything which cuts down on your debugging time is gold. All of MySQL’s flags and stats can be a bit overwhelming at times, but they’ve accumulated over time to solve real problems.
  • it’s pretty darn fast and stable. Speed is usually one of the key appeals of the new NoSQL architectures, but MySQL isn’t exactly slow (if you’re doing it right). I’ve seen two large, commercial “NoSQL” services flounder, stall and eventually get rewritten on top of MySQL. (and you’ve used services backed by both of them)

As a side note, that last sentence reminded me of the migration the Hashrocket team completed for a pharma company.

Last, but not least, you can also take a look at this ☞ Yahoo! benchmark that includes MySQL; if I’m not misinterpreting the results, you’ll notice that in some of them MySQL performed quite well.

I guess what we can learn from all this is:

  • not all traditional storage engines are as bad as we sometimes want to think of them
  • it is probably the complete feature set of an RDBMS that makes it overkill for some projects
  • there are still a lot of scenarios in which an RDBMS makes sense

Strange post for a NoSQL centric blog, isn’t it?