


Broken Conversation: RDBMS vs NoSQL

I’ve been offline for the last couple of days, only to come back and discover that by now RDBMS are dead, or NoSQL is dead, or vim is better than emacs, or… No, wait, I think something is just broken with the internet again!

If you haven’t done a debugging session in a while, this time it might even be fun! I think everything started with the following fragment from an ☞ interview with Joe Stump (CTO of SimpleGeo, ex-Digg):

Essentially, there are a lot of people out there that are “using MySQL,” but they’re using it in a very, very NoSQL manner. Like at Digg, for instance, joins were verboten, no foreign key constraints, primary key look-ups. If you had to do ranges, keep them highly optimized and basically do the joins in memory. And it was really amazing. For instance, we rewrote comments about a year-and-a-half ago, and we switched from doing the sorting on a MySQL front to doing it in PHP. We saw a 4,000 percent increase in performance on that operation.
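For readers who haven’t seen this style of database use, here is a minimal sketch of what “no joins, primary key look-ups, join and sort in the application” can look like. It is not Digg’s code: the schema, the sample rows and the function names (`build_demo_db`, `comments_for_story`) are all hypothetical, an in-memory SQLite database stands in for MySQL, and Python stands in for PHP. It only illustrates the access pattern and says nothing about performance.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (SQLite standing in for MySQL).

    The schema and rows are made up for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY,
            story_id INTEGER,
            user_id INTEGER,   -- deliberately no FOREIGN KEY constraint
            score INTEGER,
            body TEXT
        );
        CREATE INDEX comments_by_story ON comments (story_id);
        """
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
        [
            (10, 7, 1, 3, "first"),
            (11, 7, 2, 9, "second"),
            (12, 7, 1, 5, "third"),
            (13, 8, 2, 1, "other story"),
        ],
    )
    return conn


def comments_for_story(conn, story_id):
    """Fetch a story's comments without JOIN or ORDER BY in SQL."""
    # 1. One simple indexed look-up: no JOIN, no ORDER BY.
    rows = conn.execute(
        "SELECT id, user_id, score, body FROM comments WHERE story_id = ?",
        (story_id,),
    ).fetchall()

    # 2. Primary-key look-ups for the distinct authors.
    names = {}
    for user_id in {row[1] for row in rows}:
        user = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if user is not None:
            names[user_id] = user[0]

    # 3. The "join" happens in application memory.
    joined = [
        {"id": cid, "author": names.get(uid), "score": score, "body": body}
        for cid, uid, score, body in rows
    ]

    # 4. Sorting also happens in the application, not in the database.
    joined.sort(key=lambda comment: comment["score"], reverse=True)
    return joined
```

Calling `comments_for_story(build_demo_db(), 7)` returns the three comments of story 7, highest score first. Whether this beats letting the database do the join and the sort depends entirely on data size, indexes and hardware, which is exactly the context the quote leaves out.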

This could have ended with lots of questions about what’s going on behind the curtains at Digg, and with some investigation into why Digg is looking at Cassandra (nb: something they haven’t really been secretive about). The problem is that this sort of statement always provides far too little context to allow an informed opinion, and yet makes for great titles[1].

So it wasn’t long until someone, completely ignoring the lack of context, ☞ tried to prove the above statement incorrect. While I couldn’t find much value in the published benchmark, it did at least reconfirm that lots of RAM and SSDs can help.

Digg’s case is an example of an entry-level RDBMS product used arguably suboptimally on under-powered hardware, and it seems questionable whether it proves anything of substance about either database technology. Yet it’s held as demonstrative of something — in particular the failing of the RDBMS — which is why I focus on it. They are different tools in the toolbox, arguably for different purposes, and that isn’t the focus of this entry.

Even though Joe Stump followed up with ☞ some more arguments, by this time the conversation showed visible signs of being broken and was heading towards the “apocalyptic” and funny, but serious in intent, ☞ I Can’t wait for NoSQL to die:

Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan. Never mind that real businesses track all of their data in SQL databases that scale just fine. (For Silicon Valley readers, Walmart is a real business, Twitter is not.)

While there were a couple of attempts from multiple camps to continue a balanced[2] conversation, by this time the “religious war” was on.

As entertaining as these vim vs emacs, object-oriented vs functional programming, NoSQL vs RDBMS conversations are, I still wish that at the end of the day we would remind ourselves that we are all engineers, and that none of these discussions are productive if they don’t lead to a better understanding of the other camp.