NoSQL debate: All content tagged as NoSQL debate in NoSQL databases and polyglot persistence
Last week, in the CouchDB case studies, I mentioned, based on a single tweet, a very interesting CouchDB use case related to Palm webOS. Now the ☞ Palm Developer Center Blog is giving more details about an upcoming webOS native JSON store named db8, which is designed to sync with CouchDB in the cloud:
db8: what if you had access to a fantastic performant native JSON store? That is where db8 comes in, our new open source JSON datastore that includes:
- Native JSON storage with query mechanism
- Built-in primitives for easy cloud syncing (Easily query changed / deleted data, Designed to sync with CouchDB in the cloud)
- Fine-grained access control for apps
- Mobile-optimized and fast (especially for updates)
- Pluggable back-end
Update: In a ☞ recent article on Ars Technica, Ryan Paul expresses concerns about using CouchDB for desktop configuration storage and syncing:
CouchDB can’t seem to handle the load of Gwibber’s messages, leading to excessive CPU consumption and poor performance in certain cases. For example, the overhead of computing the views causes lag when the user switches streams after Gwibber refreshes. The cost of pulling the account configuration data out of the database can also sometimes cause a noticeable lag that lasts up to four or five seconds when opening Gwibber’s account manager.
I’d really love to hear what CouchDB experts have to say about these concerns.
Update 2: Make sure you read the comment below, which clarifies the issues reported above.
David Jensen mentions in ☞ his notes on a Riak presentation:
If you’re a small team, unless you’re an Erlang shop, one downside to Riak is that it is primarily written in Erlang and C. Why is this a downside? I’ve heard a valid recommendation that when you are using these new NoSQL products, it really helps to know the language it was written in so that you can help track down the source of bugs (and maybe even submit patches). If you use the language it was written in on a daily basis, it makes that job much easier.
While many will probably dismiss such a concern immediately (the simplest counter-question being: how many times have you had to debug your database?), I do feel that, psychologically at least, this is a valid concern.
Most NoSQL solutions are still quite young, with 0.something versions, which makes you ask: how many 0.something solutions are you basing your project on? For many of these NoSQL projects there are not many experts around, which raises more questions: how quickly can I get someone to help? How expensive will it be? Will he/she be able to solve my problem?
So I’d say that every responsible software engineer will be a bit concerned about using a solution written in a language that nobody on a small dev team knows.
The real question is: will this stop NoSQL adoption? The answer is definitely NO, because we like shiny new toys, we like to hack things, and, even more importantly, we are starting to realize that there are use cases where NoSQL solutions will make our lives much, much easier.
I don’t have a crystal ball, but I’ve already said a couple of times that any project that can be connected to Google’s MapReduce patent should try to get a license from them or have this aspect very well clarified. And it looks like ☞ the Apache Software Foundation moved pretty fast for its Hadoop project:
To: ASF Board
Several weeks ago I sought clarification from Google about its
recent patent 7,650,331 [“System and method for efficient large-scale data processing”] that may be infringed by implementation of
the Apache Hadoop and Apache MapReduce projects. I just received
word from Google’s general counsel that “we have granted a license
for Hadoop, terms of which are specified in the CLA.”
I am very pleased to reassure the Apache community about Google’s
continued generosity and commitment to ASF and open source. Will
someone here please inform the Apache Hadoop and Apache MapReduce
projects that they need not worry about this patent.
Wondering when others will make their move!
A month ago I wrote about one of those catchy articles, NoSQL wants to be elastic caching when it grows up, arguing that if anything is going to happen in this space, it will be elastic caching solutions looking more seriously into persistence.
Nati Shalom (GigaSpaces CTO, @natishalom) has recently published a new article about RAM being the new enterprise persistence. As far as I can tell, most of the reasoning is based on the research paper The Case for RAMClouds (pdf):
By integrating GigaSpaces XAP with the Cisco UCS machine we are demonstrating our ability to easily load hundreds of gigabytes into a single box, and to scale linearly with growing capacity without any performance degradation. This is a great example of how middleware that was built for memory from the ground up, combined with hardware that was equipped to provide terabytes of memory in a single box, can be game changing.
This exciting combination makes it possible to manage 15-20x the amount of data in-memory, per partition. This, in turn, makes it possible to store the entire application data set in‑memory, and gain not only 10x the performance but also great simplicity, because the application no longer needs to deal with a miss ratio in the cache; and, at the same time, there are no consistency issues because all the data resides in-memory.
This is indeed an interesting argument and one that I’m not going to argue against. But it still feels like elastic caching or in-memory elastic databases will remain just a part of the software equation:
- even though the price of RAM keeps decreasing, the machines mentioned do not sound like commodity hardware, so you’ll have to balance the costs against the value of the data
- it still sounds like vertical scaling (nb not saying that vertical scaling is always bad)
- there will always be data that fits better on disk (e.g. video)
- the more data you accumulate, the more you’ll want to make sure that querying it (nb online or offline) is not expensive
- According to the original article, the following solutions were considered elastic caches: IBM eXtremeScale, GigaSpaces, Terracotta, Microsoft Velocity, Hazelcast, NCache, Infinispan
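To put the quoted remark about the cache miss ratio in perspective, here is a small worked example of how misses dominate average read latency when a cache falls through to disk. The latency figures (`ram_us`, `disk_us`) are assumed round numbers chosen for illustration, not measurements of GigaSpaces or any other product.

```python
def effective_latency_us(miss_ratio, ram_us=1.0, disk_us=5000.0):
    """Expected read latency (microseconds) for a cache in front of a disk store.

    ram_us and disk_us are illustrative assumptions, not benchmarks.
    """
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    # Hits are served from RAM; misses pay the full cost of the slower store.
    return (1.0 - miss_ratio) * ram_us + miss_ratio * disk_us


if __name__ == "__main__":
    for ratio in (0.0, 0.01, 0.05, 0.20):
        print(f"miss ratio {ratio:.2f}: {effective_latency_us(ratio):.2f} us")
```

With these assumed figures, a 1% miss ratio already pushes the average read to about 51 µs, roughly fifty times the all-in-memory case. That gap is the upside of holding the whole data set in RAM; the bullet points above are about what it costs to get there.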
I’ve been offline for the last couple of days, only to discover that by now RDBMSs are dead, or NoSQL is dead, or vim is better than emacs, or…. No, wait, I think it is just something broken with the internet again!
If you haven’t done a debugging session in a while, this time it might even be fun! I think everything started with the following fragment from an ☞ interview with Joe Stump (CTO of SimpleGeo, ex-Digg):
Essentially, there are a lot of people out there that are “using MySQL,” but they’re using it in a very, very NoSQL manner. Like at Digg, for instance, joins were verboten, no foreign key constraints, primary key look-ups. If you had to do ranges, keep them highly optimized and basically do the joins in memory. And it was really amazing. For instance, we rewrote comments about a year-and-a-half ago, and we switched from doing the sorting on a MySQL front to doing it in PHP. We saw a 4,000 percent increase in performance on that operation.
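The pattern Stump describes (simple indexed look-ups, no joins, joining and sorting in application code) can be sketched roughly as follows. This is only an illustration: the table and column names are hypothetical, SQLite stands in for MySQL and Python for PHP, and nothing here reflects Digg’s actual schema or code.

```python
import sqlite3


def load_comments_with_authors(conn, story_id):
    """Fetch a story's comments and authors without a SQL JOIN or ORDER BY."""
    # Query 1: a plain indexed look-up; no sorting on the database side.
    comments = conn.execute(
        "SELECT id, user_id, score, body FROM comments WHERE story_id = ?",
        (story_id,),
    ).fetchall()
    if not comments:
        return []

    # Query 2: primary-key look-ups for only the users we actually need.
    user_ids = sorted({row[1] for row in comments})
    placeholders = ",".join("?" for _ in user_ids)
    users = dict(
        conn.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})",
            user_ids,
        ).fetchall()
    )

    # The "join" and the sort both happen in application memory.
    joined = [
        {"id": cid, "author": users[uid], "score": score, "body": body}
        for cid, uid, score, body in comments
    ]
    joined.sort(key=lambda c: c["score"], reverse=True)
    return joined


def demo():
    """Build a tiny hypothetical data set and run the look-up for story 7."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY, story_id INTEGER,
            user_id INTEGER, score INTEGER, body TEXT
        );
        CREATE INDEX comments_by_story ON comments (story_id);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO comments VALUES
            (10, 7, 1, 3, 'first'),
            (11, 7, 2, 9, 'second'),
            (12, 8, 1, 5, 'other story');
        """
    )
    return load_comments_with_authors(conn, 7)
```

Whether this beats letting the database do the join and sort depends entirely on data size, indexes, and hardware, which is exactly the context the quote leaves out.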
While Stump’s statement could have led to lots of questions, like what’s going on behind the curtains at Digg, and to some investigation into why Digg is looking at Cassandra (nb something that they haven’t really been secretive about), the problem is that these sorts of statements always provide way too little context to allow an informed opinion, and they make for great titles.
So, it wasn’t long until someone, completely ignoring the lack of context, ☞ tried to prove the above statement incorrect. While I couldn’t find much value in the published benchmark, I did at least get a confirmation that lots of RAM and SSDs can help.
Digg’s case is an example of an entry-level RDBMS product used arguably suboptimally on under-powered hardware, and it seems questionable whether it proves anything of substance about either database technology. Yet it’s held as demonstrative of something — in particular the failing of the RDBMS — which is why I focus on it. They are different tools in the toolbox, arguably for different purposes, and that isn’t the focus of this entry.
Even though Joe Stump followed up with ☞ some more arguments, by this time the conversation showed visible signs of being broken, leading towards the “apocalyptic” and funny, but serious in intent, ☞ I Can’t Wait for NoSQL to Die.
Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan. Never mind that real businesses track all of their data in SQL databases that scale just fine. (For Silicon Valley readers, Walmart is a real business, Twitter is not.)
While there have been a couple of attempts from multiple camps to continue a balanced conversation, by this time the “religious war” was on.
As entertaining as these vim vs emacs, object oriented vs functional programming, NoSQL vs RDBMS conversations are, I still hope that at the end of the day we remind ourselves that we are all engineers and that none of these discussions are productive unless they lead to a better understanding of the other camp.
- Just a couple of examples: ☞ DIGG: 4000% PERFORMANCE INCREASE BY SORTING IN PHP RATHER THAN MYSQL, ☞ How NoSQL Beats out MySQL for Social Networking, etc.
- Ted Dziuba: ☞ I Can’t Wait for NoSQL to Die
- Eric Evans: ☞ Haters Gonna Hate
- Dare Obasanjo: ☞ The NoSQL Debate: Automatic vs Manual Transmission
- Stephan Schmidt: ☞ Why NoSQL Will Not Die
- Anders Karlsson: ☞ OK, you have waited long enough, here’s my take on NoSQL
- m3mnoch: ☞ Not Everyone Using noSQL is a Rails-Lovin’ Ass-Clown
- Jeremy Zawodny: ☞ NoSQL is Software Darwinism
- Chris Storm: Getting started with node.js and CouchDB ☞ part 1 and ☞ part 2. You cannot blame him for having fun!
- blog.isabel-drost.de: ☞ Bob Schulze on Tips and patterns with HBase
- Dennis Forbes: ☞ Fighting The NoSQL Mindset, Though This Isn’t an anti-NoSQL Piece.
I am still not sure the post deserved linking, but it generated too much noise to ignore. Personally I’m in complete agreement with ☞ @coda:
My tests, with data sets 2-3 orders of magnitude smaller than yours, zero concurrent writers, and a single reader, indicate you suck.