


NoSQL debate: All content tagged as NoSQL debate in NoSQL databases and polyglot persistence

Is NoSQL Just for a Small Niche?

NoSQL is not a bad idea, but it fills a pretty small niche. A much smaller niche than I thought before. If you are considering a NoSQL implementation, you should probably satisfy several of the following conditions:

  • willing to develop in house expertise in NoSQL storage, monitoring, backups, analysis, tuning
  • large dataset
  • a lot of unstructured data
  • no schema design

Leaving aside the fact that “niche” can be a subjective term in this context (i.e. is 20% of the companies handling 80% of the data a niche?), these may be valid concerns raised by someone looking at NoSQL.

Every tech shop looking to adopt a new technology that helps them solve a real problem will have to develop some form of in-house expertise. Indeed, the availability of tools, support and external resources is important.

These days, when the quantity of digital data is growing exponentially, trying to control the format of the data seems like an attempt to swim against the current. That’s not to say that structure doesn’t matter, only that instead of spending time putting order into chaos, we would be better off solving the problems at hand.

In conclusion, NoSQL is definitely not a silver bullet, and it is not here to replace any of the existing technologies. Concerns about new technologies have always existed, but it is always useful to have the right tools around.


When should I use MongoDB?

Leaving aside a couple of small warnings, Brandon Keepers’ answer to the question when should I use MongoDB is:


No, seriously!?

OK, I think MongoDB makes sense with most web applications. In the end, most apps are just doing glorified CRUD, and don’t need ACID or many of the features of a relational database. There are times when you definitely should not use MongoDB, like when you need transactions.

NO, NO, NO. If you do that, then you’re definitely doing it wrong! Again!

Take a step back and think about this hypothetical situation: you’ve been driving the same car for the last 30 years. But recently (you’ve come out of your cave and) you discover there are other shiny new cars around. Some look, feel, and offer quite a different feature set from your old beloved car; others are just an upgraded version of your old cosy car.

The question is: would you go to the first car dealer and just get one of these “upgraded” cars? Or would you take some time to look at all these shiny new toys and see how each of them would improve your driving experience and safety?

Ignoring for a moment my interest in shiny new things, I think we should always keep eyes open, evaluate our options and try to reach conclusions based on what we need. Remaining always inside our comfort zone will not help us experience the new, the better, the different.

So, what is my answer to the question “when should I use MongoDB”? Only if it makes sense. Expanding on this a bit, I’d say you should take your time to analyze your options and first think about what you need. Think about your data and your app’s data access patterns. Then use MongoDB only if it is the best tool for your current job.


Palm webOS and CouchDB or NoSQL is Not Only About Scale

Last week, in the CouchDB case studies, based on a single tweet, I mentioned a very interesting CouchDB use case related to the Palm webOS. Now the ☞ Palm Developer Center Blog is giving more details about an upcoming webOS native JSON storage named db8, which is designed to sync with CouchDB in the cloud:

db8: what if you had access to a fantastic performant native JSON store? That is where db8 comes in, our new open source JSON datastore that includes:

  • Native JSON storage with query mechanism
  • Built-in primitives for easy cloud syncing (Easily query changed / deleted data, Designed to sync with CouchDB in the cloud)
  • Fine-grained access control for apps
  • Mobile-optimized and fast (especially for updates)
  • Pluggable back-end

While many still associate the whole NoSQL space with scalability or big data, these scenarios (there is also this atypical Riak use case) are proving that NoSQL is about the best tool for the job.

Update: In a ☞ recent article on ArsTechnica, Ryan Paul expresses his concerns about using CouchDB for desktop configuration storage and syncing:

CouchDB can’t seem to handle the load of Gwibber’s messages, leading to excessive CPU consumption and poor performance in certain cases. For example, the overhead of computing the views causes lag when the user switches streams after Gwibber refreshes. The cost of pulling the account configuration data out of the database can also sometimes cause a noticeable lag that lasts up to four or five seconds when opening Gwibber’s account manager.

I’d really love to hear from CouchDB experts some comments related to these concerns.

Update 2: Make sure you are reading the comment below that clarifies the above reported issues.

NoSQL and Psychology

David Jensen mentions in ☞ his notes on a Riak presentation:

If you’re a small team, unless you’re an Erlang shop, one downside to Riak is that it is primarily written in Erlang and C. Why is this a downside? I’ve heard a valid recommendation that when you are using these new NoSQL products, it really helps to know the language it was written in so that you can help track down the source of bugs (and maybe even submit patches). If you use the language it was written in on a daily basis, it makes that job much easier.

While many will probably dismiss such a concern immediately (the simplest counter-question being: how many times have you had to debug your database?), I do feel that, psychologically at least, this is a valid concern.

Most of the NoSQL solutions are still quite young, with 0.something versions, and that makes you ask: how many 0.something solutions are you basing your project on? For many of these NoSQL projects there are not many experts around, and that raises more questions: how quickly can I get someone to help? how expensive will it be? will he/she be able to solve my problem?

So I’d say that every responsible software engineer will be a bit concerned about using a solution built on a language that is not known by anyone in the small dev team.

The real question is: will this stop NoSQL adoption? The answer is definitely NO, because we like shiny new toys, we like to hack things, and, even more importantly, we are starting to realize that there are use cases where NoSQL solutions will make our lives much, much easier.

NoSQL: The RyanAir of RDBMS

Very interesting parallel:

I believe the analogy between NoSQL and RyanAir, although provocative, fits pretty well …

The system just did not scale anymore. That’s exactly what happened to low-cost (aka No-frills) carriers here in Europe in this last decade. The demand for connections between cities, often and quickly, without all the fuzz that nobody cares about, at least for short flights (read: simple service) just brought Ryanair, Easyjet etc to cut the unnecessary. And keep the planes doing the things they could do best, as long and as often as possible (to be honest, the business model of low-cost carrier flights comes from the 70ies, introduced by Herb Kelleher of SouthWest. Read more ).


Hadoop: Granted License for Google’s MapReduce Patent

I don’t have a crystal ball, but I’ve already said it a couple of times[1] that any project that can be connected to Google’s MapReduce patent should try to get a license from them or have this aspect very well clarified. And it looks like ☞ the Apache Software Foundation moved pretty fast for its Hadoop project:

To: ASF Board

Several weeks ago I sought clarification from Google about its
recent patent 7,650,331 [“System and method for efficient large-scale data processing”] that may be infringed by implementation of
the Apache Hadoop and Apache MapReduce projects. I just received
word from Google’s general counsel that “we have granted a license
for Hadoop, terms of which are specified in the CLA.”

I am very pleased to reassure the Apache community about Google’s
continued generosity and commitment to ASF and open source. Will
someone here please inform the Apache Hadoop and Apache MapReduce
projects that they need not worry about this patent.

Wondering when others will make their move!

About the NoSQL Hype Cycle

Interesting post from Steve Francia on the NoSQL hype cycle[1]:

Technology trigger

As Facebook, Twitter and others saw them as solutions to their massive scalability solutions (and because they were using relational databases for things they shouldn’t have) people began to see NoSQL as a golden hammer

Peak of inflated expectations

Unfortunately knowing when to use the technology requires actual experience with it, which never seems to catch up to the hype engine quickly enough, so consequently the technology transforms into a “golden hammer”. Better at everything and ready to displace everything that existed before.

Trough of Disillusionment

Current technologies exist because they do something well. when a new technology emerges it will likely be good at a different thing meaning the two will co-exist.

and so on:

[Image: hype cycle]



What NoSQL is NOT good for

Wondering if these arguments are valid? Let’s take a look at each of them.

Face it, you’re not Google

investing a lot of time and resources in some technology whose main benefit is something you’re not going to ever need is akin to buying a supercar capable of hitting 300 kilometers per hour.

Taken alone, the argument is correct, but it’s a straw man. While some NoSQL solutions are known to be scalable, I’d say the main differentiator of NoSQL solutions is their different data models.

So the real question is: does your data fit the relational model, or would it fit better in a document store, a wide column store, a graph database or even a key-value store?
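To make the data model question a bit more concrete, here is a minimal sketch of one aggregate modeled both ways. The blog post, its fields and the `assemble_post` helper are all hypothetical and only meant for illustration; no particular database is implied.

```python
# Hypothetical blog post with tags and comments, modeled two ways.

# Document model: the whole aggregate lives in one nested record.
post_document = {
    "id": 1,
    "title": "Is NoSQL Just for a Small Niche?",
    "tags": ["nosql", "debate"],
    "comments": [
        {"author": "alice", "text": "Depends on the data."},
        {"author": "bob", "text": "Agreed."},
    ],
}

# Relational model: the same data normalized into flat rows,
# one list of tuples per (hypothetical) table.
posts = [(1, "Is NoSQL Just for a Small Niche?")]
post_tags = [(1, "nosql"), (1, "debate")]
comments = [(1, "alice", "Depends on the data."), (1, "bob", "Agreed.")]


def assemble_post(post_id):
    """Rebuild the nested shape from the flat rows, as a join would."""
    title = next(t for pid, t in posts if pid == post_id)
    return {
        "id": post_id,
        "title": title,
        "tags": [tag for pid, tag in post_tags if pid == post_id],
        "comments": [
            {"author": author, "text": text}
            for pid, author, text in comments
            if pid == post_id
        ],
    }
```

Both shapes hold the same information. The difference is where the work of putting it back together happens, and which access patterns each shape makes cheap.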

We’ve seen all this before

Anyone remembers the XML hype wave? XML was to change the world and of course every single database conceived since then. XML was everywhere and lots of people were putting bright ideas on the table to take advantage of XML. Databases that did not embraced XML wholeheartedly would die a slow and agonizing death. Anyone remembers the object database hype wave? More of the same.

In the current formulation, I’d say this argument is completely wrong, as it confuses the hype, marketing and PR around new companies and technologies with their technical merits (not to mention that object databases continue to be a valid solution for a range of problems). One thing that is easily forgotten when speaking about some of the NoSQL solutions is that most of them are not coming from a lab; rather, they’ve been built by companies or organizations that have tried to solve real problems.

As a side note, had the question been whether NoSQL projects are revolutionary or evolutionary, the debate would have been more interesting.

Better the devil you know

Relational databases have been around for something like 30 years. There are extensive bodies of knowledge and code that have been debugged to death during decades. There are lots of vendors and skilled resources available to deal with them. There are extensive bug databases covering decades-old releases. There are compatibility suites, industry benchmarks and all kinds of useful methodologies and devices to deal with them.

I do agree that the ecosystem around relational databases is extremely rich and provides you with almost everything you need. But that doesn’t turn a hammer into something else. If your data doesn’t fit a relational database, all the knowledge and tools around it will at most slightly alleviate the pain your project goes through to make the data fit.

As a side note, with this kind of argument, PCs would have been dismissed and we could still count on one hand the number of programming languages.

There are many situations in which NoSQL is not the best option, but the only way to decide that is to look at the problem you are trying to solve.


In-Memory Elastic Databases

A month ago I wrote about one of those catchy articles, “NoSQL wants to be elastic caching when it grows up”, arguing that if anything is going to happen in this space, it will be that elastic caching solutions[1] will look more seriously into persistence.

Nati Shalom (Gigaspaces CTO, @natishalom) has recently published a new article about RAM being the new enterprise persistence. As far as I can tell, most of the decisions are based on the research paper The case for RAMClouds (pdf):

By integrating GigaSpaces XAP with the Cisco UCS machine we are demonstrating our ability to easily load hundreds of gigabytes into a single box, and to scale linearly with growing capacity without any performance degradation. This is a great example of how middleware that was built for memory from the ground up, combined with hardware that was equipped to provide terabytes of memory in a single box, can be game changing.

This exciting combination makes it possible to manage 15-20x the amount of data in-memory, per partition. This, in turn, makes it possible to store the entire application data set in‑memory, and gain not only 10x the performance but also great simplicity, because the application no longer needs to deal with a miss ratio in the cache; and, at the same time, there are no consistency issues because all the data resides in-memory.

This is indeed an interesting argument and one that I’m not going to argue against. But it still feels like elastic caching or in-memory elastic databases will remain just a part of the software equation:

  • even if the price of RAM has continued to decrease, the machines mentioned do not sound like commodity hardware so you’ll have to balance the costs with the value of data
  • it still sounds like vertical scaling (nb not saying that vertical scaling is always bad)
  • there will always be data that will fit better on disk (e.g. video)
  • the more data accumulates, the more you’ll want to make sure that querying it (nb online or offline) is not expensive


  • [1] According to the original article, the following solutions were considered part of elastic caching: IBM eXtremeScale, Gigaspaces, Terracotta, Microsoft Velocity, Hazelcast, NCache, Infinispan

Broken Conversation: RDBMS vs NoSQL

I’ve been offline for the last couple of days, just to discover that by now the RDBMS are dead, or NoSQL is dead, or vim is better than emacs, or…. No, wait, I think it is just something broken with the internet again!

If you haven’t done a debugging session in a while, this time it might even be fun! I think everything started with the following fragment from an ☞ interview with Joe Stump (CTO of SimpleGeo, ex-Digg):

Essentially, there are a lot of people out there that are “using MySQL,” but they’re using it in a very, very NoSQL manner. Like at Digg, for instance, joins were verboten, no foreign key constraints, primary key look-ups. If you had to do ranges, keep them highly optimized and basically do the joins in memory. And it was really amazing. For instance, we rewrote comments about a year-and-a-half ago, and we switched from doing the sorting on a MySQL front to doing it in PHP. We saw a 4,000 percent increase in performance on that operation.

This could have ended with questions about what is going on behind the curtains at Digg, and some investigation into why Digg is looking into Cassandra (nb something that they haven’t really been secretive about). The problem is that these sorts of statements always provide too little context to allow an informed opinion, and they make for great headlines[1].

So, it wasn’t long until someone, completely ignoring the lack of context, ☞ tried to prove the above statement incorrect. While I couldn’t find much value in the published benchmark, I have at least re-read a confirmation that lots of RAM and SSD can help.
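For readers who have not seen this pattern, here is a minimal sketch of what “doing the joins in memory” and sorting in the application can look like. It is written in Python rather than PHP, the rows and field names are invented, and it says nothing about how Digg actually implemented comments or how the two approaches compare in performance.

```python
from operator import itemgetter

# Hypothetical rows, as if fetched by two separate primary-key lookups
# instead of one SQL JOIN with ORDER BY.
comments = [
    {"id": 10, "user_id": 2, "score": 5, "text": "First!"},
    {"id": 11, "user_id": 1, "score": 42, "text": "Nice write-up."},
    {"id": 12, "user_id": 2, "score": 17, "text": "I disagree."},
]
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]


def join_and_sort(comments, users):
    """Join comments to users in memory, then sort by score, highest first."""
    users_by_id = {u["id"]: u for u in users}  # dict acts as the join index
    joined = [
        {**c, "user_name": users_by_id[c["user_id"]]["name"]}
        for c in comments
    ]
    return sorted(joined, key=itemgetter("score"), reverse=True)


for row in join_and_sort(comments, users):
    print(row["score"], row["user_name"], row["text"])
```

The database is reduced to simple lookups, and the application takes on the joining and ordering. Whether that is faster depends entirely on the context the quote leaves out: data volume, hardware, indexes and caching.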

Digg’s case is an example of an entry-level RDBMS product used arguably suboptimally on under-powered hardware, and it seems questionable whether it proves anything of substance about either database technology. Yet it’s held as demonstrative of something — in particular the failing of the RDBMS — which is why I focus on it. They are different tools in the toolbox, arguably for different purposes, and that isn’t the focus of this entry.

Even though Joe Stump followed up with ☞ some more arguments, by this time the conversation showed visible signs of being broken and was heading towards the “apocalyptic” and funny, but serious in intent, ☞ I Can’t wait for NoSQL to die.

Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan. Never mind that real businesses track all of their data in SQL databases that scale just fine. (For Silicon Valley readers, Walmart is a real business, Twitter is not.)

While there have been a couple of attempts from multiple camps to continue a balanced[2] conversation, by this time the “religious war” was on.

As entertaining as these vim vs emacs, object oriented vs functional programming, NoSQL vs RDBMS conversations are, I still wish that at the end of the day we would remind ourselves that we are all engineers and that none of these discussions are productive if they don’t lead to a better understanding of the other camp.

MongoDB, SQL and ... Market Positioning?

Trying to answer why MongoDB is not using SQL as its query language:

The main reason we went the way we did with the query language - representing queries as JSON - was to normalize the data we are storing with the query mechanism. If we are storing JSON in the database, can we not represent the queries that way too? We thought that made sense.
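As a rough illustration of the idea in the quote, the sketch below treats a query as a JSON document and evaluates it against JSON-like records. The `$gt`, `$lt` and `$in` operator names follow MongoDB’s style, but the `matches` function is a toy written for this example; it is not MongoDB’s implementation, and the sample data is invented.

```python
import json

# Supported comparison operators; the "$" names mirror MongoDB's style.
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form, e.g. {"city": "Berlin"}, means equality
            return False
    return True


people = [
    {"name": "alice", "age": 34, "city": "Berlin"},
    {"name": "bob", "age": 28, "city": "Berlin"},
    {"name": "carol", "age": 41, "city": "Paris"},
]

# The query is itself JSON, so it can be stored or sent like any document.
query = json.loads('{"city": "Berlin", "age": {"$gt": 30}}')
result = [p["name"] for p in people if matches(p, query)]
```

The SQL counterpart would be something like `SELECT name FROM people WHERE city = 'Berlin' AND age > 30`. The JSON form keeps queries and data in the same representation, which is the point the quote makes.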

Is that a strong enough argument? I don’t think so.

What I’ll say next will sound strange, but I think this was more of a branding or market positioning kind of decision.

If MongoDB had used SQL, what would have been its major differentiator from existing RDBMS? The BSON based storage? I don’t think people would really care about it. Support for MapReduce? Probably. Performance? Probably. Lacking the advanced SQL statements and tools? Quite possible.

MongoDB going with SQL would have made it look like the “ugly duckling” of the RDBMS world. And that’s definitely not a good positioning for a new product.

Anyway this is just a crazy personal hypothesis.


NoSQL News & Links 2010-03-26

  1. Chris Storm: Getting started with node.js and CouchDB ☞ part 1 and ☞ part 2. You cannot blame him for having fun!
  2. ☞ Bob Schulze on Tips and patterns with HBase
  3. Dennis Forbes: ☞ Fighting The NoSQL Mindset, Though This Isn’t an anti-NoSQL Piece.

    I am still not sure the post deserved linking, but it generated too much noise to ignore. Personally I’m in complete agreement with ☞ @coda:

    My tests, with data sets 2-3 orders of magnitude smaller than yours, zero concurrent writers, and a single reader, indicate you suck.