Recipes for Using NoSQL Solutions

The guys from Hashrocket (the same people who brought us the MongoDB Ruby libraries comparison) were featured in a ☞ CNET article about revitalizing a pharma project by replacing the existing relational database with a document database, MongoDB:

By moving the main data into a hierarchy in MongoDB, they were able to read the same data in a single query versus a combination of joins, sub-selects, and separate queries of the existing database. This in turn solved their immediate database issue and also helped to future-proof the application.

Now that sounds a bit more like some data remodeling and some separation of commands and queries, but the article is too short on details to jump to any conclusions.
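Since the article gives no details, here is a minimal sketch of what "a single query versus joins and separate queries" could look like. Everything in it is my assumption and not Hashrocket's actual schema: the `orders`, `items` and `notes` tables are hypothetical, SQLite stands in for the relational database, and a plain dictionary of JSON strings stands in for the document store.

```python
import json
import sqlite3

# Hypothetical relational schema: one root row plus child rows in two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (order_id INTEGER, sku TEXT, qty INTEGER);
    CREATE TABLE notes  (order_id INTEGER, body TEXT);
    INSERT INTO orders VALUES (1, 'acme');
    INSERT INTO items  VALUES (1, 'A-100', 2), (1, 'B-200', 1);
    INSERT INTO notes  VALUES (1, 'fragile');
""")


def load_order_relational(order_id):
    """Rebuild the tree from three tables, one query per table."""
    customer = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
    items = [
        {"sku": sku, "qty": qty}
        for sku, qty in conn.execute(
            "SELECT sku, qty FROM items WHERE order_id = ? ORDER BY sku",
            (order_id,),
        )
    ]
    notes = [
        body
        for (body,) in conn.execute(
            "SELECT body FROM notes WHERE order_id = ?", (order_id,)
        )
    ]
    return {"_id": order_id, "customer": customer, "items": items, "notes": notes}


# Stand-in for a document store: the whole tree is saved under its root key.
documents = {1: json.dumps(load_order_relational(1))}


def load_order_document(order_id):
    """A single key lookup returns the root together with all its children."""
    return json.loads(documents[order_id])
```

Both functions return the same tree; the difference is that the document version needs one lookup while the relational version has to touch every table that holds a piece of it.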

But what I found really interesting are the three short recipes that help decide whether a NoSQL solution is a good fit for a problem:

  1. Data models that can be visualized as a tree where the majority of the data exists in the context of a single root node may be a good case for a non-relational data store.
  2. If you have an extremely large data set and are looking for performance gains through de-normalization, a nonrelational database may be a fit.
  3. Applications that don’t need multi-object transactions at the database level are also good candidates for a nonrelational store.

I’d suggest some small amendments, though:

  • The size of the data is not really relevant for recipe #2: de-normalization duplicates data, so the data set you have to deal with only gets bigger. The important aspect of this recipe is de-normalization itself. In read-intensive apps (i.e. more reads than writes), de-normalization generally speeds up reads, at the price of more work on every write.
  • I think that the negation of the 3rd recipe might work even better: “if your app requires multi-object transactions then NoSQL will probably not be the best solution”.
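To make the read/write trade-off behind de-normalization concrete, here is a small sketch. The data (authors and posts) and all function names are hypothetical and come neither from the CNET article nor from any particular database; plain Python lists and dictionaries stand in for the stored records.

```python
# Normalized: the author's name lives in one place and posts point to it.
authors = {1: {"name": "Ann"}}
posts_normalized = [
    {"id": i, "author_id": 1, "title": f"post {i}"} for i in range(3)
]

# De-normalized: the author's name is copied into every post.
posts_denormalized = [
    {"id": i, "author_name": "Ann", "title": f"post {i}"} for i in range(3)
]


def read_normalized(post):
    """A read needs a second lookup to resolve the author."""
    return post["title"], authors[post["author_id"]]["name"]


def read_denormalized(post):
    """A read is served from the post record alone."""
    return post["title"], post["author_name"]


def rename_normalized(author_id, new_name):
    """A write touches exactly one record."""
    authors[author_id]["name"] = new_name
    return 1


def rename_denormalized(old_name, new_name):
    """A write has to find and update every copy of the name."""
    touched = 0
    for post in posts_denormalized:
        if post["author_name"] == old_name:
            post["author_name"] = new_name
            touched += 1
    return touched
```

Reads get cheaper because the extra lookup disappears, while a rename goes from one record to as many records as there are copies. That is why the ratio of reads to writes matters more for this recipe than the size of the data set.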

I have found another set of recipes in ☞ Ilya Sterin’s intro to NoSQL and CouchDB:

  1. Highly dynamic structure
  2. Data model is not very relational.
  3. Your relational schema is denormalized to the point where you’re no longer benefiting from relational database features like enforcing consistency and reducing redundancy in the data.
  4. Your relational database is bending over backwards to accommodate your read/write throughput even after you denormalized.
  5. You continuously marshal/unmarshal data (ORM???) to persist it and then to transform it to another data format.

As you can see, the de-normalization argument appears in both lists, while Ilya also adds the “unstructure” and “data transformation” recipes. The only one I would have liked to see in more detail is the 2nd recipe above: as far as I can tell, the world we are modeling is highly connected, and usually we preserve those connections only partially in our data models, rather than the other way around.

But even before getting to these hopefully helpful checklists, you should follow the common-sense rules described so well by Mathias Meyer in ☞ this post:

  • no size fits all
  • don’t believe everything you hear
  • it’s not about speed and scaling
  • don’t compare apples and oranges