


nosql debate: All content tagged as nosql debate in NoSQL databases and polyglot persistence

NoSQL vs SQL, Why Not Both?

Alaric Snell-Pym[1]:

But there’s no real reason why an SQL SELECT statement is difficult to implement in a replicated environment. After all, you can just run that SELECT on a nearby replica. It’s only INSERT/UPDATE/DELETE/CREATE/DROP that cause problems, because SQL comes with the expectation that those operations occur instantly upon a single global state.

Add sub-SELECTs and JOINs to the list of problematic queries, and I’m not sure what remains can still be called SQL.

Also, a replicated system is not necessarily equivalent to a sharded system, where implementing even the simplest forms of SELECT becomes more difficult.
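To make the sharding point concrete, here is a minimal sketch, assuming a hypothetical `users` table split across three shards held in plain Python lists. Even something as simple as `SELECT name FROM users ORDER BY age LIMIT 3` has to be sent to every shard and the partial results merged afterwards. All names and data below are made up for illustration.

```python
import heapq
from operator import itemgetter

# Hypothetical data: three shards, each holding part of a "users" table.
SHARDS = [
    [{"name": "ann", "age": 31}, {"name": "bob", "age": 25}],
    [{"name": "cid", "age": 42}, {"name": "dee", "age": 19}],
    [{"name": "eve", "age": 27}, {"name": "fay", "age": 23}],
]

def select_on_shard(rows, limit):
    """Run ORDER BY age LIMIT n locally on a single shard."""
    return sorted(rows, key=itemgetter("age"))[:limit]

def select_order_by_age(shards, limit):
    """Scatter the query to every shard, then merge the sorted partial results."""
    partials = [select_on_shard(rows, limit) for rows in shards]
    merged = heapq.merge(*partials, key=itemgetter("age"))
    return list(merged)[:limit]

result = select_order_by_age(SHARDS, 3)
print([row["name"] for row in result])
```

On a single replica the same query is one local sort. Here no shard can answer alone, which is why a replicated setup and a sharded one are not interchangeable when it comes to SQL support.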

While supporting SQL makes a lot of sense from an adoption point of view (developers’ familiarity with SQL, tooling support, existing expertise), too many problems would have to be solved before such support reached a decent level. Until then, all these limitations would just be frustrating to everyone.

Sometimes it’s much better to start with a clean solution and determine over time whether an integration or a standard is emerging. Most practical and long-lived standards out there come from real-life, battle-proven scenarios and experience.

  1. Alaric Snell-Pym: Chief Software Architect of GenieDB

Original title and link: NoSQL vs SQL, Why Not Both? (NoSQL databases © myNoSQL)


Suitability of NoSQL Solutions

Monty Taylor:

Point being - when I rant about the suitability of NoSQL solutions, I’m mainly complaining that in many cases it seems to me that they’re using them because they’re popular or trendy and not because they are or are not actually suited to the task at hand.

The other way around is even more valid, and I’d say it applies more often: RDBMSs have been used in so many cases just because they have been around for such a long time.

  1. Monty Taylor: Drizzle



NoSQL Databases - The Trend for Databases in the Cloud?

From a sys-con article[1]:

Will there be some sort of “war” between NoSQL and SQL supporters like the one of REST versus SOAP? The answer is maybe. Who will win this case? As with SOAP versus REST, there won’t be a winner or a loser.

Bad parallel, but it begs a comment: I hope (some) NoSQL databases will be more like REST, as I wouldn’t call SOAP a success.

  1. Which unfortunately doesn’t have much, if anything at all, to do with its title.



Why NoSQL does not Impress Me

Carl McDade:

I am not particularly impressed with CouchDB, MongoDB…Tokyo cabinet or any of the derivatives. Why not? Because it’s nothing new to the web or me as a developer. I was already using text/flat file databases back in 2003. […] To me it would have been very impressive if SQL was available as an option to map/reduce.


I have noticed that many that are hot on “NoSQL” are those that do not have any training or interest in learning about database architecture, design or optimization. To them things like CouchDB are a way of escaping the need to learn these things. But what many don’t realize is that the replacement for SQL, Map/Reduce, is a more difficult and less intuitive way of querying the data source.


Most that ask me about NoSQL have never tried to distribute anything more than 10 gigs of data across a couple of database servers and maybe two web servers. In other words they are sightseers looking for the latest craze but have no interest in learning anything about the internals or the proper uses of the “new/old” technology.

Some are good points, some are really bad.



Why a DiY Big Data Stack is a Better Option

Someone is hopefully kidding:

While settling on a standard big data stack is deeply important to the big data industry as a whole, I’m nonetheless questioning the operational and competitive consequences for companies who choose to buy into this standard without first considering the value of building their own proprietary solution. […] The other benefit to going our own way is a sustainable competitive advantage over time.

I’m not sure a storage solution actually represents a competitive advantage, but I wonder: shouldn’t you first focus on getting your business up and profitable, before investing your engineering efforts in building yet another storage solution?



Big Data and the Need for New Approaches to Data Integration

I’d say Dave Linthicum got some things wrong:

First is the ability to manage large data sets more efficiently than with traditional relational technology as done in the past. The methodology is to leverage an approach called MapReduce.

MapReduce is about processing data, but you’ve got to store that data first.

The “Map” portion of MapReduce is the master node that accepts the request and divides it among any number of worker nodes. The “Reduce” portion means that the master node considers the results from the worker nodes and combines them to determine the answer to the request. The power of this architecture is the simplistic nature of MapReduce, meaning it’s both easy to understand and to implement.
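For reference, the conventional shape of MapReduce differs a little from the master/worker split described in the quote: a map function is applied to each input record, the emitted pairs are grouped by key, and a reduce function combines each group. Below is a minimal single-process sketch in Python, assuming a word-count job. The function names are illustrative and not taken from any framework.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data big cloud", "cloud data"])
print(counts)
```

In a real cluster the map and reduce calls would run on many nodes and read from distributed storage, which is exactly the part this sketch leaves out.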


It is clear to me that using the cloud’s ability to provide massive amounts of commodity computing power, on-demand, when combined with a database architecture that will exploit that power means data processing power on scales we have never seen at these low price points.

This is still something I’m not yet convinced of. Processing in the cloud is indeed a good option. But the data must be available in the cloud. And in the case of big data, neither storing it in the cloud nor moving it there seems to be the best alternative.



NoSQL Databases Aren't Hierarchical

Unfortunately based on a wrong hypothesis:

However most of the NoSQL tools seem to be NoRelational. As I see it, many of these tools map closely to the model that the relational model replaced.. the hierarchical model. Some describe themselves as hierarchical.

While I’m not sure which NoSQL databases the author is referring to, from my point of view the common denominator of column stores, document databases and key-value stores is the key-value model, which is not hierarchical. Graph databases, on the other hand, use the graph model at their core, which is again different from the hierarchical model. The Java Content Repository implementations (e.g. Jackrabbit) are the only hierarchical systems I’m aware of, so the hypothesis doesn’t apply.



Heroku Encourages Polyglot Persistence

Heroku published an article preaching polyglot persistence through a Database-as-a-Service approach:

Database-as-a-service is one of the coming decade’s most promising business models. […] DaaS also goes hand-in-glove with polyglot persistence. Thanks to database services, you won’t need to learn how to sysadmin/DBA for every datastore you use – you can instead outsource that job to a service provider specializing in each database.

While it definitely sounds exciting to be able to use all these NoSQL databases, we should always keep in mind the cost of complexity, even if DaaS helps alleviate some of the complexity of heterogeneous systems.

The article also includes some interesting use cases for a couple of NoSQL databases:

  • Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB.
  • Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop.
  • Binary assets (such as MP3s and PDFs) find a good home in a datastore that can serve directly to the user’s browser, like Amazon S3.
  • Transient data (like web sessions, locks, or short-term stats) should be kept in a transient datastore like Memcache. (Traditionally we haven’t grouped memcached into the database family, but NoSQL has broadened our thinking on this subject.)
  • If you need to be able to replicate your data set to multiple locations (such as syncing a music database between a web app and a mobile device), you’ll want the replication features of CouchDB.
  • High availability apps, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of datastores like Cassandra and Riak.

These are good examples, but you can find many more in our coverage of NoSQL use cases and the per-product case studies: CouchDB case studies, MongoDB case studies, etc.



NoSQL databases Should Support SQL Queries

Nati Shalom uses the old CS saying “Any software problem can be solved by adding another layer of indirection” to suggest that NoSQL databases could support SQL queries (and more):

The key is the decoupling of the query semantics from the underlying data-store as illustrated in the diagram below:

[Diagram: SQL engine indirection]

While it’s difficult to argue strongly against it, the real question is: how difficult will it be for such a layer to calculate the costs of such queries? Or, put differently:

The two software problems that can never be solved by adding another layer of indirection are that of providing adequate performance or minimal resource usage.

— Jeff Kesselman
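The cost concern can be illustrated with a minimal sketch, assuming a hypothetical key-value store that counts its reads and a thin query layer on top of it. A lookup by key maps to a single `get`, while a filter on any other field has to read every row unless the store offers a secondary index. None of the names below come from a real product.

```python
class CountingKV:
    """Hypothetical key-value store that counts how many values are read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data.get(key)

    def keys(self):
        return list(self._data)

def select_by_key(store, key):
    """SELECT * WHERE id = key: maps to a single get."""
    row = store.get(key)
    return [row] if row is not None else []

def select_where(store, field, value):
    """SELECT * WHERE field = value: without an index, every row is read."""
    rows = (store.get(key) for key in store.keys())
    return [row for row in rows if row[field] == value]

store = CountingKV(
    {i: {"id": i, "city": "Oslo" if i % 2 else "Rome"} for i in range(1000)}
)
select_by_key(store, 7)
reads_by_key = store.reads
matches = select_where(store, "city", "Oslo")
reads_by_scan = store.reads - reads_by_key
print(reads_by_key, reads_by_scan, len(matches))
```

Both calls look equally cheap from the SQL side, which is the problem: the indirection layer hides the difference unless it can estimate the cost against the underlying store.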


Why Should Rubyists be Interested in NoSQL?

Jesse Wolgamott answers the question of why Rubyists should be interested in NoSQL:

Once you reach the point in transaction system where the database is the scalability cause of your scalability problems, there’s no going back. You’ve taken the red pill. Table-based transaction databases are constrained by memory and there’s a hard maximum until your app crawls to a halt. The dream of true replication and easy sharding is built in.

Also: migrations just suck, even in Rails.

It’s interesting to note that Jesse’s talk is about MongoDB, CouchDB, RavenDB and Amazon SDB, the first three of which are not known for built-in scalability features. That’s not to say they cannot scale (see, for example, scaling CouchDB), and each of them has an attractive feature set, but there are already other NoSQL databases that provide better and easier scalability: Cassandra, HBase, Riak, Project Voldemort.


Just say NoSQL

An article carrying quite a few strong statements. Some I agree with, some I don’t:

New waves of application development technology are often incompatible with old ways of thinking. Typically, when a brave new world opens to programmers, a healthy portion of them will cast aside the old ways in favor of the new. But the NoSQL movement is not about throwing out your SQL databases to be replaced by key-value stores. NoSQL, ironically, has nothing to do with avoiding SQL, and everything to do with the judicious use of relational databases.

Take for example:

He said (n.b. Mike Gualtieri, senior analyst at Forrester Research) that saving actual customer purchasing information is better suited to a relational database, while storing more ephemeral information, such as customer product ratings and comments, is more appropriate for a NoSQL database.

Saying that NoSQL is fit for “ephemeral information” is a mistake. It amounts to saying: put your “cheap” data into NoSQL and your “important” data into relational databases. You don’t use one programming language for a product that is not so important and a different language for an important one. You always take a lot of aspects into consideration before making that decision. The same applies to choosing the storage backend.


Ellis (n.b. Jonathan Ellis, Cassandra lead and founder of Riptano) said that the developers at Digg invented a rule of thumb for deciding whether or not an environment necessitates a NoSQL database like Cassandra: “If you’re layering memcached on top of MySQL, you’re inventing an ad hoc NoSQL database by doing that,” said Ellis.

Well, that pretty much sounds like: “if you are using a dict/hash/map then you are inventing an ad hoc NoSQL database”. Personally I think that using a caching mechanism that is accessible through simple get/set operations just means that 1) memory access offers higher speed than anything else, and 2) most of the time we like accessing our data in different ways.
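The setup Ellis describes is usually called the cache-aside pattern. Here is a minimal sketch, assuming a plain dict standing in for memcached and an in-memory SQLite database standing in for MySQL. Both substitutions are assumptions made so that the example runs without any servers, and the table and function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption for the sake of the example).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'ann')")

cache = {}               # a plain dict stands in for memcached
stats = {"db_hits": 0}   # counts how often the database is actually queried

def get_user_name(user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    stats["db_hits"] += 1
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    cache[key] = name
    return name

def set_user_name(user_id, name):
    """Write to the database, then invalidate the cached copy."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)

first = get_user_name(1)    # miss: goes to the database
second = get_user_name(1)   # hit: served from the cache
set_user_name(1, "bob")     # invalidates the cached entry
third = get_user_name(1)    # miss again: reads the new value
print(first, second, third, stats["db_hits"])
```

The cache only ever answers get/set by key, while the relational database remains the source of truth and can still be queried in other ways, which is the distinction made above.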

All in all, a good read built around quotes of different people involved or looking at the NoSQL market.


NoSQL and The Future of CMS

It’s interesting to check whether the set of requirements of a CMS represents a good fit for NoSQL solutions:

  1. Richly structured content types
  2. Unstructured binary objects
  3. Relationships / references / associations
  4. The ability to evolve content models over time (what I call “schema evolution”)
  5. Branch / merge (in the Source Code Management (SCM) sense of the term)
  6. Snapshot based versioning
  7. ACID transactions
  8. Scalability to large content sets
  9. Geographic distribution

The only requirement that doesn’t seem to be satisfied by most NoSQL solutions is “ACID transactions”. But if this can be translated into atomic and durable operations, I think most NoSQL solutions will pass this test too.

The guys from Outerthought, builders of the Daisy CMS, have been publishing a lot recently about their decision to build the next generation CMS (Lily) on top of HBase. They covered it in their Berlin Buzzwords presentation, “Learning Lessons: Building a CMS on top of NoSQL technologies”.

Another useful resource for understanding the needs behind a CMS is Outerthought’s technology choices.