document store: All content tagged as document store in NoSQL databases and polyglot persistence
A couple of days back I read ☞ a blog post with what I’d call an extremely catchy title: “Why I Think Mongo is to Databases what Rails was to Frameworks”. While the 7 reasons presented in the article are not wrong in themselves, I think the features mentioned are not unique to MongoDB.
But let’s take them one by one…
1. Migrations are Dead
[…] migrations are so last year.
Throw a new key into any model and you can start adding data to it.
The whole thing about migrations comes down to the complexity of changing the fixed schema that an RDBMS imposes. In other words, any schema-less solution, be it a document database, a key-value store or even a schema-less RDBMS, will show the same benefit.
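To make the point concrete, here is a minimal sketch in Python of a schema-less store built on nothing more than a dictionary of JSON strings. The `collection`, `save` and `load` names are hypothetical and only illustrate the idea; this is not the API of MongoDB or of any other product.

```python
import json

# Hypothetical in-memory "collection": every document is an opaque JSON string,
# so the store itself knows nothing about the fields inside.
collection = {}

def save(doc_id, doc):
    collection[doc_id] = json.dumps(doc)

def load(doc_id):
    return json.loads(collection[doc_id])

save("u1", {"name": "Ann"})
# Later the app starts tracking a new key; there is no schema to alter first.
save("u2", {"name": "Bob", "twitter": "@bob"})

# Older documents simply lack the key, so readers have to supply a default.
handles = [load(doc_id).get("twitter") for doc_id in ("u1", "u2")]
print(handles)
```

The flip side is visible in the last lines: the "migration" does not disappear, it moves into application code that has to cope with documents written before the key existed.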
2. Single Collection Inheritance Gone Wild
By using inheritance, they all share the same base keys, validations, callbacks and collection.
Before looking at inheritance we’d first need to separate state from behavior, and then split behavior into the part that can be implemented close to the data and the part that belongs to the object/app model.
Behavior specific to the object/app model is not important here, as it has nothing to do with the data store. The kind of behavior that can be implemented close to the data (f.e. validations) has long been supported by RDBMS by means of simple data type definitions, constraints or even triggers.
So we are left with mapping inheritable state to the data store. As we already know, key-value stores are most of the time completely unaware of the data structure and so inheritance has no meaning there. For approaches where the store must be aware in some way of the data structure, I’d say that over the years RDBMS and ORMs have come up with an extremely well designed approach for handling it, and I’ll just mention the three basic strategies: table per class hierarchy, table per subclass, table per concrete class. In case you’d like to read more on this I’d recommend this Hibernate (Java ORM) ☞ doc.
5. Embedding Custom Objects/Hash Keys/Array Keys
For the next three points, I have reversed the order as I do see them as specialized cases of this more generic one.
Mongo natively understand arrays […] you can even index the values and perform efficient queries on arrays
As if array keys were not enough, hash keys are just as awesome.
What is that you say? Arrays and hashes just aren’t enough for you. Well go fly a kite… or just use an embedded object.
Storing custom objects was “always” possible, even if we include key-value stores or even RDBMS here (nb it is obvious that document stores can handle this scenario). Over time, most of the concerns expressed about custom objects were about the efficiency/performance of storing/fetching such data and about data layout (nb what I mean is how transparent it is to operate on such an object).
So I’d say that the real questions/feature here would be:
- does the engine have an optimal strategy for storing/fetching this sort of ‘object’? (f.e. how does it deal with array size modifications, etc.)
- in case your app needs to access details of such an ‘object’, does the store support it? (f.e. can I filter results based on the field value(s) of such an ‘object’?)
- in case such ‘objects’ are used to model relationships, how is your engine helping you avoid the N+1 query issue?
6. Incrementing and Decrementing
I see incrementation/decrementation as just a particular case of the generic “read and modify value” scenario, which is supported by both RDBMS and column-based stores (nb you can correct me on this one as I haven’t checked them all). There is an additional characteristic of this operation that probably makes the difference: atomicity. An even more generic feature that would fit this scenario is ☞ compare-and-swap.
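As an illustration of why compare-and-swap is the more generic feature, here is a minimal sketch of an atomic increment built on top of it. `TinyStore` and its methods are hypothetical names for an in-memory stand-in; no real store's API is implied.

```python
import threading

class TinyStore:
    """Hypothetical in-memory key-value store, for illustration only."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)

    def compare_and_swap(self, key, expected, new):
        # Write `new` only if the current value still equals `expected`.
        with self._lock:
            if self._data.get(key, 0) != expected:
                return False
            self._data[key] = new
            return True

def increment(store, key, delta=1):
    # Retry until no other writer slipped in between the read and the write.
    while True:
        current = store.get(key)
        if store.compare_and_swap(key, current, current + delta):
            return current + delta

def worker(store, key, times):
    for _ in range(times):
        increment(store, key)

store = TinyStore()
threads = [threading.Thread(target=worker, args=(store, "hits", 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("hits"))
```

A plain read followed by a write could lose updates when the four threads interleave; the retry loop around `compare_and_swap` is what keeps every increment.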
7. Files, aka GridFS
Mongo actually has a really cool GridFS specification that is implemented for all the drivers to allow storing files right in the database. I remember when storing files in the database was a horrible idea, but with Mongo this is really neat.
Well, I guess everyone has tried at some point to store files in MySQL or another RDBMS. The real issues were always the performance of the operation and how handy the API was.
In the end, please allow me to say once again that my intention is neither to argue against MongoDB features nor to deny how important these features can be for an application, but rather to clarify that these features are not unique to MongoDB. And if I have misinterpreted any of these, please feel free to correct me.
I guess these are the last releases for an eventful 2009 NoSQL year:
Mongo 1.2.1 is just a minor release featuring the following bug fixes:
- mongoimport now works on windows
- gcc 4.4 can be used to compile
- better map/reduce error handling
A day after our coverage of Terrastore, a consistent, partitioned and elastic document database, the 0.3 version was released featuring a much easier installation tool. You can read the announcement ☞ here. Sergio Bossa, Terrastore creator, has published a nice summary of what Terrastore is ☞ here.
And with this, I am looking forward to more exciting NoSQL releases in 2010.
Terrastore is a very young Apache-licensed document store built on top of Terracotta (an in-memory clustering technology); it released its 0.2 version a couple of days ago.
I had the opportunity to chat with Sergio Bossa (@sbtourist) and have him answer a couple of questions about Terrastore.
Alex: What is it that made you create Terrastore in the first place?
Sergio: I wanted a scalable document store with consistency features, because I think that’s an uncovered topic/space in current implementations, which are all geared toward BASE.
Being a document database, Terrastore belongs to the same category as CouchDB, MongoDB, and Riak. In some regards (f.e. partitioning), Terrastore is similar to Riak. You should also check  to find out more about Terrastore and the CAP theorem.
Terracotta replication is not full, nor geared toward all nodes, but only toward those actually requiring the replicated data. This is further optimized in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes, unless new nodes are joining or older nodes are leaving, thus requiring data redistribution. A Terrastore client doesn’t need to know where the data is: it can contact any Terrastore node and requests will be routed to the proper node holding the value (note: this is similar to the way Dynamo, Project Voldemort, Cassandra and other distributed stores work).
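For readers unfamiliar with consistent hashing, the following is a generic sketch of how a key can be routed to exactly one owning node, and why only a fraction of the data moves when a node joins. It is not Terrastore's actual implementation; the `Ring` class, the node names and the number of virtual points per node are all assumptions made for illustration.

```python
import bisect
import hashlib

def _hash(value):
    # Stable hash, so every node computes the same placement for a key.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Hypothetical consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, points_per_node=64):
        self._points_per_node = points_per_node
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._points_per_node):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"doc-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")  # a new node joins the cluster
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys changed owner")
```

Every key that changes owner ends up on the newly joined node, which matches the redistribution behavior described above: data only moves when the cluster membership changes.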
At this point, more people had joined the chat, and more interesting questions and answers came up.
Alex: Considering Terrastore is built on top of Terracotta, is it an in-memory store, which would make it somewhat similar to Redis?
Sergio: Correct, it stores everything in memory, but it is persistent as well. It is not as fast as Redis mainly due to some overhead related to its distributed features.
Paulo Gaspar: Terrastore looks very much like a persistent, transactional Memcached service.
Sergio: Persistent, transactional, and partitioned/sharded. An interesting difference is that afaik Memcached partitioning is done client side, while Terrastore has builtin support for data partitioning, distribution and access routing.
Terrastore is already HTTP and JSON friendly, and the future might bring support for the memcached protocol too.
Please see the following resources to learn more about Terrastore:
The people at Teach Me to Code have published a 3-part screencast about MongoDB. The episodes cover the following aspects:
- CRUD operations using MongoDB shell
- creating a Ruby application that accesses MongoDB
- using MongoMapper (see NoSQL libraries) with your Rails app and MongoDB
You can watch the complete series below (the episodes are 13min, 21min and 10min long, respectively). Also make sure you check Michael Dirolf’s Introduction to MongoDB.
Introduction to MongoDB: CRUD operation using MongoDB shell
Introduction to MongoDB: building a Sinatra based app interacting with MongoDB
Introduction to MongoDB: Rails and MongoMapper
The videos are also available for download (see the reference section). And you can always watch more NoSQL videos by using the video tag.
A lot of people say that location-enabled services will be the #### [*] of tomorrow, so is there any Geo NoSQL?
Populating a MongoDB with POIs ☞
What I especially liked is the flexibility you get from this kind of databases (nb MongoDB) and the ease of installation and use. The downside for geographic applications is that at the moment there is no built-in support for geometries.
Using MongoDB to Store Geographic Data ☞
Managing GIS data with NoSQL in circumstances where performances and scalability are a major issue could be the way for the win.
GeoCouch: The future ☞
What I call “complex analytics” is things like: “return all apple trees that are located with a 10km range around buildings that have are over 100m high, but only in countries with a population over 50 million people” is not possible with GeoCouch as you would need the attribute values as well. Those are stored in CouchDB, so you would need to request them. What GeoCouch only supports is a simple: give me all IDs within a bounding box/polygon/radius.
Tokyo Cabinet: Loading and querying point data ☞
I’m going to load 500.000 POIs in a database and query them with a bounding box query. I will use the table database from Tokyo Cabinet because it supports the most querying facilities. With a table database you can query numbers with full matched and range queries and for strings you can do full matching, forward matching, regular expression matching,…
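The bounding box query that both the GeoCouch and the Tokyo Cabinet excerpts refer to boils down to two range checks per point. Here is a minimal sketch of it as a plain linear scan; the `Poi` type, the function names and the three sample points (with approximate coordinates) are illustrative assumptions, and none of the stores mentioned above necessarily work this way internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Poi:
    poi_id: str
    lon: float
    lat: float

def in_bbox(poi, min_lon, min_lat, max_lon, max_lat):
    # Two independent range checks, one per axis.
    # Note: this simple form does not handle boxes crossing the antimeridian.
    return min_lon <= poi.lon <= max_lon and min_lat <= poi.lat <= max_lat

def bbox_query(pois, min_lon, min_lat, max_lon, max_lat):
    # Linear scan returning IDs only; attributes would be fetched separately.
    return [p.poi_id for p in pois if in_bbox(p, min_lon, min_lat, max_lon, max_lat)]

pois = [
    Poi("berlin", 13.40, 52.52),
    Poi("paris", 2.35, 48.86),
    Poi("tokyo", 139.69, 35.69),
]
european = bbox_query(pois, -10.0, 35.0, 30.0, 60.0)
print(european)
```

A store with numeric range queries on two fields can answer this without any geometry support, which is why a table database is enough for the point-data case.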
And so the answer is: yes, we do have some Geo NoSQL!
After posting about Scott Motte’s comparison of MongoDB and CouchDB, I thought there should be some more informative sources out there, so I’ve started to dig.
The first I came upon (thanks to Debasish Ghosh @debasishg) is an article about ☞ Raindrop requirements and the issues faced while attacking them with CouchDB and the pros and cons of possibly replacing CouchDB with MongoDB:
- Uses update-in-place, so the file system impact/need for compaction is less, and storing our schemas in one document is likely to work better.
- Queries are done at runtime. Some indexes are still helpful to set up ahead of time though.
- Has a binary format for passing data around. One of the issues we have seen is the JSON encode/decode times as data passes around through couch and to our API layer. This may be improving though.
- Uses language-specific drivers. While the simplicity of REST with CouchDB sounds nice, due to our data model, the megaview and now needing a server API layer means that querying the raw couch with REST calls is actually not that useful. The harder issue is trying to figure out the right queries to do and how to do the “joins” effectively in our API app code.
- easy master-master replication. However, for me personally, this is not so important. […] So while we need backups, we probably are fine with master-slave. To support the sometimes-offline case, I think it is more likely that using HTML5 local storage is the path there. But again, that is just my opinion.
Anyway, while some of the points above are generic, you should definitely consider them from the perspective of the Raindrop requirements, about which you can read more here.
I’d also mention this ☞ benchmark, which compares the performance of MongoDB, CouchDB and Tokyo Cabinet/Tyrant, using MySQL results as a reference (note: the author of the benchmark categorizes Tokyo Cabinet as a document database, while it is actually a key-value store).
In case you have other resources that you think would be worth including, do not hesitate to send them over.
Update: Just found a nice comparison matrix.
As a teaser, very soon I will introduce you to a new solution available in this space, so make sure to check MyNoSQL regularly.
Update: The main article about this new document store has been published: Terrastore: A Consistent, Partitioned and Elastic Document Database. I would strongly encourage you to check it out, as Terrastore is looking quite promising.