


document store: All content tagged as document store in NoSQL databases and polyglot persistence

Python, Django and MongoDB

Interested in Python, Django and MongoDB? Then I hope you’ll find these posts interesting:

And then there is this fresh screencast from Kevin Fricovsky on Django and MongoDB integration. You can read about it ☞ here, but as a quick summary: the screencast introduces mongoengine and then, using Django-Mumblr, a NoSQL-based blog engine, dives deeper into the details of Django and MongoDB integration.

Update: I just found a couple more MongoDB and Django tricks that you may find interesting.

The first one is a solution that provides access to MongoDB document _ids from Django templates. The ☞ solution is based on a custom Django filter, used as in {{ object|mongo_id }}.
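A filter is needed in the first place because Django templates refuse to look up attributes whose names start with an underscore, so {{ object._id }} does not work. Here is a minimal sketch of what such a filter could look like. It is my own guess at the shape, not the code from the linked solution, and the handling of dicts versus objects is an assumption:

```python
def mongo_id(document):
    """Return the MongoDB ``_id`` of a document as a string, or "" if absent."""
    # Documents may be plain dicts (as returned by a driver) or objects
    # exposing an ``_id`` attribute; handle both (an assumption of this sketch).
    if isinstance(document, dict):
        value = document.get("_id")
    else:
        value = getattr(document, "_id", None)
    return "" if value is None else str(value)


# In a Django project this function would live in an app's ``templatetags``
# package and be registered roughly like this:
#
#     from django import template
#     register = template.Library()
#     register.filter("mongo_id", mongo_id)
```

Once registered and loaded in a template, it would be used exactly as in {{ object|mongo_id }}.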

I find the solution pretty odd, not to mention that using a filter to access such an important piece of document information seems convoluted. I’d much prefer to have the _id accessible directly on an object through either a field or at least a special property. Behavior for an unsaved document might be as simple as returning None or raising an exception.
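To make that preference concrete, here is a tiny sketch of the property-based alternative. The Document class is hypothetical and only wraps a plain dict; it is not mongoengine's or any other library's actual API:

```python
class Document:
    """Hypothetical wrapper around a raw MongoDB document (a plain dict)."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    @property
    def id(self):
        # An unsaved document has no ``_id`` yet, so this returns None.
        return self._data.get("_id")
```

Because the property name has no leading underscore, a template could simply use {{ object.id }}.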

The second trick fixes a problem with using Django’s FileWrapper while working with MongoDB’s GridFS. I’d be tempted to call this a bug, so until it gets fixed you can read the details ☞ here.

CouchDB List Functions

Just another trick for your CouchDB toolbox:

List functions are a mechanism for iterating over rows in a view to produce output. CouchDB list functions are typically used to generate alternate formats for output (Atom, XML, HTML, etc.). I still want to generate JSON for consumption by my Sinatra application. Hopefully, that will not prove difficult.
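The list function itself is written in JavaScript inside a design document, but the calling side is plain HTTP: the output of a list function applied to a view is served under the design document's _list path. Below is a small sketch of building that URL from a client. The database, design document, list and view names are made up for illustration:

```python
from urllib.parse import quote, urlencode


def list_function_url(base_url, db, design_doc, list_name, view_name, **params):
    """Build the URL that runs a CouchDB list function over a view."""
    # Path shape: /{db}/_design/{ddoc}/_list/{list function}/{view}
    parts = (db, "_design", design_doc, "_list", list_name, view_name)
    path = "/".join(quote(part, safe="") for part in parts)
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{query}" if query else url
```

For example, list_function_url("http://localhost:5984", "blog", "posts", "as_json", "by_date", limit=10) points at a hypothetical as_json list function applied to a by_date view.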

Other CouchDB tips & tricks


MongoDB Poster

One of those examples of “an image is worth a thousand words”


8 Reasons You Should Like CouchDB… and not only

While this is not the ☞ original title, I’d say it summarizes pretty well Jilles Van Gurp’s post about the set of features he likes most in CouchDB:

Document oriented and schema less storage

That’s definitely not unique. Here on MyNoSQL, we are aware of at least 4 document databases: CouchDB, MongoDB, Riak and Terrastore.

Conflict resolution

MongoDB encourages a master-slave setup so it will not have to deal with conflict resolution. Terrastore doesn’t have to address this either as the value of a key lives on a specific node and updates will be applied in their chronological order. Riak uses vector clocks for conflict resolution.
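For readers who haven’t met vector clocks before, here is a minimal sketch of the general technique. It is a textbook-style illustration, not Riak’s actual implementation, and the dict-of-counters representation is my assumption:

```python
def compare_clocks(a, b):
    """Compare two vector clocks given as {node_id: counter} dicts."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        # Neither version descends from the other: a genuine conflict
        # that the store or the application has to resolve.
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"
```

Two versions only need conflict resolution in the "concurrent" case; otherwise one of them simply supersedes the other.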

Robust incremental replication

Replication is supported by both Riak and MongoDB. Terrastore, being built on top of Terracotta, uses a different strategy:

Terracotta replication is not full, nor geared toward all nodes, but only those actually requiring the replicated data. This is more and more optimized in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes, unless new nodes are joining or older nodes are leaving, thus requiring data redistribution.

You can read more about the Terrastore approach in Terrastore: a consistent, partitioned and elastic document database.
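The consistent hashing mentioned in the quote can be sketched in a few lines. This is a generic hash ring, not Terrastore’s code; the class name, the use of SHA-256 and the absence of virtual nodes are all simplifying assumptions on my part:

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring: each key belongs to exactly one node."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def remove(self, node):
        self._ring.remove((self._hash(node), node))

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First node clockwise from the key's position, wrapping around.
        index = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

The useful property is that when a node joins or leaves, only the keys next to it on the ring change owner, which matches the remark that data is redistributed only on membership changes.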

I leave it up to the readers to comment, based on their experience, on how robust replication is in each of MongoDB, Riak and Terrastore.

Fault tolerant

The explanation in the original article is more about durability than fault tolerance. In that regard, as far as I know, MongoDB is the only one of the four that is not really durable.

In Riak’s case, each write operation can specify the number of virtual nodes involved in the operation and also the number of writes that must succeed durably.
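As a rough sketch of what such per-request settings mean, the check below decides whether a write counts as successful. The names w and dw follow Riak’s naming for these two thresholds as I understand it; the ReplicaReply type and the function are hypothetical and only model the decision, not Riak’s internals:

```python
from dataclasses import dataclass


@dataclass
class ReplicaReply:
    received: bool  # the replica acknowledged the write
    durable: bool   # the replica confirmed the write reached durable storage


def write_succeeded(replies, w, dw):
    """Return True if enough replicas acknowledged and durably stored a write."""
    acks = sum(1 for r in replies if r.received)
    durable_acks = sum(1 for r in replies if r.received and r.durable)
    return acks >= w and durable_acks >= dw
```

Raising dw trades write latency for a stronger durability guarantee on each individual request.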

Terrastore durably stores its data on a master, whose availability can be enhanced by running it in active/passive mode (simply put, there can be passive masters that take over once the initial master has failed).

I should probably mention that in terms of the fault tolerance definition, both Riak and Terrastore are fault tolerant.


Both Riak and Terrastore are HTTP friendly.

I left for the end the three reasons that are either more specific to CouchDB’s implementation or debatable.

  • cleanup by replicating

This sounds more like something specific to CouchDB’s append-only approach.

  • incremental map reduce on read

I’m not sure I understand the benefits of this.

  • it’s fast

As always, this sort of argument is highly debatable, and it is always a good idea to use well-crafted benchmarks that fit your app’s scenarios.

In conclusion, I’d say we ended up with 5 reasons you should like CouchDB and Riak and Terrastore. And I bet a change in the requirements (see as an example: On why I think these pro MongoDB arguments are not unique) would result in other combinations of these four NoSQL solutions.

Special thanks to Sergio Bossa for clarifying some aspects related to Terrastore.

Installing CouchDB on Your Favorite Linux?

I am seeing lots of tutorials on how to install CouchDB on your favorite flavor of Linux, so I was wondering if this is a complex thing to get going? Or is it more about trying out the latest versions?

As far as I know getting CouchDB on Mac OS is just a matter of downloading it from ☞ CouchDBX. And there’s also the ☞ Homebrew way. Pretty similar experience for ☞ Windows.

So I thought I should ask if you’d find it useful to put together a list of tutorials on how to install up-to-date versions of CouchDB on each of these OSes. If the answer is yes, then please submit your preferred tutorial through a comment. Thanks!

Access CouchDB document revisions with RelaxDB

A nice trick to get quick access to the CouchDB document revisions with the Ruby RelaxDB library:

I messed with RelaxDB for Ruby for a little while to get this whole revisiony thing to work. For those familiar with RelaxDB, it exposes RelaxDB.load(_id, :revs=>true) for you, but due to limitations in Couch, you can’t get the :revs from a view, only directly loading an object. So to get around this, I mixed a revisions method into the Document class:
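For readers not using Ruby, here is a rough Python sketch of the underlying CouchDB behavior rather than the RelaxDB mixin itself. A document fetched directly with revs=true carries a _revisions field made of a start number and a list of hashes, newest first; the helper name is mine:

```python
def revision_ids(doc):
    """Expand a CouchDB ``_revisions`` field into full revision ids, newest first."""
    revisions = doc.get("_revisions")
    if not revisions:
        return []  # document was loaded without revs=true
    start = revisions["start"]
    # The n-th hash in the list belongs to revision number (start - n).
    return [f"{start - i}-{rev_hash}" for i, rev_hash in enumerate(revisions["ids"])]
```

Keep in mind that older revisions listed this way may already have been removed by compaction, so the ids are a history, not a guarantee that each revision can still be fetched.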

Check the other CouchDB tricks


Paginating with CouchDB

Unless you are planning to offer your users a very bad experience, you’ll have to figure out a way to paginate through long collections. Using CouchDB is no different from any other storage, though it may add a bit of complexity:

CouchDB is a different beast: its aggressive use of indexes means that occasionally you lose some functionality that you’ve been accustomed to having in other persistence mechanisms, like the number of rows matching a query.
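One commonly described way around this is to ask a view for one row more than the page size and use that extra row as the starting point of the next page. The sketch below only does the bookkeeping on rows already fetched; it relies on the standard view parameters limit, startkey and startkey_docid, while the function name and return shape are my own:

```python
import json


def next_page_params(rows, page_size):
    """Split rows fetched with limit=page_size+1 into a page and next-page params."""
    page = rows[:page_size]
    if len(rows) <= page_size:
        return page, None  # no extra row came back, so this is the last page
    first_of_next = rows[page_size]
    params = {
        "limit": page_size + 1,
        # View keys are passed JSON-encoded in the query string.
        "startkey": json.dumps(first_of_next["key"]),
        # The doc id disambiguates rows that share the same key.
        "startkey_docid": first_of_next["id"],
    }
    return page, params
```

This avoids both counting the matching rows and using large skip values, at the cost of only offering next-page style navigation.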

The following articles should get you up to speed on how to accomplish pagination while using CouchDB:

And if you have a different solution, please share it with MyNoSQL readers!

Check the other CouchDB tricks

MongoDB in the Windows Environment

I’ve put together a couple of posts that take MongoDB for a ride in a Windows environment.

Firstly, you have to install MongoDB. You can use a MongoDB Windows installer ☞ or choose to run it in a virtual machine.

If you decide to go the first route, you may find this post ☞ useful, as it walks you through getting MongoDB installed on your Windows machine, using the MongoDB console, and then using mongodb-csharp to connect to MongoDB from C#. If you prefer to jump directly to coding, you should probably check Getting started with MongoDB and C#.

It is interesting to note that both articles use the same C# library, mongodb-csharp, which has recently added a LINQ provider (via @alastairs).

This other post ☞ will show you how to get MongoDB running inside VirtualBox hosted on a Windows machine with an Ubuntu 9.10 guest. Then you’ll be able to use Visual Studio and one of the many MongoDB C# libraries to connect to MongoDB.

If instead of C# you’d like to try out MongoDB from F#, then you’ll probably want to check this article ☞, which covers some nice features of the F# dynamic constructs.

Rubyists and Pythonistas have a lot more material to play with, and here are just a few examples:

Last, but not least, I couldn’t find anything about Visual Basic and MongoDB :-)!

Quick reference to latest MongoDB, Project Voldemort, Terrastore, and Riak

After mentioning all the releases of this most active and exciting NoSQL week, I thought MyNoSQL should provide a quick reference to all of them.

MongoDB 1.2.2

MongoDB 1.2.2 is mostly a bug fix release as can be seen from the ☞ announcement.


Project Voldemort 0.70

This new version of Project Voldemort includes one of the most awaited features: online rebalancing. While the ☞ official announcement makes it clear that rebalancing has been extensively tested, I couldn’t find a good description of the rebalancing algorithm.

There are other interesting features in the release that I’d like to mention:

  • New failure detector merged into the main branch
  • Beta mechanism for restoring all of a node’s data from replicas on demand. This is an alternative to the more gradual mechanism provided by read-repair.

I’d really love to publish an article about the Voldemort rebalancing implementation, so please do forward this kind request to anyone who can help. (@strlen, @ijuma: I hope you are reading this.)


Terrastore 0.4

While the version number suggests a minor release, the latest version of Terrastore, the partitioned and elastic document database built on top of Terracotta, features quite a few interesting new things:

  • New, configurable, server-to-master reconnection procedure and improved graceful shutdown procedure for server nodes.
  • New socket-based internal communication layer, improving multi-node performance and lowering resource consumption.
  • New transparent rerouting of client requests in case of failing nodes.
  • Improved rebalancing in case of nodes leaving or failing.

I must confess that if I were managing Terrastore releases and knew that these features were well tested, I would definitely have jumped a couple of versions!


Riak 0.7.1

Riak 0.7.1 seems to be mostly a bugfix release, with a pretty cryptic ☞ announcement (at least for someone not familiar with the Riak source code).


Statistical Computation with Incanter and MongoDB

Q: Can you explain why MongoDB is a good choice for Incanter (as opposed to Clojure more generally)? Everything that I work with (in R) that is not rectangular, is indexed …

A: Do you mean as opposed to SQL databases, or other schema-less databases?

MongoDB can easily persist arbitrarily deeply nested Clojure data structures, which makes it a convenient choice, but that’s not to say there are not many other options, all equally useful.

I’d speculate that for larger data sets, having Incanter work with HBase or Hadoop (if it doesn’t already) would take it to the next level.



FluidDB Proposal is Brilliant, But…

It looks like I initially misunderstood what FluidDB is by calling it a “Wikipedia of databases”. A ☞ post on the FluidDB blog has clarified it for me, so I came up with another definition: “a persistent post-it database service in the cloud”.

While reading the blog post (and leaving aside the fact that, like any NoSQL solution, it came up with a Twitter-related application), I found myself getting really excited at the idea. But then I realized that:

  1. there must be someone creating such an application
  2. the associative pattern suggested by FluidDB, while nice, will have to be extremely carefully implemented in the app
  3. (this is the most interesting from our NoSQL perspective) there doesn’t seem to be anything unique in FluidDB that would make such a use case non-applicable to any other NoSQL solution.

Let me clarify it a bit. Basically, the article introduces the idea of being able to associate metadata with any kind of information and keep this metadata under clear access rules. In my opinion this is just a simple associative relationship (think key-value stores) that could be implemented using any of the NoSQL solutions we are covering here. Moreover, I tend to think that document databases like CouchDB, MongoDB, Riak or Terrastore would natively offer a lot of flexibility for the metadata “(un)structure”.
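To show how little is needed for the basic idea, here is a toy sketch with an in-memory dict standing in for whatever key-value or document store you prefer. Everything in it is an assumption made for illustration: the TagStore class, the "user/tag" naming scheme and the single access rule are mine, and FluidDB’s real permission model is certainly richer:

```python
class TagStore:
    """In-memory stand-in for a key-value or document store."""

    def __init__(self):
        self._objects = {}  # object_id -> {"user/tag": value}

    def tag(self, user, object_id, tag_name, value):
        namespace, _, _ = tag_name.partition("/")
        # Access rule (simplified): users may only write tags in their
        # own namespace, while anyone may read all tags on an object.
        if namespace != user:
            raise PermissionError(f"{user} cannot write {tag_name}")
        self._objects.setdefault(object_id, {})[tag_name] = value

    def tags(self, object_id):
        return dict(self._objects.get(object_id, {}))
```

In a document database, the inner dict would simply be the document stored under the object’s id, with no schema to agree on beforehand.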

Am I wrong?

Taking a look at a fascinating FluidDB use case that seems to be easily supported by any NoSQL solution. Or not?