


Java: All content tagged as Java in NoSQL databases and polyglot persistence

Redis: Java Libraries

Staying in Java land, I’ve put together this short list of Redis Java libraries, as I’m looking for a stable one myself:


The main page mentions that JRedis is compatible with Redis 1.2.x only, but the JRedis author left a comment saying that JRedis is compatible with Redis 2.0.0.

jredis project


Project home page mentions that Jedis is compatible with Redis 2.0.0. Currently the following features are supported:

  • Sorting
  • Connection handling
  • Commands operating on all kinds of values
  • Commands operating on string values
  • Commands operating on hashes
  • Commands operating on lists
  • Commands operating on sets
  • Commands operating on sorted sets
  • Transactions
  • Pipelining
  • Publish/Subscribe
  • Persistence control commands
  • Remote server control commands
  • Connection pooling

jedis project


A library developed in Scala (you should still be able to use it from any VM language) compatible with Redis 2.0.0. Some features:

  • Native Scala types Set and List responses.
  • Consistent Hashing on the client.
  • Support for Clustering of Redis nodes.
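Since client-side consistent hashing is listed as a feature, here is a minimal sketch of the idea in plain Java. It is not taken from scala-redis: the `HashRing` class, the virtual-point count and the use of MD5 are assumptions made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side hash ring; not the scala-redis implementation. */
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas; // number of virtual points placed on the ring per node

    HashRing(int replicas) {
        this.replicas = replicas;
    }

    void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Returns the node owning the key: the first ring point at or after the key's hash, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Uses the first 8 bytes of an MD5 digest as the ring position (an arbitrary choice for this sketch).
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the ring is that removing a node only reassigns the keys that node owned; keys owned by the remaining nodes stay where they were.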

scala-redis project

Have I missed any? Also wondering which of these is stable.

Update: Graeme Rocher (@graemerocher) pointed out:


According to project home page, it is compatible with Redis 2.0.0, but there are no other details.

java-redis-client project.

Original title and link for this post: Redis: Java Libraries (published on the NoSQL blog: myNoSQL)

MongoDB: Java Frameworks Compared

Knut Haugen takes a look at MongoDB Java Driver, Morphia, and Mungbean:

It’s Mungbean, by a nose! Mainly because of the cleaner domain objects and the DSL. There is more code involved but I found it to be more elegant than the other approaches. I want to note that both morphia and mungbean are not immensely mature and done by any definition of the word and that has to come into consideration when using them. And it may be that a wordy statically typed language like Java has a bit of friction with a very dynamic database backend like MongoDB. I don’t know, but I’ll be looking into ruby drivers in the future and we’ll see.

The other libraries — GuiceyData, Sculptor, Morphia, but also the MongoDB Java driver — are compared in MongoDB in Java.

Original title and link for this post: MongoDB: Java Frameworks Compared (published on the NoSQL blog: myNoSQL)


Hector, Main Java Client for Cassandra Improves API

Hector is probably the most known and used client for Cassandra. Now it is getting a new API focused on getting rid of Thrift details:

When writing the first version of hector the premise was that users are comfortable with the current level of the thrift API so hector should maintain an API similar in spirit. […] I was wrong. As it turns out, users don’t learn the thrift API and then go use hector. Most users tend to just skip the thrift API and start with hector. Fair enough. But then I’m asked why did I make such a funny API… They are right, users of hector should not suffer from the limitations of the thrift API. Add to that the complexity of dealing with failover, which clients need not care about at the API level (and in the v1 API they did) and some complex anonymous classes and the Command pattern users need to understand (if only we could have closures in java…) then we get a less than ideal API.

That sounds like a very sane process: launch a first version and see what the real users are saying.
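To make the quote's remark about the Command pattern and failover more concrete, here is a rough sketch in plain Java. None of these names come from Hector: `Command`, `FailoverExecutor` and the host strings are hypothetical and only illustrate the pattern.

```java
import java.util.List;

/** Hypothetical unit of work executed against one host; not part of Hector's API. */
interface Command<T> {
    T execute(String host) throws Exception;
}

/** Hypothetical executor that hides failover from the caller. */
class FailoverExecutor {
    private final List<String> hosts;

    FailoverExecutor(List<String> hosts) {
        this.hosts = hosts;
    }

    /** Tries each host in order and returns the first successful result. */
    <T> T run(Command<T> command) {
        Exception last = null;
        for (String host : hosts) {
            try {
                return command.execute(host);
            } catch (Exception e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        throw new IllegalStateException("all hosts failed", last);
    }
}
```

Before Java had lambdas, every call site had to spell out `new Command<String>() { public String execute(String host) { ... } }`, which is the anonymous class noise the quote complains about. With closures the same call shrinks to `executor.run(host -> ...)`.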

Update: Riptano, the company offering support for Cassandra, has made available a PDF detailing Hector API:

You can download it from ☞ here.

Hector, Main Java Client for Cassandra Improves API originally posted on the NoSQL blog: myNoSQL


Tutorial: MongoDB in Java

Every time I notice the flurry of NoSQL activity in the dynamic languages space, I feel the urge to post about the status of NoSQL adoption and support in environments like C# and Java.

So, what solutions do you have if you are in the Java land wanting to use MongoDB?

Using MongoDB Java driver API

Let’s start with a code snippet to get a feeling for the MongoDB Java driver:

To learn more about basic CRUD operations with MongoDB you can check ☞ this post, or ☞ this post, and/or the first 10 slides of James’ slides below:

Another good resource is the following set of slides from Brendan W. McAdams:

If using the Java driver feels too verbose, then I have some Groovy sugar for it.

MongoDB driver API with Groovy sugar

James Williams also has ☞ a post on how things can get a bit better when using Groovy and MongoDB.

GuiceyData: ProtocolBuffers-like mapping

In ☞ this post, Matt Insler talks about the complexity of using schemaless NoSQL databases with statically typed languages.

One of the best things about MongoDB is the lack of an enforced schema for collections. […] All of that being said, working with these records in a language like Java and on large diverse teams of people who don’t want to open the database and inspect the records to see what values and sub-records are available, means that you will always spend time wrapping these records in a strong-typed class. Wrapping up loose data into classes that can both access and create that data sounds just like another project I’ve used recently.

and he comes out with a possible solution based on external data definitions:

The GuiceyData Generator is a quick and easy way to specify strongly typed data structures to be stored in a MongoDB database and mapped to wrappers and builders in Java.

As a side note, BSON is a typed serialization format, so the real problem is just type mapping.
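To show what wrapping a loose record in a strongly typed class with a builder can look like, here is a hand-written sketch in plain Java. It is not GuiceyData's generated code: a `Map` stands in for a MongoDB document, and `UserRecord` and its fields are invented for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical typed wrapper around a schemaless record (a Map stands in for a MongoDB document). */
final class UserRecord {
    private final Map<String, Object> raw;

    private UserRecord(Map<String, Object> raw) {
        this.raw = raw;
    }

    /** Wraps an existing loose record without copying or validating it up front. */
    static UserRecord wrap(Map<String, Object> raw) {
        return new UserRecord(raw);
    }

    String name() {
        return typed("name", String.class);
    }

    int age() {
        return typed("age", Integer.class);
    }

    Map<String, Object> toMap() {
        return Collections.unmodifiableMap(raw);
    }

    // Checks the stored value's type on access; this is where the type mapping problem surfaces.
    private <T> T typed(String field, Class<T> type) {
        Object value = raw.get(field);
        if (!type.isInstance(value)) {
            throw new IllegalStateException("field '" + field + "' is missing or not a " + type.getSimpleName());
        }
        return type.cast(value);
    }

    /** Builder that creates the underlying loose record through typed setters. */
    static final class Builder {
        private final Map<String, Object> raw = new LinkedHashMap<>();

        Builder name(String name) {
            raw.put("name", name);
            return this;
        }

        Builder age(int age) {
            raw.put("age", age);
            return this;
        }

        UserRecord build() {
            return new UserRecord(new LinkedHashMap<>(raw));
        }
    }
}
```

Writing this by hand for every record type is exactly the repetitive work a generator driven by external data definitions would take over.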

GuiceyData source code can be found on ☞ GitHub and you can read more about it ☞ here.

Sculptor: DSL for code generation

Another code generator, this one based on an internal DSL, is ☞ Sculptor.

Sculptor generates data mapper classes that converts domain objects to/from MongoDB data structures, DBObjects. This makes it easy to use a domain model à la DDD with automatic mapping to MongoDB data structures.

Sculptor provides generic repository operations for use with MongoDB. This includes operations such as save, delete, findById, findByKey, findByCondition, and some more. You get CRUD operations, and GUI, for free.

Sculptor 1.9.0 - Support for MongoDB

Morphia: JPA-like

A different approach is proposed by ☞ Morphia which feels closer to JPA.

According to the slidedeck below:

  • brings Hibernate/JPA paradigms to MongoDB
  • allows annotating of POJOs to make converting them between MongoDB and Java very easy
  • supports DAO abstractions
  • offers type-safe query support
  • compatible with GWT, Guice, Spring, and DI frameworks
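The annotated POJO idea from the list above can be illustrated with a few lines of reflection. This is only a sketch of the general technique, not Morphia's API: the `@Stored` annotation, the `Employee` class and `TinyMapper` are made up, and a `Map` stands in for a MongoDB document.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical annotation; Morphia ships its own annotations, which are not reproduced here. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Stored {
    String value(); // document key to store the field under
}

class Employee {
    @Stored("name") String name;
    @Stored("dept") String department;
    String notPersisted; // no annotation, so the mapper ignores it
}

class TinyMapper {
    /** Copies every @Stored field of the object into a map keyed by the annotation value. */
    static Map<String, Object> toDocument(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : pojo.getClass().getDeclaredFields()) {
            Stored stored = f.getAnnotation(Stored.class);
            if (stored == null) {
                continue;
            }
            try {
                f.setAccessible(true);
                doc.put(stored.value(), f.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    /** Creates a fresh instance and fills its @Stored fields from the map. */
    static <T> T fromDocument(Map<String, Object> doc, Class<T> type) {
        try {
            T pojo = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                Stored stored = f.getAnnotation(Stored.class);
                if (stored != null && doc.containsKey(stored.value())) {
                    f.setAccessible(true);
                    f.set(pojo, doc.get(stored.value()));
                }
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The domain class stays a plain object, and the mapping rules live in the annotations instead of in a base class or an external definition file.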

A look at Morphia performance and possible optimizations can be found ☞ here.

GuiceyData vs Morphia vs Sculptor

If you have a hard time deciding which one you should pick, you’ll probably find ☞ this article useful.

With that I guess you should just get started. And if you have a preferred approach to using MongoDB in Java please share it with the rest of us!

Tutorial: CouchDB and Java with the ektorp library

It might be only my feeling that the amount of experiments and work done in the NoSQL space using dynamic languages (PHP, Python, Ruby, etc.) is bigger than what has been done so far using “big brother” languages like C# or Java. That’s not to say that C# and Java developers do not like NoSQL, just that good resources are rare.

Recently the author of the Java CouchDB ektorp library has published ☞ an interesting tutorial on how to build a basic blog app[1].

Here are my notes after going through the article:

  • model classes must extend a library provided class (CouchDbDocument). Java being a single inheritance language, this might be an issue for complex class hierarchies, so I was wondering if an approach based on annotations (see JPA) would not work even better. Update: according to the author this is optional and there are also mechanisms available.
  • “view” classes, which provide the CouchDB MapReduce-based view definition through an annotation.

It is also worth noting the comment around the model definition:

The relationship between BlogPost and Comment is modeled with a blogPostId field in Comment. In order to find the comments for a blog post a ‘by_blogPostId’ query has to be performed.

A perhaps more natural way would be to let BlogPost keep its comments in a list. Although this is possible, this model would cause update congestion in the blog post document if many users post comments concurrently.

A better way is to keep each comment in its own document as no update conflict will occur.
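The trade-off described in the quote can be reproduced with a toy in-memory store that checks revisions on update, similar in spirit to how CouchDB rejects writes based on a stale revision. Everything below is a hypothetical stand-in written in plain Java; it uses neither ektorp nor CouchDB, and the `type` and `blogPostId` field names are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a document store with revision checks. */
class TinyDocStore {
    static final class Doc {
        final String id;
        final int rev;
        final Map<String, Object> fields;

        Doc(String id, int rev, Map<String, Object> fields) {
            this.id = id;
            this.rev = rev;
            this.fields = fields;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();

    /** Creates a new document; fails if the id is already taken. */
    Doc create(String id, Map<String, Object> fields) {
        if (docs.containsKey(id)) {
            throw new IllegalStateException("conflict: " + id);
        }
        Doc doc = new Doc(id, 1, new HashMap<>(fields));
        docs.put(id, doc);
        return doc;
    }

    /** Updates a document only if the caller read the latest revision. */
    Doc update(String id, int expectedRev, Map<String, Object> fields) {
        Doc current = docs.get(id);
        if (current == null || current.rev != expectedRev) {
            throw new IllegalStateException("conflict: " + id);
        }
        Doc next = new Doc(id, expectedRev + 1, new HashMap<>(fields));
        docs.put(id, next);
        return next;
    }

    Doc get(String id) {
        return docs.get(id);
    }

    /** Plays the role of the 'by_blogPostId' view: finds the comments pointing at a post. */
    List<Doc> commentsByBlogPostId(String blogPostId) {
        List<Doc> result = new ArrayList<>();
        for (Doc d : docs.values()) {
            if ("comment".equals(d.fields.get("type")) && blogPostId.equals(d.fields.get("blogPostId"))) {
                result.add(d);
            }
        }
        return result;
    }
}
```

If two users read the post at revision 1 and both try to save it with their comment appended, the second `update` is rejected and has to be retried. Creating two separate comment documents that only carry a `blogPostId` never touches the same document, so neither write conflicts.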

We have already talked a couple of times about how important data modeling is when using NoSQL storage. And even if CouchDB has ☞ a whole wiki page dedicated to modeling entity relationships and there are some articles covering schemaless data modeling, questions are still arising ☞ every ☞ day. The lesson to be learned from this is that 1) you’ll need to carefully design your data model and 2) while you might be tempted to reuse an already known “pattern”, you’d better think twice about your application scenarios.


NoSQL News & Links 2010-04-16

  1. Tarek Ziadé: ☞ A Firefox plugin experiment. XUL, Bottle and Redis
  2. Andreas Jung: ☞ Looking beyond one’s own nose - looking at RabbitMQ and MongoDB

    Unsorted remarks on RabbitMQ and MongoDB plus some benchmarks with mass data

  3. Franck Cuny: ☞ presque, a Redis / Tatsumaki based message queue. Perl and Redis baby!
  4. Mark Atwood: ☞ Reacting to “Memcached is not a store”. IMO, it is as much a store as any dict/hash you’ve been using. Well, a bit more.
  5. okram: ☞ pipes. A lot of activity around graph databases lately:

    Pipes is a graph-based data flow framework written in Java 1.6+. A process graph (also known as a Kahn process network) is composed of a set of process vertices connected to one another by a set of communication edges. Each process can run independent of the others and as such, concurrency is a natural consequence as data is transformed in a pipelined fashion from input to output.

A Groovy Way to Work with Neo4j

Groovy can really make things much more readable and nice:

The article shows a couple more tricks.


Java Persistence API with HBase

Sounds like the same JPA solution used by Google AppEngine can be used for HBase too:

It is possible to easily use JDO/JPA (via Datanucleus) to persist objects in the HBase BigTable implementation.


Presentation: Overview of HBase at Meetup

Slides for the Overview of HBase at Meetup presentation.

My notes:

  • the options slide:
  • “scaling is built in, but extra indexing is DIY”. We had a post on this subject HBase secondary indexes
  • open source library for Java beans mapping to HBase tables ☞ meetup.beeno

Getting Up to Speed with CouchDB and Java

Nothing fancy, but if you haven’t done it already, this article will probably get you started in no time.

This article provides a step-by-step guide for using Apache CouchDB using Java. Here we will use Java code to:

  • Create a database in CouchDB,
  • Store employee data (Employee number, Name, Designation, etc).
  • Perform some CRUD operations on the data that is stored.
  • Show the use of “Views” and how it can retrieve data based on field values.

The software we use includes:

  • Apache CouchDB 0.9.0 installation
  • Couchdb4j-0.1.2 jar file
  • Json-lib-2.2.3-jdk15 jar file
  • ezmorph-1.0 jar file
  • Httpclient-4.0-beta2 jar file
  • Httpcore-4.0.1 jar file

Please note the current CouchDB version is 0.10.1, so you might want to try with this version instead of the one mentioned in the article.


Get a Taste of Graph Databases: InfoGrid and Neo4j

As I said in the MongoDB MapReduce tutorial, the best way to validate that you’ve got the basics right about a system is to use some basic code. And this is exactly the idea behind this post: to take a look at a very (very) basic tagging app in InfoGrid and Neo4j.

InfoGrid version

The code with more details can be found ☞ here.

Neo4j version

The Neo4j code was contributed by Mattias Persson from Neo Technology (thanks Mattias).

Note: I couldn’t figure out a way to make the code more readable than this. But you can hover over the code snippets and you’ll get the option to see the original source code.

Here are my notes about the two code snippets above:

  • everything in Neo4j must happen inside a transaction, even if it’s a graph traversal operation (this gives a very strong isolation level). The InfoGrid traversal code seems to happen outside the transaction, so it sounds like it supports a more relaxed isolation level (an interesting question here is: if the traversal happened inside a transaction, would that isolate it from seeing possible external modifications?)
  • InfoGrid’s central element is MeshObject, while Neo4j has Node and Relationship. Generally speaking, I have found the terminology in InfoGrid a bit more unusual (e.g. MeshObject, relateAndBless, etc.)
  • the Neo4j code also uses the LuceneIndexService for indexing both the tag and web resource nodes, but that’s only because the code there makes sure not to duplicate either tags or web resources (i.e. this functionality is not present in the InfoGrid code and I don’t know what that would look like)
  • in both cases a relationship gives you access to both of its ends. While both InfoGrid and Neo4j documentation speak about bidirectional arcs
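For readers who want to play with the tagging model without installing either database, here is a tiny in-memory sketch in plain Java. It does not use the InfoGrid or Neo4j APIs: `TagGraph`, `Node`, `Relationship` and the lookup index are hypothetical names, chosen only to mirror the points in the notes above (one relationship reachable from both ends, and an index that prevents duplicate tags and resources).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory tagging graph; not the InfoGrid or Neo4j API. */
class TagGraph {
    static final class Node {
        final String kind; // "tag" or "resource"
        final String key;  // tag name or resource URL
        final List<Relationship> relationships = new ArrayList<>();

        Node(String kind, String key) {
            this.kind = kind;
            this.key = key;
        }
    }

    static final class Relationship {
        final Node start;
        final Node end;

        Relationship(Node start, Node end) {
            this.start = start;
            this.end = end;
        }

        /** Given one end of the relationship, returns the other one. */
        Node otherEnd(Node node) {
            if (node == start) return end;
            if (node == end) return start;
            throw new IllegalArgumentException("node is not part of this relationship");
        }
    }

    // Lookup index so that a tag or a resource is created only once.
    private final Map<String, Node> index = new HashMap<>();

    Node getOrCreate(String kind, String key) {
        return index.computeIfAbsent(kind + ":" + key, k -> new Node(kind, key));
    }

    /** Connects a resource to a tag, skipping the relationship if it already exists. */
    void tag(String url, String tagName) {
        Node resource = getOrCreate("resource", url);
        Node tag = getOrCreate("tag", tagName);
        for (Relationship r : resource.relationships) {
            if (r.otherEnd(resource) == tag) return;
        }
        Relationship r = new Relationship(resource, tag);
        resource.relationships.add(r);
        tag.relationships.add(r);
    }

    /** Traverses from a tag to every resource carrying it. */
    List<String> resourcesTagged(String tagName) {
        List<String> urls = new ArrayList<>();
        Node tag = index.get("tag:" + tagName);
        if (tag == null) return urls;
        for (Relationship r : tag.relationships) {
            urls.add(r.otherEnd(tag).key);
        }
        return urls;
    }
}
```

A real graph database adds persistence, transactions and indexing on top of this, which is where the two products differ.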

If someone would contribute the code for ☞ HyperGraphDB and/or ☞ VertexDB I think this post would get even more interesting!

Update: The guys from Sones picked up my challenge and show their C# implementation in this ☞ post. I have included the code below for reference.

Sones version

Update: I’ve just got another submission from Filament. Code is included below and their original post is ☞ here

Filament version

InfiniteGraph version

Update: Thanks to ☞ Todd Stavish we now have a version of this sample code for InfiniteGraph

Usecase: NoSQL-based Blogs

Aside from Twitter applications, blogs are another darling of NoSQL projects. So, I’ve put together a list of NoSQL-powered blog projects.



A Rails and CouchDB blog. Code on ☞ GitHub


A CouchDB-based blog built in “one day” with Django (NB: so far I couldn’t find the source code, so any leads are appreciated).



A simple blog built using neo4j, jo4neo and Stripes. You can read more about it ☞ here and get the code from ☞ Google code.

Couple of comments:

  • I don’t really like the fact that the model is neo4j aware, but that’s similar to what JPA is doing too
  • I like the indexing annotation though, but I am not sure if it uses neo4j Lucene full text indexing



A cli-application blog built using neo4j. Code available on ☞ Google code.




A lightweight blogging engine written in C++ and using MongoDB. Code available on ☞ GitHub


Update: thanks to the comments, I have added two more NoSQL-based blog engines.


Django-Mumblr is a basic Django tumblelog application that uses MongoDB. Source code can be found on ☞ GitHub


mmmblog is a blog engine based on Rails, mongomapper and MongoDB, providing feeds, OpenID comments and a simple admin interface. Code is available on ☞ Gitorious

I am pretty sure there are more out there, so please send them over!