Java: All content tagged as Java in NoSQL databases and polyglot persistence
So, what options do you have if you are in Java land and want to use MongoDB?
- using the MongoDB Java driver API
- GuiceyData: ProtocolBuffers-like mapping
- Sculptor: DSL for code generation
- Morphia: JPA-like API
- GuiceyData vs Morphia vs Sculptor
Using the MongoDB Java driver API
Let’s start with a code snippet to get a feel for the MongoDB Java driver:
Another good resource is the following slide deck from Brendan W. McAdams:
If using the Java driver feels too verbose, then I have some Groovy sugar for it.
MongoDB driver API with Groovy sugar
James Williams also has ☞ a post on how things can get a bit better when using Groovy with MongoDB.
GuiceyData: ProtocolBuffers-like mapping
In ☞ this post, Matt Insler talks about the complexity of using schemaless NoSQL databases from statically typed languages.
One of the best things about MongoDB is the lack of an enforced schema for collections. […] All of that being said, working with these records in a language like Java and on large diverse teams of people who don’t want to open the database and inspect the records to see what values and sub-records are available, means that you will always spend time wrapping these records in a strong-typed class. Wrapping up loose data into classes that can both access and create that data sounds just like another project I’ve used recently.
He then comes up with a possible solution based on external data definitions:
The GuiceyData Generator is a quick and easy way to specify strongly typed data structures to be stored in a MongoDB database and mapped to wrappers and builders in Java.
As a side note, BSON is a typed serialization format, so the real problem is just type mapping.
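To make the "wrap loose records in a strong-typed class" pain more concrete, here is a minimal, JDK-only sketch of the kind of wrapper and builder you would otherwise write by hand. It is not GuiceyData output and it does not use the MongoDB driver. A plain Map stands in for a stored record, and User, Builder and the name/age fields are made up for illustration. Because the stored values already carry their types (as BSON values do), most of the work is checking and casting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical strongly typed wrapper around a schemaless record.
// A plain Map stands in for the MongoDB document; no driver classes are used.
final class User {
    private final Map<String, Object> data;

    User(Map<String, Object> data) {
        this.data = data;
    }

    String getName() {
        return (String) data.get("name");
    }

    // Lets callers check for an optional field instead of guessing what the record holds.
    boolean hasAge() {
        return data.get("age") instanceof Integer;
    }

    int getAge() {
        Object value = data.get("age");
        if (!(value instanceof Integer)) {
            throw new IllegalStateException("age is missing or not an int: " + value);
        }
        return (Integer) value;
    }

    // Returns a copy of the underlying record, e.g. to hand it to a persistence layer.
    Map<String, Object> toRecord() {
        return new LinkedHashMap<>(data);
    }

    static Builder builder() {
        return new Builder();
    }

    // Builder that only lets you set known fields, each with its expected Java type.
    static final class Builder {
        private final Map<String, Object> data = new LinkedHashMap<>();

        Builder name(String name) {
            data.put("name", name);
            return this;
        }

        Builder age(int age) {
            data.put("age", age);
            return this;
        }

        User build() {
            return new User(new LinkedHashMap<>(data));
        }
    }
}
```

A generator like GuiceyData saves you from writing and maintaining this boilerplate for every record type.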
Sculptor: DSL for code generation
Another code generator, this time based on a DSL, is ☞ Sculptor.
Sculptor generates data mapper classes that converts domain objects to/from MongoDB data structures, DBObjects. This makes it easy to use a domain model à la DDD with automatic mapping to MongoDB data structures.
Sculptor provides generic repository operations for use with MongoDB. This includes operations such as save, delete, findById, findByKey, findByCondition, and some more. You get CRUD operations, and GUI, for free.
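The data mapper plus generic repository idea can be sketched in a few lines of plain Java. This is not code generated by Sculptor. DataMapper, PersonMapper, MapRepository and Person are hypothetical names, an in-memory Map replaces the MongoDB collection, and Map<String, Object> documents stand in for DBObjects. Only save, findById, findByCondition and delete from the list above are shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical domain object.
final class Person {
    final String id;
    final String name;
    final String city;

    Person(String id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }
}

// Converts a domain object to/from a generic document (a Map standing in for a DBObject).
interface DataMapper<T> {
    Map<String, Object> toDocument(T entity);
    T fromDocument(Map<String, Object> document);
    String idOf(T entity);
}

final class PersonMapper implements DataMapper<Person> {
    public Map<String, Object> toDocument(Person p) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", p.id);
        doc.put("name", p.name);
        doc.put("city", p.city);
        return doc;
    }

    public Person fromDocument(Map<String, Object> doc) {
        return new Person((String) doc.get("_id"), (String) doc.get("name"), (String) doc.get("city"));
    }

    public String idOf(Person p) {
        return p.id;
    }
}

// Generic repository: it stores documents only and relies on the mapper for conversion.
final class MapRepository<T> {
    private final Map<String, Map<String, Object>> collection = new LinkedHashMap<>();
    private final DataMapper<T> mapper;

    MapRepository(DataMapper<T> mapper) {
        this.mapper = mapper;
    }

    void save(T entity) {
        collection.put(mapper.idOf(entity), mapper.toDocument(entity));
    }

    Optional<T> findById(String id) {
        return Optional.ofNullable(collection.get(id)).map(mapper::fromDocument);
    }

    // The condition is evaluated against stored documents, as a database query would be.
    List<T> findByCondition(Predicate<Map<String, Object>> condition) {
        List<T> result = new ArrayList<>();
        for (Map<String, Object> doc : collection.values()) {
            if (condition.test(doc)) {
                result.add(mapper.fromDocument(doc));
            }
        }
        return result;
    }

    void delete(String id) {
        collection.remove(id);
    }
}
```

Because the repository only ever sees documents and a mapper, a single generic class can serve every entity type once a mapper exists for it.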
Morphia: JPA-like API
☞ Morphia proposes a different approach, one that feels closer to JPA.
According to the project’s slide deck, Morphia:
- brings Hibernate/JPA paradigms to MongoDB
- allows annotating POJOs, which makes converting them between MongoDB and Java very easy
- supports DAO abstractions
- offers type-safe query support
- is compatible with GWT, Guice, Spring, and other DI frameworks
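To show what "annotating POJOs" buys you, here is a tiny reflection-based mapper. The @Stored annotation, Book and AnnotationMapper are hypothetical and are not Morphia's API (Morphia has its own annotations, such as @Entity and @Id). A Map again stands in for the MongoDB document. Note that Book extends no library class: all the mapping metadata lives in annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation: fields carrying it are persisted under the given name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Stored {
    String value();
}

// A plain POJO: no library base class, only annotations.
class Book {
    @Stored("title") String title;
    @Stored("pages") int pages;
    String transientNote; // not annotated, so never persisted

    Book() { }

    Book(String title, int pages) {
        this.title = title;
        this.pages = pages;
    }
}

final class AnnotationMapper {
    // Copies every @Stored field into a document keyed by the annotation value.
    static Map<String, Object> toDocument(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        try {
            for (Field field : pojo.getClass().getDeclaredFields()) {
                Stored stored = field.getAnnotation(Stored.class);
                if (stored != null) {
                    field.setAccessible(true);
                    doc.put(stored.value(), field.get(pojo)); // primitives come back boxed
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    // Creates an instance through the no-arg constructor and fills the @Stored fields.
    static <T> T fromDocument(Map<String, Object> doc, Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T pojo = ctor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                Stored stored = field.getAnnotation(Stored.class);
                if (stored != null && doc.containsKey(stored.value())) {
                    field.setAccessible(true);
                    field.set(pojo, doc.get(stored.value())); // unboxes for primitive fields
                }
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```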
A look at Morphia performance and possible optimizations can be found ☞ here.
GuiceyData vs Morphia vs Sculptor
If you have a hard time deciding which one you should pick, you’ll probably find ☞ this article useful.
With that, I guess you should just get started. And if you have a preferred approach to using MongoDB from Java, please share it with the rest of us!
It might just be my impression, but the amount of experimentation and work in the NoSQL space using dynamic languages (PHP, Python, Ruby, etc.) seems larger than what has been done so far using “big brother” languages like C# or Java. That’s not to say that C# and Java developers do not like NoSQL, just that good resources are rare.
Here are my notes after going through the article:
- model classes must extend a library-provided class (CouchDbDocument). Since Java is a single-inheritance language, this might be an issue for complex class hierarchies, so I was wondering whether an annotation-based approach (see JPA) would work even better. Update: according to the author, this is optional and other mechanisms are available as well.
- “view” classes that provide the CouchDB MapReduce-based view definition through an annotation.
It is also worth noting the comment about the model definition:
The relationship between BlogPost and Comment is modeled with a blogPostId field in Comment. In order to find the comments for a blog post a ‘by_blogPostId’ query has to be performed.
A perhaps more natural way would be let BlogPost keep its comments in a list. Although this is possible, this model would cause update congestion in the blog post document if many users post comments concurrently.
A better way is to keep each comment in its own document as no update conflict will occur.
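The trade-off in that quote is easy to illustrate without CouchDB. In the sketch below, which uses hypothetical names and an in-memory map instead of a real database, every comment is its own document carrying a blogPostId, and byBlogPostId plays the role of the 'by_blogPostId' view. Adding a comment only ever creates a new document, so concurrent commenters never rewrite (and never conflict on) the blog post document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical comment document: it references its post instead of being embedded in it.
final class CommentDoc {
    final long seq;
    final String id;
    final String blogPostId;
    final String text;

    CommentDoc(long seq, String blogPostId, String text) {
        this.seq = seq;
        this.id = "comment-" + seq;
        this.blogPostId = blogPostId;
        this.text = text;
    }
}

final class CommentStore {
    private final Map<String, CommentDoc> documents = new ConcurrentHashMap<>();
    private final AtomicLong nextSeq = new AtomicLong(1);

    // Each new comment is a brand new document; no existing document is updated,
    // so concurrent writers do not contend on the blog post document.
    CommentDoc addComment(String blogPostId, String text) {
        CommentDoc doc = new CommentDoc(nextSeq.getAndIncrement(), blogPostId, text);
        documents.put(doc.id, doc);
        return doc;
    }

    // Stands in for a 'by_blogPostId' view: returns the comments of one post in creation order.
    // The linear scan replaces the index a real view would maintain.
    List<CommentDoc> byBlogPostId(String blogPostId) {
        List<CommentDoc> result = new ArrayList<>();
        for (CommentDoc doc : documents.values()) {
            if (doc.blogPostId.equals(blogPostId)) {
                result.add(doc);
            }
        }
        result.sort(Comparator.comparingLong((CommentDoc c) -> c.seq));
        return result;
    }
}
```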
We have already talked a couple of times about how important data modeling is when using NoSQL storage. And even though CouchDB has ☞ a whole wiki page dedicated to modeling entity relationships, and there are some articles covering schemaless data modeling, questions still come up ☞ every ☞ day. The lesson to be learned is that 1) you need to carefully design your data model and 2) while you might be tempted to reuse an already known “pattern”, you’d better think twice about your application scenarios.
- Tarek Ziadé: ☞ A Firefox plugin experiment. XUL, Bottle and Redis
- Andreas Jung: ☞ Looking beyond one’s own nose - looking at RabbitMQ and MongoDB
Unsorted remarks on RabbitMQ and MongoDB plus some benchmarks with mass data
- Franck Cuny: ☞ presque, a Redis / Tatsumaki based message queue. Perl and Redis baby!
- Mark Atwood: ☞ Reacting to “Memcached is not a store”. IMO, it is as much a store as any dict/hash you’ve been using. Well, a bit more.
- okram: ☞ pipes. A lot of activity around graph databases lately:
Pipes is a graph-based data flow framework written in Java 1.6+. A process graph (also known as a Kahn process network) is composed of a set of process vertices connected to one another by a set of communication edges. Each process can run independent of the others and as such, concurrency is a natural consequence as data is transformed in a pipelined fashion from input to output.
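The pipelined data flow described in the quote can be approximated with plain Java iterators. Each stage lazily pulls items from the stage before it and transforms them, so data moves from input to output one element at a time. The Pipe class and its map/filter factories are hypothetical and are not the Pipes API. This sketch also runs all stages on a single thread, so it shows only the composition model, not the concurrency a process network allows.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical lazy pipeline stage: wraps an upstream iterator and emits transformed items.
abstract class Pipe<S, E> implements Iterator<E> {
    protected final Iterator<S> upstream;

    Pipe(Iterator<S> upstream) {
        this.upstream = upstream;
    }

    // Stage that applies a function to every item it pulls.
    static <S, E> Pipe<S, E> map(Iterator<S> upstream, Function<S, E> fn) {
        return new Pipe<S, E>(upstream) {
            public boolean hasNext() {
                return upstream.hasNext();
            }

            public E next() {
                return fn.apply(upstream.next());
            }
        };
    }

    // Stage that drops items failing the predicate, buffering one look-ahead item.
    static <S> Pipe<S, S> filter(Iterator<S> upstream, Predicate<S> keep) {
        return new Pipe<S, S>(upstream) {
            private S buffered;
            private boolean hasBuffered;

            public boolean hasNext() {
                while (!hasBuffered && upstream.hasNext()) {
                    S candidate = upstream.next();
                    if (keep.test(candidate)) {
                        buffered = candidate;
                        hasBuffered = true;
                    }
                }
                return hasBuffered;
            }

            public S next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                hasBuffered = false;
                return buffered;
            }
        };
    }
}
```

Stages compose by passing one pipe as the upstream of the next, for example Pipe.map(Pipe.filter(source, n -> n % 2 == 0), n -> "v" + n).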
As I said in the MongoDB MapReduce tutorial, the best way to check that you’ve got the basics of a system right is to write some basic code. And this is exactly the idea behind this post: to take a look at a very (very) basic tagging app in InfoGrid and Neo4j.
The code with more details can be found ☞ here.
The Neo4j code was contributed by Mattias Persson from Neo Technology (thanks Mattias).
Here are my notes about the two implementations:
- everything in Neo4j must happen inside a transaction, even a graph traversal (this gives a very strong isolation level). The InfoGrid traversal code seems to run outside a transaction, so it sounds like InfoGrid supports a more relaxed isolation level (an interesting question here: if the traversal happened inside a transaction, would that isolate it from seeing possible external modifications?)
- InfoGrid’s central element is MeshObject, while Neo4j has Relationship. Generally speaking, I have found the terminology in InfoGrid a bit more unusual (f.e.
- the Neo4j code also uses the LuceneIndexService for indexing both the tag and the web resource nodes, but only because that code makes sure not to duplicate either tags or web resources (this functionality is not present in the InfoGrid code and I don’t know what it would look like there)
- in both cases a relationship gives you access to both of its ends. While both InfoGrid and Neo4j documentation speak about bidirectional arcs
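For readers without InfoGrid or Neo4j at hand, the shape of the tagging app can be sketched with an in-memory graph. This is not code from either project. TagGraph, Node and Edge are hypothetical. A HashMap plays the role the LuceneIndexService plays in the Neo4j version (finding existing tag and resource nodes so they are not duplicated), and each relationship is registered on both of its end nodes so it can be traversed from either side. There are no transactions here, so the sketch says nothing about the isolation questions raised above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TagGraph {
    // Hypothetical node: a tag or a web resource, identified by kind and key.
    static final class Node {
        final String kind;
        final String key;
        final List<Edge> edges = new ArrayList<>();

        Node(String kind, String key) {
            this.kind = kind;
            this.key = key;
        }
    }

    // Hypothetical relationship: stored once, registered on both end nodes.
    static final class Edge {
        final Node resource;
        final Node tag;

        Edge(Node resource, Node tag) {
            this.resource = resource;
            this.tag = tag;
        }

        Node otherEnd(Node from) {
            return from == resource ? tag : resource;
        }
    }

    // Index keyed by "kind:key" so that tags and resources are never duplicated.
    private final Map<String, Node> index = new HashMap<>();

    private Node getOrCreate(String kind, String key) {
        return index.computeIfAbsent(kind + ":" + key, k -> new Node(kind, key));
    }

    void tag(String url, String tagName) {
        Node resource = getOrCreate("resource", url);
        Node tag = getOrCreate("tag", tagName);
        for (Edge e : resource.edges) {
            if (e.tag == tag) {
                return; // already tagged: do not add a second relationship
            }
        }
        Edge edge = new Edge(resource, tag);
        resource.edges.add(edge);
        tag.edges.add(edge);
    }

    // Traverses from a start node across its relationships to the keys at the other ends.
    Set<String> neighbours(String kind, String key) {
        Set<String> result = new LinkedHashSet<>();
        Node start = index.get(kind + ":" + key);
        if (start != null) {
            for (Edge e : start.edges) {
                result.add(e.otherEnd(start).key);
            }
        }
        return result;
    }

    int nodeCount() {
        return index.size();
    }
}
```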
Update: The guys from Sones picked up my challenge and show their C# implementation in this ☞ post. I have included their code below for reference.
Update: I’ve just received another submission, from Filament. The code is included below and their original post is ☞ here.
Aside from Twitter applications, blogs are another darling of NoSQL projects, so I’ve put together a list of NoSQL-powered blog projects.
A Rails and CouchDB blog. Code on ☞ GitHub
A CouchDB-based blog built in “one day” with Django (NB: so far I haven’t been able to find the source code, so any leads are appreciated).
A couple of comments:
- I don’t really like the fact that the model is Neo4j-aware, but that’s similar to what JPA does too
- I do like the indexing annotation, though I am not sure whether it uses Neo4j’s Lucene full-text indexing
A CLI blog application built using Neo4j. Code available on ☞ Google Code.
A lightweight blogging engine written in C++ and using MongoDB. Code available on ☞ GitHub
Update: thanks to the comments, I have added two more NoSQL-based blog engines.
Django-Mumblr is a basic Django tumblelog application that uses MongoDB. Source code can be found on ☞ GitHub
I am pretty sure there are more out there, so please send them over!
These days everyone seems to be building a Twitter-like or Twitter-related project using some NoSQL solution. I guess they could cite Nati Shalom’s (GigaSpaces) great ☞ post on the common principles behind NoSQL alternatives as a ‘scientific’ explanation for these experiments (the post was inspired by his QCon talk on building a scalable Twitter application).
Even though the project code is not available and I couldn’t get the online version to work, I’d say that the combination of Redis and HTML5 WebSockets makes it worth mentioning. And in case you cannot get it to work either, there is a screencast for it.
TStore is a Twitter search results backup tool built with Python and CouchDB. The source code is available on ☞ GitHub.
Retwis is a non-distributed Twitter clone built in PHP on top of Redis. The source code and extended details about the implementation are available ☞ here.
According to this page, there is already a port of this solution to Ruby and Sinatra: ☞ Retwis-RB.
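Retwis is a handy example of designing around Redis data structures. The sketch below is loosely modeled on the fan-out-on-write idea its write-up describes, as I understand it: when a user posts, the post id is pushed onto the timeline of the author and of every follower, so reading a timeline becomes a cheap range read. It is plain Java, not a Redis client. MiniTwitter and its methods are hypothetical, and the comments only point out which Redis structure or command each collection or step approximates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory sketch of a Retwis-like data model; Java collections stand in for Redis structures.
final class MiniTwitter {
    private final Map<Long, String> posts = new HashMap<>();            // post id -> text
    private final Map<String, Set<String>> followers = new HashMap<>(); // user -> followers (like a Redis set)
    private final Map<String, Deque<Long>> timelines = new HashMap<>(); // user -> post ids, newest first (like a Redis list)
    private long nextPostId = 1;                                         // like an INCR counter

    void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new LinkedHashSet<>()).add(follower);
    }

    // Fan-out on write: push the new post id onto the author's timeline and onto
    // the timeline of every follower, similar to one LPUSH per user.
    long post(String author, String text) {
        long id = nextPostId++;
        posts.put(id, text);
        timeline(author).addFirst(id);
        for (String follower : followers.getOrDefault(author, Collections.emptySet())) {
            timeline(follower).addFirst(id);
        }
        return id;
    }

    // Reading is a range read over the newest ids, similar to LRANGE 0 (limit - 1).
    List<String> read(String user, int limit) {
        List<String> result = new ArrayList<>();
        for (long id : timeline(user)) {
            if (result.size() == limit) {
                break;
            }
            result.add(posts.get(id));
        }
        return result;
    }

    private Deque<Long> timeline(String user) {
        return timelines.computeIfAbsent(user, k -> new ArrayDeque<>());
    }
}
```

The design choice is to pay at write time (one push per follower) so that the far more frequent reads stay cheap.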
Floxee is a commercial tweetstream search and tagging platform built using MongoDB. You can read a bit more about how it uses MongoDB ☞ here.
I am pretty sure I haven’t found all Twitter-like/Twitter-related NoSQL apps out there, so please feel free to send me more. I’ll be happy to update the post.
And if you are not interested in NoSQL Twitter applications, you can check out the MongoDB-based forum/message board apps.