infogrid: All content tagged as infogrid in NoSQL databases and polyglot persistence
Pere Urbón-Bayes has a must-check slide deck on graph databases and their applicability. I like this graph database products slide most:
- Neo4j: open source database NoSQL graph
- Dex: the high performance graph database
- HyperGraphDB: an AI and semantic web graph database
- Infogrid: the Internet graph database
- Sones: SaaS dot Net graph database
- VertexDB: high performance database server
By the way I’ve heard Pere (@purbon) is currently looking for a job ;-).
After triggering our quick review of graph databases, Pere Urbón came up with a nice comparison of these — Neo4j, HyperGraphDB, DEX, InfoGrid, Sones, VertexDB — in terms of License, Schema, Querying, Storage implementation, Utilities, Language and Operating system support.
Pere has made this very interesting NoSQL graph database matrix available as a ☞ PDF on his blog.
Even if a bit late to report it, InfoGrid, one of the graph databases covered here, has announced the release of a new version: InfoGrid 2.9.4, which, even though a minor release, comes with a lot of improvements and fixes. The list of changes can be read ☞ here and the new InfoGrid 2.9.4 version can be downloaded from ☞ here.
One of the problems mentioned when discussing relational database scalability is that storage-enforced relationships, ACID and scale do not play well together. In the NoSQL space there is a category of storage solutions designed for highly interconnected data: graph databases (note also that some of these graph databases are transactional).
Lately there have been quite a few interesting discussions related to scaling graph databases. Alex Averbuch is working on a thesis on sharding Neo4j, and his recent post presents some of the possible solutions. Alex’s article is a very good starting point for anyone interested in scaling graph databases.
Then there is also this article on InfoGrid’s blog presenting a different, web-like solution based on a custom protocol: XPRISO: eXtensible Protocol for the Replication, Integration and Synchronization of distributed Objects. While I haven’t had the chance to dig deeper into InfoGrid’s suggested approach, one thing caught my attention right away: while the association with web-scale is definitely an interesting idea, requiring specific knowledge of the nodes’ location and a custom API for it doesn’t seem to be the best solution. Basically, the web addressed this by having URIs for each reachable resource (InfoGrid should try a similar idea, get rid of the different API for accessing local vs remote nodes, etc.).
Update: make sure you check the comment thread for more details about InfoGrid’s perspective on scaling graph databases.
Oren Eini concludes in his post:
After spending some time thinking about it, I came to the conclusion that I can’t envision any general way to solve the problem. Oh, I can think of several ways of reduce the problem:
- Batching cross machine queries so we only perform them at the close of each breadth first step.
- Storing multiple levels of associations (So “users/ayende” would store its relations but also “users/ayende”’s relation and “users/arik”’s relations).
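The first idea in the list above, batching cross-machine queries so that they only run at the close of each breadth-first step, can be sketched roughly as follows. This is a single-process sketch and not code from Oren's post: the `Shard` class, the hash-based `shard_index` placement rule and the `get_neighbors_batch` call are hypothetical stand-ins for remote machines and network round trips.

```python
from collections import defaultdict


class Shard:
    """Hypothetical stand-in for one machine holding part of the graph."""

    def __init__(self):
        self.adjacency = {}
        self.requests = 0  # counts simulated network round trips

    def get_neighbors_batch(self, node_ids):
        # One simulated round trip answers for a whole batch of nodes.
        self.requests += 1
        return {n: set(self.adjacency.get(n, ())) for n in node_ids}


def shard_index(node_id, shard_count):
    # Assumed placement rule: a deterministic hash of the node identifier.
    return sum(node_id.encode("utf-8")) % shard_count


def add_edge(shards, a, b):
    # Each direction of the edge is stored on the shard owning its source node.
    for src, dst in ((a, b), (b, a)):
        shard = shards[shard_index(src, len(shards))]
        shard.adjacency.setdefault(src, set()).add(dst)


def batched_bfs(shards, start, max_depth):
    """Return {node: depth} using at most one request per shard per BFS level."""
    depth_of = {start: 0}
    frontier = {start}
    for depth in range(1, max_depth + 1):
        if not frontier:
            break
        # Group the whole frontier by owning shard before going "remote".
        by_shard = defaultdict(list)
        for node in frontier:
            by_shard[shard_index(node, len(shards))].append(node)
        next_frontier = set()
        for index, nodes in by_shard.items():
            for neighbors in shards[index].get_neighbors_batch(nodes).values():
                for neighbor in neighbors:
                    if neighbor not in depth_of:
                        depth_of[neighbor] = depth
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return depth_of


shards = [Shard(), Shard()]
add_edge(shards, "users/ayende", "users/arik")
add_edge(shards, "users/arik", "users/oren")
depths = batched_bfs(shards, "users/ayende", max_depth=2)
```

The number of round trips is bounded by levels times shards rather than by the number of nodes visited, which reduces the problem without solving it in general.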
While I haven’t had enough time to think about this topic, my gut feeling is that possible solutions are to be found in a combination of unique identifiers for distributed nodes and a mapreduce-like approach. I cannot stop wondering if this is not what Google’s Pregel is doing (nb I should have read the paper (pdf) first).
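To make the "unique identifiers plus a mapreduce-like approach" hunch a bit more concrete, here is a toy, single-process sketch of the vertex-centric superstep model described in the Pregel paper. It is not Pregel's API: `run_supersteps` and the message format are made up for illustration, and distribution is only hinted at by the fact that vertices address each other purely by identifier.

```python
from collections import defaultdict


def run_supersteps(edges, source, max_supersteps=50):
    """Compute hop distances from `source` by passing messages between vertices.

    `edges` maps a vertex identifier to the identifiers of its neighbours.
    Vertices only refer to each other by identifier, so in a real deployment
    the inbox of a vertex could live on any machine.
    """
    distance = {}
    inbox = {source: [0]}  # messages to deliver in the first superstep
    for _ in range(max_supersteps):
        if not inbox:
            break  # no messages in flight, so every vertex is idle
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            # A vertex only reacts (and sends messages) when it improves.
            if vertex not in distance or best < distance[vertex]:
                distance[vertex] = best
                for neighbour in edges.get(vertex, ()):
                    outbox[neighbour].append(best + 1)
        inbox = dict(outbox)  # barrier: messages arrive in the next superstep
    return distance


sample_edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
hops = run_supersteps(sample_edges, "a")
```

Each superstep resembles a map phase (vertices emit messages) followed by a grouping by destination identifier, which is why the model feels mapreduce-like.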
The InfoGrid blog has started to publish a series on basic operations with graph databases. While it looks like getting a taste of graph databases was a very good start, it wasn’t meant to introduce the details of working with a graph database, something that people may not be familiar with.
So, here are the articles published so far on operations with a graph database:
- ☞ Nodes
- ☞ Edges and Traversals
- ☞ Typing (from free form nodes/edges to “strongly typed” nodes/edges)
- ☞ Properties
- ☞ Identifiers
- ☞ Traversals
Traversals are the most common operations on a graph database. They are just as important for graph databases as joins are for relational databases.
- ☞ Sets (new)
Sets are a core concept of most databases. […] Sets apply to Graph Databases just as well and are just as useful:
The most frequently encountered set of nodes in a Graph Database is the result of a traversal.
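To illustrate the point quoted above, that a traversal yields a set of nodes to which ordinary set operations then apply, here is a small sketch using plain Python dictionaries and sets. It does not use InfoGrid's API; the `traverse` helper and the sample graph are hypothetical.

```python
def traverse(graph, start_nodes, edge_type):
    """Follow every `edge_type` edge leaving `start_nodes`; return the set reached."""
    reached = set()
    for node in start_nodes:
        reached |= graph.get(node, {}).get(edge_type, set())
    return reached


# node -> edge type -> set of neighbour nodes (made-up sample data)
graph = {
    "alice": {"friend": {"bob", "carol"}},
    "bob": {"friend": {"alice", "dave"}},
    "carol": {"friend": {"alice", "dave", "erin"}},
}

# The result of one traversal is a set, so it can feed the next traversal.
friends = traverse(graph, {"alice"}, "friend")
friends_of_friends = traverse(graph, friends, "friend") - friends - {"alice"}

# Set intersection answers "who do bob and carol both know?"
common = traverse(graph, {"bob"}, "friend") & traverse(graph, {"carol"}, "friend")
```

Chaining traversals this way plays roughly the role that chaining joins plays in a relational query.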
I just hope the series will keep going!
Pere Urbón ☞ published a short review of a couple of existing graph databases. For your reference, below are the ones reviewed in the post and a couple more that we’ve previously mentioned here on myNoSQL:
☞ Neo4j is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.
☞ DEX is a high performance library to manage very large graphs or networks
☞ HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
☞ InfoGrid: an Internet Graph Database with many additional software components that make the development of REST-ful web applications on a graph foundation easy.
☞ vertexdb: a high performance graph database server that supports automatic garbage collection.
Note: by checking the project homepage I cannot tell if the project is still active or not.
☞ AllegroGraph RDFStore: a modern, high-performance, persistent RDF graph database.
Note: AllegroGraph seems to be positioned in the RDF stores space, which features some other solutions too.
☞ Filament: a graph persistence framework and associated toolkits based on a navigational query style.
☞ Sones GraphDS provides inherent support for high-level data abstraction concepts (graph structures, walks, consistency, editions, revisions, copies), its own Graph Query Language, an underlying distributed file system and various interfaces like SOAP, REST or WebDAV.
And I’m not sure these are all …
Update: make sure you check the NoSQL Graph Database Matrix
- Jonathan Ellis: ☞ Cassandra in action. A nice round-up of everything that was said lately about Cassandra. And just as a teaser, I can tell you that more is coming!
- New article about traversals in graph databases added to the great operations on a graph database series
- yousry.de: ☞ NoSQL and Web applications. A mix of explaining the Not only SQL meaning of NoSQL and some unrelated stuff.
- Some NoSQL tools that might get you excited:
  - ☞ Redis toolset: a toolkit for Redis
  - ☞ ektorp: Java API for CouchDB
  - ☞ Querly: a query engine for CouchDB in Erlang
  - ☞ hbasebridge: simple JSON-RPC query bridge for HBase
  All added to the list of NoSQL libraries.
- hugoware: ☞ Performing Updates With CSMongo. Working with the C# CSMongo lib and MongoDB. See also MongoDB and C# and MongoDB in the Windows Environment.
- Chris Strom: ☞ Base Debian Install on VirtualBox. Trying out the upcoming CouchDB 0.11 in a virtual env.
- Jamie Talbot: ☞ Handling JSON Objects in CouchDB Native Erlang Views.
- handcraftsman: Some more C# and MongoDB: ☞ Fluent MongoDB Part 1 and ☞ Fluent MongoDB Part 2.
- riklaunim: ☞ MongoDB data management in Python. Just the basics of using pymongo with MongoDB.
- Two new articles added to the operations on graph databases series. A must read for everyone looking into graph databases.
It looks like MyNoSQL’s initiative to compare the same scenarios implemented with some of the graph databases is catching on: after the Neo4j blog published an extensive article on ☞ access control lists with Neo4j, the guys from InfoGrid picked up the challenge and provided ☞ their own solution.
While I haven’t had a chance yet to use either of these graph databases, the more I look at these comparisons the more I get the feeling that they have more in common than differences, so at the end of the day it might be only a matter of preference in picking one or the other. Competition and choice are always good!
As I said in the MongoDB MapReduce tutorial, the best way to validate that you’ve got the basics right about a system is to use some basic code. And this is exactly the idea behind this post: to take a look at a very (very) basic tagging app in InfoGrid and Neo4j.
The code with more details can be found ☞ here.
The Neo4j code was contributed by Mattias Persson from Neo Technology (thanks Mattias).
Note: I couldn’t figure out a way to make the code more readable than this. But you can hover over the code snippets and you’ll get the option to see the original source code.
Here are my notes about the two code snippets above:
- everything in Neo4j must happen inside a transaction, even if it’s a graph traversal operation (this gives a very strong isolation level). The InfoGrid traversal code seems to happen outside the transaction, so it sounds like it supports a more relaxed isolation level (an interesting question here is: if the traversal happened inside a transaction, would that isolate it from seeing possible external modifications?)
- InfoGrid’s central element is MeshObject, while Neo4j has Relationship. Generally speaking I have found the terminology in InfoGrid a bit more unusual (f.e.
- the Neo4j code also uses the LuceneIndexService for indexing both the tag and web resource nodes, but that’s only because the code there makes sure not to duplicate either tags or web resources (i.e. this functionality is not present in the InfoGrid code and I don’t know what that would look like)
- in both cases a relationship gives you access to both its ends. While both InfoGrid and Neo4j documentation speak about bidirectional arcs
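Two of the notes above, duplicate avoidance through an index and access to both ends of a relationship, can be illustrated with a minimal in-memory sketch of a tagging graph. It is neither the InfoGrid nor the Neo4j code: `TagGraph`, `Edge` and the dictionary index are hypothetical, with the dictionary only standing in for the role the index plays in the Neo4j version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A 'tagged with' relationship that exposes both of its ends."""
    resource: str
    tag: str


class TagGraph:
    def __init__(self):
        self.index = {}     # (kind, name) -> node key; prevents duplicates
        self.edges = set()  # a set, so re-tagging does not add a second edge

    def get_or_create(self, kind, name):
        # Look the node up in the index first so it is never created twice.
        return self.index.setdefault((kind, name), f"{kind}:{name}")

    def tag(self, url, tag_name):
        edge = Edge(self.get_or_create("resource", url),
                    self.get_or_create("tag", tag_name))
        self.edges.add(edge)
        return edge

    def resources_for(self, tag_name):
        # Walk the relationship from the tag end to the resource end.
        key = self.index.get(("tag", tag_name))
        return {e.resource for e in self.edges if e.tag == key}

    def tags_for(self, url):
        # Walk the same relationship in the opposite direction.
        key = self.index.get(("resource", url))
        return {e.tag for e in self.edges if e.resource == key}


g = TagGraph()
g.tag("http://example.com/a", "nosql")
g.tag("http://example.com/a", "graph")
g.tag("http://example.com/b", "nosql")
g.tag("http://example.com/b", "nosql")  # repeated on purpose: no duplicates
```

This sketch has no transactions at all, so it says nothing about the isolation question raised in the first note.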
Update: The guys from Sones picked up my challenge and they show off their C# implementation in this ☞ post. I have included the code below for reference.
Update: I’ve just got another submission from Filament. Code is included below and their original post is ☞ here.