
infogrid: All content tagged as infogrid in NoSQL databases and polyglot persistence

A Survey of Graph Databases for the Java Programmers

Jasper Pei Lee provides an overview of the following graph databases from the perspective of the Java developer: Neo4j, InfiniteGraph, DEX, InfoGrid, HyperGraphDB, Trinity, AllegroGraph:

Graph Databases for the Java Programmers

His review is similar to the Quick Review of Existing Graph Databases, but stays focused on using these graph databases from a Java environment, thus making it less generic than the NoSQL Graph Database Matrix.

The only part that I didn’t understand is the closing:

High-performance and distributed deploy are supposed to be supported by all products.

Without qualifying what high-performance means, it is difficult to assess whether all reviewed products are on par[1]. And scaling graph databases is far from being a solved problem.


  1. AllegroGraph takes pride in breaking records related to the number of stored triples, while others are focused on access speed, or reliability.  

Original title and link: A Survey of Graph Databases for the Java Programmers (NoSQL database©myNoSQL)

via: http://jasperpeilee.wordpress.com/2011/11/25/a-survey-on-graph-databases/


Graph Theory and Databases

Pere Urbón-Bayes’s must-check slide deck on graph databases and their applicability. I like this graph database products slide most:

  • Neo4j: open source database NoSQL graph
  • Dex: the high performance graph database
  • HyperGraphDB: an IA and semantic web graph database
  • Infogrid: the Internet graph database
  • Sones: SaaS dot Net graph database
  • VertexDB: high performance database server

By the way I’ve heard Pere (@purbon) is currently looking for a job ;-).

Original title and link: Graph Theory and Databases (NoSQL databases © myNoSQL)


InfoGrid: Graph Database Schema

While most graph databases can be seen as collections of vertices and edges[1] carrying bags of properties, InfoGrid thinks that having some sort of schema is good for data integrity, code simplicity, and documentation purposes:

InfoGrid distinguishes between properties that must have a non-null value, and properties that may or may not be null.

[…]

If InfoGrid did not distinguish between required and optional values, application code would be littered with unnecessary tests for null values. (or failing that, unexpected NullPointerExceptions.) We think being specific is better when creating the model; higher-quality and less cluttered application code is the reward.

InfoGrid is not alone, as we’ve seen a similar approach in the Java Content Repository node type definitions and also in OrientDB schema-less, schema-full, and mixed schema.
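To make the required vs. optional distinction a bit more concrete, here is a minimal sketch in plain Python. It is not InfoGrid's API: `PropertyType`, `NodeType`, and `make_node` are hypothetical names I made up for illustration. The idea is only that the schema rejects a null required value when the node is created, so the code reading it later doesn't need a null check.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class PropertyType:
    """One property declaration (hypothetical, not InfoGrid's model API)."""
    name: str
    required: bool = False


@dataclass
class NodeType:
    """A node type with its declared properties."""
    name: str
    properties: Dict[str, PropertyType] = field(default_factory=dict)

    def validate(self, values: Dict[str, Any]) -> None:
        """Reject unknown properties and missing or null required ones."""
        for key in values:
            if key not in self.properties:
                raise ValueError(f"unknown property: {key}")
        for prop in self.properties.values():
            if prop.required and values.get(prop.name) is None:
                raise ValueError(f"required property is null: {prop.name}")


def make_node(node_type: NodeType, **values: Any) -> Dict[str, Any]:
    """Create a node only if it satisfies the schema of its type."""
    node_type.validate(values)
    return {"type": node_type.name, "properties": dict(values)}


person = NodeType("Person", {
    "name": PropertyType("name", required=True),
    "nickname": PropertyType("nickname"),  # optional: may be absent or null
})

alice = make_node(person, name="Alice")
# "name" was checked at creation time, so no null test is needed here.
print(alice["properties"]["name"].upper())
```

Creating a `Person` without a `name` fails right away with a `ValueError`, instead of surfacing later as an unexpected null somewhere in the application code.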

So what do other creators think about schemas in graph databases?


  1. Check Marko A. Rodriguez and Peter Neubauer’s paper: Constructions from dots and lines

Original title and link for this post: InfoGrid: Graph Database Schema (published on the NoSQL blog: myNoSQL)

via: http://infogrid.org/blog/2010/08/required-vs-optional-property-values/


NoSQL Graph Database Matrix

After triggering our quick review of graph databases, Pere Urbón came up with a nice comparison of these — Neo4j, HyperGraphDB, DEX, InfoGrid, Sones, VertexDB — in terms of License, Schema, Querying, Storage implementation, Utilities, Language and Operating system support.

Pere has made this very interesting NoSQL graph database matrix available as a ☞ PDF on his blog.


Release: InfoGrid 2.9.4 with Tons of Improvements

Even if I’m a bit late to report it, InfoGrid, one of the graph databases covered here, has announced a new version: InfoGrid 2.9.4, which, even though a minor release, comes with a lot of improvements and fixes. The list of changes can be read ☞ here and InfoGrid 2.9.4 can be downloaded from ☞ here.

As a side note, just a couple of days ago I covered an interesting discussion in the graph databases space: scaling graph databases, where InfoGrid’s position is a very interesting one.


An Interesting Problem: Scaling Graph Databases

One of the problems mentioned when discussing relational database scalability is that storage-enforced relationships, ACID, and scale do not play well together. In the NoSQL space there is a category of storage solutions that deals with highly interconnected data: graph databases (note that some of these graph databases are also transactional).

Lately there have been quite a few interesting discussions related to scaling graph databases. Alex Averbuch is working on a thesis on sharding Neo4j and his recent post presents some of the possible solutions. Alex’s article is a very good starting point for anyone interested in scaling graph databases.

Then there is also this article on InfoGrid’s blog presenting a different, web-like solution based on a custom protocol: XPRISO: eXtensible Protocol for the Replication, Integration and Synchronization of distributed Objects. While I haven’t had the chance to dig deeper into InfoGrid’s suggested approach, one thing caught my attention right away: while the association with web-scale is definitely an interesting idea, requiring specific knowledge of where nodes are located and a custom API to reach them doesn’t seem to be the best solution. The web addressed this by having URIs for each reachable resource (InfoGrid could try a similar idea and get rid of the separate APIs for accessing local vs. remote nodes, etc.).
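To illustrate what I mean by URI-based addressing, here is a rough sketch in plain Python. It has nothing to do with XPRISO or InfoGrid's actual API: `NodeResolver`, the example hosts, and the injected `fetch_remote` callable are all assumptions made up for the example. The point is only that the caller uses one `get(uri)` call and never needs to know whether the node is local or remote.

```python
from urllib.parse import urlsplit


class NodeResolver:
    """Resolve nodes by URI, hiding whether they live locally or remotely.

    Hypothetical sketch: a real system would replace fetch_remote with a
    network call and add caching, errors, and consistency handling.
    """

    def __init__(self, local_host, local_store, fetch_remote):
        self.local_host = local_host      # host name this process serves
        self.local_store = local_store    # {path: node} for local nodes
        self.fetch_remote = fetch_remote  # callable(uri) -> node

    def get(self, uri):
        parts = urlsplit(uri)
        if parts.netloc == self.local_host:
            return self.local_store[parts.path]
        return self.fetch_remote(uri)


# A dict lookup stands in for the remote call in this example.
remote_nodes = {"http://b.example/nodes/2": {"name": "remote node"}}
resolver = NodeResolver(
    "a.example",
    {"/nodes/1": {"name": "local node"}},
    remote_nodes.__getitem__,
)

local = resolver.get("http://a.example/nodes/1")
remote = resolver.get("http://b.example/nodes/2")
print(local["name"], "/", remote["name"])
```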

Update: make sure you check the comment thread for more details about InfoGrid’s perspective on scaling graph databases.

Oren Eini concludes in his post:

After spending some time thinking about it, I came to the conclusion that I can’t envision any general way to solve the problem. Oh, I can think of several ways of reduce the problem:

  • Batching cross machine queries so we only perform them at the close of each breadth first step.
  • Storing multiple levels of associations (So “users/ayende” would store its relations but also “users/ayende”’s relation and “users/arik”’s relations).
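The first idea, batching cross-machine queries per breadth-first step, can be sketched roughly as follows. This is my own toy illustration in Python, not code from Oren's post: `SHARDS`, `shard_of`, and `fetch_neighbors` are made-up stand-ins, with `fetch_neighbors` playing the role of a single network round trip to one machine.

```python
from collections import defaultdict

# Hypothetical cluster: each "machine" holds adjacency lists for some nodes.
SHARDS = [
    {"users/ayende": ["users/arik", "users/oren", "users/dana"]},
    {
        "users/arik": ["users/eli"],
        "users/oren": ["users/eli"],
        "users/dana": [],
        "users/eli": [],
    },
]


def shard_of(node_id):
    """Find the shard owning a node (linear scan, for illustration only)."""
    for index, shard in enumerate(SHARDS):
        if node_id in shard:
            return index
    raise KeyError(node_id)


def fetch_neighbors(shard_index, node_ids):
    """Stand-in for ONE round trip returning many adjacency lists at once."""
    return {node: SHARDS[shard_index][node] for node in node_ids}


def bfs_batched(start, max_depth):
    """Breadth-first search issuing one request per shard per BFS level."""
    visited = {start}
    frontier = [start]
    round_trips = 0
    for _ in range(max_depth):
        if not frontier:
            break
        # Group the whole frontier by owning shard before going remote.
        by_shard = defaultdict(list)
        for node in frontier:
            by_shard[shard_of(node)].append(node)
        next_frontier = []
        for shard_index, nodes in by_shard.items():
            round_trips += 1
            for neighbors in fetch_neighbors(shard_index, nodes).values():
                for neighbor in neighbors:
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return visited, round_trips


reachable, round_trips = bfs_batched("users/ayende", max_depth=2)
print(sorted(reachable), round_trips)
```

In this toy data set the two-level search needs two round trips, where asking for each frontier node separately would need four. It reduces the chatter, but as the quote says, it does not make the general problem go away.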

While I haven’t had enough time to think about this topic, my gut feeling is that possible solutions are to be found in a combination of unique identifiers for distributed nodes and a mapreduce-like approach. I cannot stop wondering if this is not what Google’s Pregel is doing (nb I should have read the paper (pdf) first).
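For what it's worth, here is how I picture the "unique identifiers plus message passing" idea, as a heavily simplified, single-process sketch in Python. It is not Pregel's API and it is not distributed; `hop_distances` is a name I made up. Vertices are addressed only by their identifiers, and in each step a vertex reads its incoming messages, updates its own state, and sends messages to its neighbors for the next step.

```python
def hop_distances(graph, source):
    """Compute hop distances from source via synchronous message passing.

    graph maps a vertex id to the ids of its outgoing neighbors.
    Toy sketch only: one process, no partitioning, no fault tolerance.
    """
    distance = {}
    inbox = {source: [0]}  # messages to be delivered in the next step
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if vertex in distance and distance[vertex] <= candidate:
                continue  # nothing improved, so this vertex stays quiet
            distance[vertex] = candidate
            for neighbor in graph.get(vertex, []):
                outbox.setdefault(neighbor, []).append(candidate + 1)
        inbox = outbox  # barrier: all messages are delivered together
    return distance


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
distances = hop_distances(graph, "a")
print(distances)
```

The computation stops when no messages are left in flight. In a distributed setting each machine would run the inner loop over its own vertices and only the outbox would cross the network.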


Operations on Graph Databases

The InfoGrid blog has started to publish a series on basic operations with graph databases. While it looks like getting a taste of graph databases was a very good start, it wasn’t meant to introduce the details of working with a graph database, something that people may not be familiar with.

So, here are the articles published so far on operations with a graph database:

  1. ☞ Nodes
  2. ☞ Edges and Traversals
  3. ☞ Typing (from free form nodes/edges to “strongly typed” nodes/edges)
  4. ☞ Properties
  5. ☞ Identifiers
  6. ☞ Traversals

    Traversals are the most common operations on a graph database. They are just as important for graph databases as joins are for relational databases.

  7. ☞ Sets (new)

    Sets are a core concept of most databases. […] Sets apply to Graph Databases just as well and are just as useful:

    The most frequently encountered set of nodes in a Graph Database is the result of a traversal.
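A tiny sketch of that last point, in plain Python rather than InfoGrid's API (the `EDGES` data and the `traverse` helper are made up for illustration): a traversal takes a set of nodes and returns a set of nodes, so results can be combined with ordinary set operations.

```python
# Hypothetical edge list: (source, edge type, destination).
EDGES = {
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "dave"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "erin"),
}


def traverse(start_nodes, edge_type):
    """Follow all edges of one type from a set of nodes; return a set."""
    return {
        dst
        for (src, kind, dst) in EDGES
        if kind == edge_type and src in start_nodes
    }


friends = traverse({"alice"}, "knows")
friends_of_friends = traverse(friends, "knows")
# Set difference: people alice could be introduced to.
suggestions = friends_of_friends - friends - {"alice"}
print(sorted(suggestions))
```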

I just hope the series will keep going!


Quick Review of Existing Graph Databases

Pere Urbón ☞ published a short review of a couple of existing graph databases. For your reference, below are the ones reviewed in the post and a couple more that we’ve previously mentioned here on myNoSQL:

Neo4j

☞ Neo4j is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.

DEX

☞ DEX is a high performance library to manage very large graphs or networks

HyperGraphDB

☞ HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.

InfoGrid

☞ InfoGrid: an Internet Graph Database with many additional software components that make the development of REST-ful web applications on a graph foundation easy.

vertexdb

☞ vertexdb: a high performance graph database server that supports automatic garbage collection.

Note: by checking the project homepage I cannot tell if the project is still active or not.

AllegroGraph

☞ AllegroGraph RDFStore: a modern, high-performance, persistent RDF graph database.

Note: AllegroGraph seems to be positioned in the RDF stores space, which features some other solutions too.

Filament

☞ Filament: a graph persistence framework and associated toolkits based on a navigational query style.

Sones

☞ Sones GraphDS provides an inherent support for high-level data abstraction concepts (graph structures, walks, consistency, editions, revisions, copies), its own Graph Query Language, an underlying distributed file system and various interfaces like SOAP, REST or WebDAV.

And I’m not sure these are all …

Update: make sure you check the NoSQL Graph Database Matrix

Quick Review of Existing Graph Databases originally posted on the NoSQL blog: myNoSQL


NoSQL Ecosystem News & Links 2010-03-25

  1. Jonathan Ellis: ☞ Cassandra in action. A nice round-up of everything that was said lately about Cassandra. And just as a teaser, I can tell you that more is coming[1]!
  2. New article about traversals in graph databases added to the great operations on a graph database series
  3. yousry.de: ☞ NoSQL and Web applications. A mix of explaining the Not only SQL meaning of NoSQL and some unrelated stuff.
  4. Some NoSQL tools that might get you excited:

    All added to the list of NoSQL libraries.

References

  • [1] If you have a Cassandra or any NoSQL story you’d like to share, please ping me right away!

NoSQL Ecosystem News & Links 2010-03-18

  1. hugoware: ☞ Performing Updates With CSMongo. Working with the C# CSmongo lib and MongoDB. See also MongoDB and C# and MongoDB in the Windows Environment.
  2. Chris Strom: ☞ Base Debian Install on VirtualBox. Trying out the upcoming CouchDB 0.11 in a virtual env.
  3. Jamie Talbot: ☞ Handling JSON Objects in CouchDB Native Erlang Views.
  4. handcraftsman: Some more C# and MongoDB: ☞ Fluent MongoDB Part1 and ☞ Fluent MongoDB Part 2.
  5. riklaunim: ☞ MongoDB data management in Python. Just the basics of using pymongo with MongoDB.
  6. Two new articles added to the operations on graph databases series. A must read for everyone looking into graph databases.

Access Control Lists with Graph Databases

It looks like myNoSQL’s initiative to compare the same scenarios implemented in several graph databases is catching on: after the Neo4j blog published an extensive article on ☞ access control lists with Neo4j, the guys from InfoGrid picked up the challenge and provided ☞ their own solution.

While I haven’t got a chance yet to use either of these graph databases, the more I look at these comparisons the more I get the feeling that they have more in common than differences, so at the end of the day it might be only a matter of preference in picking one or the other. Competition and choice is always good!
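For readers who haven't seen access control modeled as a graph, here is a library-neutral sketch in Python. It is not the approach of either linked article: the `MEMBER_OF` and `GRANTS` data and both helper functions are hypothetical. Users and groups are nodes, membership and grants are edges, and a permission check is a traversal up the membership edges.

```python
# Hypothetical membership edges: principal -> groups it belongs to.
MEMBER_OF = {
    "alice": ["editors"],
    "editors": ["staff"],
    "bob": ["staff"],
}

# Hypothetical grant edges: (principal, action, resource).
GRANTS = {
    ("editors", "write", "/docs"),
    ("staff", "read", "/docs"),
}


def principals_of(user):
    """Collect the user plus every group reachable via membership edges."""
    seen = set()
    stack = [user]
    while stack:
        principal = stack.pop()
        if principal not in seen:
            seen.add(principal)
            stack.extend(MEMBER_OF.get(principal, []))
    return seen


def is_allowed(user, action, resource):
    """A user is allowed if any of its principals holds a matching grant."""
    return any(
        (principal, action, resource) in GRANTS
        for principal in principals_of(user)
    )


print(is_allowed("alice", "write", "/docs"), is_allowed("bob", "write", "/docs"))
```

Nested groups come for free here: alice can read `/docs` only because `editors` is itself a member of `staff`.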


Get a Taste of Graph Databases: InfoGrid and Neo4j

As I said in the MongoDB MapReduce tutorial, the best way to validate that you’ve got the basics right about a system is to use some basic code. And this is exactly the idea behind this post: to take a look at a very (very) basic tagging app in InfoGrid and Neo4j.

InfoGrid version

The code with more details can be found ☞ here.

Neo4j version

The Neo4j code was contributed by Mattias Persson from Neo Technology (thanks Mattias).

Note: I couldn’t figure out a way to make the code more readable than this. But you can hover over the code snippets and you’ll get the option to see the original source code.

Here are my notes about the two code snippets above:

  • everything in Neo4j must happen inside a transaction, even a graph traversal operation (this gives a very strong isolation level). The InfoGrid traversal code seems to happen outside the transaction, so it sounds like it supports a more relaxed isolation level (an interesting question here: if the traversal happened inside a transaction, would that isolate it from seeing possible external modifications?)
  • InfoGrid’s central element is MeshObject, while Neo4j has Node and Relationship. Generally speaking I have found the terminology in InfoGrid a bit more unusual (e.g. MeshObject, relateAndBless, etc.)
  • the Neo4j code also uses the LuceneIndexService for indexing both the tag and web resource nodes, but that’s only because the code there makes sure not to duplicate either tags or web resources (i.e. this functionality is not present in the InfoGrid code and I don’t know what that would look like)
  • in both cases a relationship gives you access to both of its ends. While both InfoGrid and Neo4j documentation speak about bidirectional arcs
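To show the shape of the tagging model without tying it to any product, here is a library-neutral sketch in Python. It uses neither the InfoGrid nor the Neo4j API, and `TagGraph` and its methods are names I invented for the example. Web resources and tags are nodes, tagging is an edge between them, and a small index makes sure neither tags nor resources get duplicated (the role the Lucene index plays in the Neo4j code).

```python
class TagGraph:
    """Toy in-memory tagging graph (hypothetical, not a real product API)."""

    def __init__(self):
        self.index = {}     # (kind, key) -> node; prevents duplicate nodes
        self.edges = set()  # (resource url, tag name) tagging edges

    def get_or_create(self, kind, key):
        """Return the existing node for (kind, key) or create it once."""
        return self.index.setdefault((kind, key), {"kind": kind, "key": key})

    def tag(self, url, *tag_names):
        """Attach tags to a web resource, creating nodes as needed."""
        self.get_or_create("resource", url)
        for name in tag_names:
            self.get_or_create("tag", name)
            self.edges.add((url, name))

    def resources_for(self, tag_name):
        """Traverse from a tag to the resources carrying it."""
        return sorted(url for url, name in self.edges if name == tag_name)

    def tags_for(self, url):
        """Traverse from a resource to its tags (edges work both ways)."""
        return sorted(name for u, name in self.edges if u == url)


graph = TagGraph()
graph.tag("http://neo4j.org", "graph", "java")
graph.tag("http://infogrid.org", "graph", "java")
graph.tag("http://neo4j.org", "graph")  # repeated tagging adds nothing new

print(graph.resources_for("graph"))
print(graph.tags_for("http://neo4j.org"))
```

Each tagging edge can be followed from either end, which is the "access to both ends" property noted in the list above.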

If someone would contribute the code for ☞ HyperGraphDB and/or ☞ VertexDB I think this post would get even more interesting!

Update: The guys from Sones picked up my challenge and show off their C# implementation in this ☞ post. I have included the code below for reference.

Sones version

Update: I’ve just got another submission from Filament. Code is included below and their original post is ☞ here

Filament version

InfiniteGraph version

Update: Thanks to ☞ Todd Stavish we now have a version of this sample code for InfiniteGraph