
InfoGrid: Graph Database Schema

by Alex Popescu


While most graph databases can be seen as collections of vertices and edges[1] carrying bags of properties, InfoGrid thinks that having some sort of schema is good for data integrity, code simplicity, and documentation purposes:

InfoGrid distinguishes between properties that must have a non-null value, and properties that may or may not be null.

[…]

If InfoGrid did not distinguish between required and optional values, application code would be littered with unnecessary tests for null values (or, failing that, unexpected NullPointerExceptions). We think being specific is better when creating the model; higher-quality and less cluttered application code is the reward.
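
To make the required-versus-optional distinction concrete, here is a minimal sketch in plain Python. It is not InfoGrid's API: `PropertyType`, `TypedNode`, and the `PERSON` schema are hypothetical names invented for this illustration, and the only assumption is that a schema lists, per property, whether a null value is allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyType:
    """Hypothetical schema entry: a property name plus whether it may be null."""
    name: str
    optional: bool = False


class TypedNode:
    """A node whose properties are validated against a list of PropertyType."""

    def __init__(self, schema, **values):
        self._schema = {p.name: p for p in schema}
        self._values = {}
        unknown = set(values) - set(self._schema)
        if unknown:
            raise KeyError(f"properties not in schema: {sorted(unknown)}")
        # Every declared property is set once, so required ones are checked here.
        for prop in schema:
            self.set(prop.name, values.get(prop.name))

    def set(self, name, value):
        prop = self._schema[name]
        if value is None and not prop.optional:
            raise ValueError(f"required property {name!r} must not be None")
        self._values[name] = value

    def get(self, name):
        return self._values[name]


# Hypothetical schema: "name" is required, "nickname" is optional.
PERSON = [PropertyType("name"), PropertyType("nickname", optional=True)]
alice = TypedNode(PERSON, name="Alice")

print(alice.get("name").upper())  # no null check needed: "name" is required
print(alice.get("nickname"))      # may be None, so callers must check
```

Because the check happens when the value is written, code that reads a required property can skip the null test, which is the benefit the quote describes.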

InfoGrid is not alone: we’ve seen a similar approach in the Java Content Repository node type definitions and in OrientDB’s schema-less, schema-full, and mixed-schema modes.

So what do other graph database creators think about schemas?


  1. Check Marko A. Rodriguez and Peter Neubauer’s paper: Constructions from Dots and Lines

Original title and link for this post: InfoGrid: Graph Database Schema (published on the NoSQL blog: myNoSQL)


NoSQL Graph Database Matrix

by Alex Popescu


After triggering our quick review of graph databases, Pere Urbón came up with a nice comparison of these — Neo4j, HyperGraphDB, DEX, InfoGrid, Sones, VertexDB — in terms of License, Schema, Querying, Storage implementation, Utilities, Language and Operating system support.

Pere has made this very interesting NoSQL graph database matrix available as a ☞ PDF on his blog.


Release: InfoGrid 2.9.4 with Tons of Improvements

by Alex Popescu


Although I’m a bit late in reporting it, InfoGrid, one of the graph databases covered here, has announced a new version: InfoGrid 2.9.4. Even though it is a minor release, it comes with a lot of improvements and fixes. The list of changes can be read ☞ here and InfoGrid 2.9.4 can be downloaded from ☞ here.

As a side note, just a couple of days ago I covered an interesting discussion in the graph database space, scaling graph databases, in which InfoGrid’s position is a very interesting one.


An Interesting Problem: Scaling Graph Databases

by Alex Popescu


One of the problems mentioned when discussing relational database scalability is that storage-enforced relationships, ACID, and scale do not play well together. In the NoSQL space there is a category of storage solutions that deals with highly interconnected data: graph databases. (Note also that some of these graph databases are transactional.)

Lately there have been quite a few interesting discussions about scaling graph databases. Alex Averbuch is working on a thesis on sharding Neo4j, and his recent ☞ post presents some of the possible solutions. Alex’s article is a very good starting point for anyone interested in scaling graph databases.

Then there is also this ☞ article on InfoGrid’s blog, which presents a different, web-like solution based on a custom protocol: ☞ XPRISO: eXtensible Protocol for the Replication, Integration and Synchronization of distributed Objects. While I haven’t had the chance to dig deeper into InfoGrid’s suggested approach, one thing caught my attention right away: while the association with web scale is definitely an interesting idea, requiring specific knowledge of the nodes’ location and a custom API to reach them doesn’t seem to be the best solution. The web addressed this by having URIs for each reachable resource. InfoGrid could try a similar idea and get rid of the separate APIs for accessing local vs. remote nodes.

Update: make sure you check the comment thread for more details about InfoGrid’s perspective on scaling graph databases.

Oren Eini concludes in ☞ his post:

After spending some time thinking about it, I came to the conclusion that I can’t envision any general way to solve the problem. Oh, I can think of several ways to reduce the problem:

  • Batching cross machine queries so we only perform them at the close of each breadth first step.
  • Storing multiple levels of associations (So “users/ayende” would store its relations but also “users/ayende”’s relation and “users/arik”’s relations).

While I haven’t had enough time to think about this topic, my gut feeling is that possible solutions are to be found in a combination of unique identifiers for distributed nodes and a MapReduce-like approach. I cannot stop wondering whether this is what Google’s ☞ Pregel is doing (NB: I should have read the ☞ paper (pdf) first).
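
The first mitigation quoted above, batching cross-machine queries at the close of each breadth-first step, can be sketched roughly as follows. This is an illustration under stated assumptions and not how any of the mentioned products work: `Shard`, `neighbors_of_many`, `shard_of`, and the toy data are all hypothetical, and a "remote call" is simulated by a counter.

```python
from collections import defaultdict


class Shard:
    """Stand-in for a remote machine holding part of the adjacency lists."""

    def __init__(self, adjacency):
        self.adjacency = adjacency
        self.requests = 0  # counts simulated network round trips

    def neighbors_of_many(self, nodes):
        """One batched call returning the neighbors of every requested node."""
        self.requests += 1
        return {n: self.adjacency.get(n, []) for n in nodes}


def batched_bfs(start, shards, shard_of, max_depth):
    """Breadth-first search issuing at most one request per shard per level."""
    visited = {start}
    frontier = [start]
    for _ in range(max_depth):
        if not frontier:
            break
        # Group the whole frontier by owning shard before touching the network.
        by_shard = defaultdict(list)
        for node in frontier:
            by_shard[shard_of(node)].append(node)
        next_frontier = []
        for shard_id, nodes in by_shard.items():
            answers = shards[shard_id].neighbors_of_many(nodes)
            for neighbors in answers.values():
                for m in neighbors:
                    if m not in visited:
                        visited.add(m)
                        next_frontier.append(m)
        frontier = next_frontier
    return visited


# Toy data: nodes "a" and "c" live on shard 0, everything else on shard 1.
shards = {
    0: Shard({"a": ["b", "c"], "c": ["d"]}),
    1: Shard({"b": ["d"], "d": ["e"]}),
}


def shard_of(node):
    return 0 if node in ("a", "c") else 1


reached = batched_bfs("a", shards, shard_of, max_depth=2)
print(sorted(reached), shards[0].requests, shards[1].requests)
```

The number of round trips grows with the number of levels and shards touched, not with the number of nodes in the frontier, which is the point of batching.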


Operations on Graph Databases

by Alex Popescu


The InfoGrid blog has started to publish a series on basic operations with graph databases. While it looks like getting a taste of graph databases was a very good start, it wasn’t meant to introduce the details of working with a graph database, something that people may not be familiar with.

So, here are the articles published so far on operations with a graph database:

  1. ☞ Nodes
  2. ☞ Edges and Traversals
  3. ☞ Typing (from free form nodes/edges to “strongly typed” nodes/edges)
  4. ☞ Properties
  5. ☞ Identifiers
  6. ☞ Traversals

    Traversals are the most common operations on a graph database. They are just as important for graph databases as joins are for relational databases.

  7. ☞ Sets (new)

    Sets are a core concept of most databases. […] Sets apply to Graph Databases just as well and are just as useful:

    The most frequently encountered set of nodes in a Graph Database is the result of a traversal.
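
As a rough illustration of a traversal producing a set that can then be combined with other sets, here is a small sketch in plain Python. The nested-dictionary graph layout and the `traverse` helper are assumptions made for this example; they are not taken from the InfoGrid series.

```python
def traverse(graph, start_nodes, edge_type):
    """Return the set of nodes one step away over edges of the given type."""
    result = set()
    for node in start_nodes:
        result.update(graph.get(node, {}).get(edge_type, ()))
    return result


# Hypothetical data: node -> edge type -> list of neighbor nodes.
graph = {
    "alice": {"knows": ["bob", "carol"], "works_at": ["acme"]},
    "bob": {"knows": ["carol", "dave"]},
    "carol": {"knows": ["dave"]},
}

friends = traverse(graph, {"alice"}, "knows")
friends_of_friends = traverse(graph, friends, "knows")

# Ordinary set algebra applies to traversal results.
new_contacts = friends_of_friends - friends - {"alice"}
print(sorted(new_contacts))
```

Since the result of one traversal is a set, it can be fed directly into the next traversal or filtered with unions, intersections, and differences.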

I just hope the series will keep going!


Quick Review of Existing Graph Databases

by Alex Popescu


Pere Urbón ☞ published a short review of a couple of existing graph databases. For your reference, below are the ones reviewed in the post and a couple more that we’ve previously mentioned here on myNoSQL:

Neo4j

☞ Neo4j is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.

DEX

☞ DEX is a high performance library to manage very large graphs or networks.

HyperGraphDB

☞ HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.

InfoGrid

☞ InfoGrid: an Internet Graph Database with many additional software components that make the development of REST-ful web applications on a graph foundation easy.

vertexdb

☞ vertexdb: a high performance graph database server that supports automatic garbage collection.

Note: by checking the project homepage I cannot tell if the project is still active or not.

AllegroGraph

☞ AllegroGraph RDFStore: a modern, high-performance, persistent RDF graph database.

Note: AllegroGraph seems to be positioned in the RDF stores space, which features some other solutions too.

Filament

☞ Filament: a graph persistence framework and associated toolkits based on a navigational query style.

Sones

☞ Sones GraphDS provides an inherent support for high-level data abstraction concepts (graph structures, walks, consistency, editions, revisions, copies), its own Graph Query Language, an underlying distributed file system and various interfaces like SOAP, REST or WebDAV.

And I’m not sure these are all …

Update: make sure you check the NoSQL Graph Database Matrix

Quick Review of Existing Graph Databases originally posted on the NoSQL blog: myNoSQL


NoSQL Ecosystem News & Links 2010-03-25

by Alex Popescu

  1. Jonathan Ellis: ☞ Cassandra in action. A nice round-up of everything that was said lately about Cassandra. And just as a teaser, I can tell you that more is coming[1]!
  2. New article about traversals in graph databases added to the great operations on a graph database series
  3. yousry.de: ☞ NoSQL and Web applications. A mix of explaining the Not only SQL meaning of NoSQL and some unrelated stuff.
  4. Some NoSQL tools that might get you excited:

    All added to the list of NoSQL libraries.

References

  • [1] If you have a Cassandra or any NoSQL story you’d like to share, please ping me right away!

NoSQL Ecosystem News & Links 2010-03-18

by Alex Popescu


  1. hugoware: ☞ Performing Updates With CSMongo. Working with the C# CSmongo lib and MongoDB. See also MongoDB and C# and MongoDB in the Windows Environment.
  2. Chris Strom: ☞ Base Debian Install on VirtualBox. Trying out the upcoming CouchDB 0.11 in a virtual env.
  3. Jamie Talbot: ☞ Handling JSON Objects in CouchDB Native Erlang Views.
  4. handcraftsman: Some more C# and MongoDB: ☞ Fluent MongoDB Part1 and ☞ Fluent MongoDB Part 2.
  5. riklaunim: ☞ MongoDB data management in Python. Just the basics of using pymongo with MongoDB.
  6. Two new articles added to the operations on graph databases series. A must read for everyone looking into graph databases.


Access Control Lists with Graph Databases

by Alex Popescu


It looks like myNoSQL’s initiative to compare the same scenarios implemented in different graph databases is catching on: after the Neo4j blog published an extensive article on ☞ access control lists with Neo4j, the guys from InfoGrid picked up the challenge and provided ☞ their own solution.

While I haven’t had a chance yet to use either of these graph databases, the more I look at these comparisons the more I get the feeling that they have more in common than not, so at the end of the day picking one or the other might be only a matter of preference. Competition and choice are always good!


Get a Taste of Graph Databases: InfoGrid and Neo4j

by Alex Popescu


As I said in the MongoDB MapReduce tutorial, the best way to validate that you’ve got the basics of a system right is to write some basic code. And this is exactly the idea behind this post: to take a look at a very (very) basic tagging app in InfoGrid and Neo4j.

InfoGrid version

The code with more details can be found ☞ here.

Neo4j version

The Neo4j code was contributed by Mattias Persson from Neo Technology (thanks Mattias).

Note: I couldn’t figure out a way to make the code more readable than this. But you can hover over the code snippets and you’ll get the option to see the original source code.

Here are my notes about the two code snippets above:

  • everything in Neo4j must happen inside a transaction, even a graph traversal (this gives a very strong isolation level). The InfoGrid traversal code seems to happen outside the transaction, so it sounds like it supports a more relaxed isolation level (an interesting question here: if the traversal happened inside a transaction, would that isolate it from seeing possible external modifications?)
  • InfoGrid’s central element is MeshObject, while Neo4j has Node and Relationship. Generally speaking, I have found the terminology in InfoGrid a bit more unusual (e.g. MeshObject, relateAndBless, etc.)
  • the Neo4j code also uses the LuceneIndexService for indexing both the tag and web resource nodes, but that’s only because the code there makes sure not to duplicate either tags or web resources (i.e. this functionality is not present in the InfoGrid code and I don’t know what that would look like)
  • in both cases a relationship gives you access to both of its ends, and both the InfoGrid and Neo4j documentation speak about bidirectional arcs
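
For readers who want something runnable, here is a neutral sketch of the same very basic tagging idea in plain Python. It uses neither the InfoGrid nor the Neo4j API; `TagGraph` and its methods are hypothetical names, and the dictionary index only stands in for the role the notes above attribute to an index, namely avoiding duplicate tags and web resources.

```python
class TagGraph:
    """In-memory tagging graph; an illustration, not a real graph database."""

    def __init__(self):
        self.nodes = {}     # index: (kind, key) -> node, prevents duplicates
        self.edges = set()  # each edge is a (url, label) pair

    def _get_or_create(self, kind, key):
        # setdefault returns the existing node if the key is already indexed.
        return self.nodes.setdefault((kind, key), {"kind": kind, "key": key})

    def tag(self, url, label):
        self._get_or_create("resource", url)
        self._get_or_create("tag", label)
        self.edges.add((url, label))

    def tags_of(self, url):
        """Follow the relationship from the resource end."""
        return sorted(label for u, label in self.edges if u == url)

    def resources_tagged(self, label):
        """Follow the same relationship from the tag end."""
        return sorted(u for u, tagged in self.edges if tagged == label)


g = TagGraph()
g.tag("http://example.com/a", "nosql")
g.tag("http://example.com/a", "graph")
g.tag("http://example.com/b", "nosql")
g.tag("http://example.com/a", "nosql")  # repeated call creates nothing new

print(g.tags_of("http://example.com/a"))
print(g.resources_tagged("nosql"))
```

Each edge can be followed from either end, which mirrors the last note above about relationships giving access to both of their ends.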

If someone contributed the code for ☞ HyperGraphDB and/or ☞ VertexDB, I think this post would get even more interesting!

Update: The guys from Sones picked up my challenge and show off their C# implementation in this ☞ post. I have included the code below for reference.

Sones version

Update: I’ve just got another submission from Filament. The code is included below and their original post is ☞ here.

Filament version

InfiniteGraph version

Update: Thanks to ☞ Todd Stavish we now have a version of this sample code for InfiniteGraph


Graph Databases: The graph model and processing

by Alex Popescu


A new must read article from Ricky Ho on graph databases in which he covers the basics of the graph model and some graph algorithms.

I found many of the graph algorithms follow a general processing pattern. There are multiple rounds of (sequential) processing iterations. Within each iteration, there are a set of active nodes that perform local processing in parallel.

The article refers to Neo4j and Gremlin, but doesn’t mention InfoGrid (note: there is a comment that adds some details to the article from the InfoGrid perspective though).

Neo4j provides a restricted, single-threaded graph traversal model

  • At each round, the set of active nodes is always a single node
  • The set of active nodes of the next round is determined by the traversal policy (breadth- or depth-first), but is still a single node
  • It offers a callback function to determine whether this node should be included in the result set

Gremlin, on the other hand, provides an interactive graph traversal model where the user can step through each iteration. It uses an XPath-like syntax to express the navigation.

  • The node is expressed as Node(id, inE, outE, properties)
  • The arc is expressed as Arc(id, type, inV, outV, properties)
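
The two shapes above map naturally onto a pair of record types. The sketch below is plain Python, not Gremlin itself; the `connect` helper is hypothetical, and it assumes the usual Gremlin convention that an arc's `outV` is the node it leaves and `inV` is the node it enters.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)
class Node:
    id: str
    inE: List["Arc"] = field(default_factory=list, repr=False)   # incoming arcs
    outE: List["Arc"] = field(default_factory=list, repr=False)  # outgoing arcs
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Arc:
    id: str
    type: str
    inV: Node   # node the arc points to (assumed convention)
    outV: Node  # node the arc starts from (assumed convention)
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(arc_id, arc_type, source, target, **properties):
    """Hypothetical helper: create an arc and register it on both end nodes."""
    arc = Arc(arc_id, arc_type, inV=target, outV=source, properties=properties)
    source.outE.append(arc)
    target.inE.append(arc)
    return arc


a, b = Node("a"), Node("b")
e = connect("e1", "knows", a, b, since=2010)

# One navigation step: follow outgoing "knows" arcs from a to their target nodes.
print([arc.inV.id for arc in a.outE if arc.type == "knows"])
```

Keeping both `inE` and `outE` on each node is what lets a path expression step from a node to its arcs and on to the node at the other end without a separate lookup.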

The article also covers how graph algorithms (like topological sort, minimum spanning tree, single source shortest path) can benefit (or not) from a MapReduce implementation:

For example, graph algorithms with a breadth-first search nature fit better into the parallel computing paradigm than those that have a depth-first search nature. On the other hand, performing a search at all nodes fits better in parallel computing than performing a search at a particular start node.
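
The round-based pattern quoted earlier (sequential rounds, each with a set of active nodes doing local work) can be sketched for single source shortest path as follows. This is a sequential illustration written for this post, not code from Ricky Ho's article; the graph layout and function name are assumptions, and edge weights are assumed to be positive.

```python
def shortest_paths(graph, source):
    """graph maps node -> list of (neighbor, weight); returns (distances, rounds)."""
    dist = {source: 0}
    active = {source}
    rounds = 0
    while active:
        rounds += 1
        # Local step: each active node uses only its own distance and its own
        # edges, so this loop could run in parallel across the active nodes.
        messages = {}
        for node in active:
            for neighbor, weight in graph.get(node, []):
                candidate = dist[node] + weight
                if candidate < messages.get(neighbor, float("inf")):
                    messages[neighbor] = candidate
        # Barrier: apply the messages; nodes that improved are active next round.
        active = set()
        for node, candidate in messages.items():
            if candidate < dist.get(node, float("inf")):
                dist[node] = candidate
                active.add(node)
    return dist, rounds


# Hypothetical weighted graph.
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
}

dist, rounds = shortest_paths(graph, "a")
print(dist, rounds)
```

The breadth-first flavor is what makes this fit the parallel pattern: many nodes can be active in the same round, whereas a depth-first traversal would keep only one node active at a time.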


InfoGrid and NoSQL

by Alex Popescu


Johannes Ernst uses these 4 NoSQL criteria to discuss how ☞ InfoGrid fits the NoSQL world:

  • InfoGrid does not use SQL-the-language.
  • InfoGrid uses a graph database model, not a tabular model, …
  • InfoGrid relaxes ACID. No truly distributed system that I know of has ever had ACID properties, nor wanted them. Too many things can go wrong.
  • InfoGrid uses the “small pieces loosely joined” paradigm, not a top-down paradigm.

I guess one could argue that Johannes came up with those factors to make sure that InfoGrid scores well, but I don’t think that was his intention.