NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Graph Databases: The graph model and processing

A new must read article from Ricky Ho on graph databases in which he covers the basics of the graph model and some graph algorithms.

I found many of the graph algorithms follows a general processing pattern. There are multiple rounds of (sequential) processing iterations. Within each iteration, there are a set of active nodes that perform local processing in parallel.

The article refers to Neo4j and Gremlin, but doesn’t mention InfoGrid (note: there is a comment that adds some details to the article from the InfoGrid perspective though).

Neo4j provide a restricted, single-threaded graph traversal model

  • At each round, the set of active nodes is always a single node
  • The set of active nodes of next round is determined by the traversal policy (breath or depth-first), but is still a single node
  • It offers a callback function to determine whether this node should be included in the result set

Gremlin, on the other hand, provides an interactive graph traversal model where user can step through each iteration. It uses an XPath like syntax to express the navigation.

  • The node is expressed as Node(id, inE, outE, properties)
  • The arc is expressed as Arc(id, type, inV, outV, properties)

The article also covers how graph algorithms (like topological sort, minimum spanning tree, single source shortest path) can benefit (or not) from a mapreduce implementations:

For example, graph algorithms with a breath-first search nature fits better into parallel computing paradigm with those that has a depth-first search nature. On the other hand, perform search at all nodes fits better in parallel computing than perform search at a particular start node.