NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



InfoGrid: All content tagged as InfoGrid in NoSQL databases and polyglot persistence

Graph Databases: The graph model and processing

A new must read article from Ricky Ho on graph databases in which he covers the basics of the graph model and some graph algorithms.

I found many of the graph algorithms follows a general processing pattern. There are multiple rounds of (sequential) processing iterations. Within each iteration, there are a set of active nodes that perform local processing in parallel.

The article refers to Neo4j and Gremlin, but doesn’t mention InfoGrid (note: there is a comment that adds some details to the article from the InfoGrid perspective though).

Neo4j provide a restricted, single-threaded graph traversal model

  • At each round, the set of active nodes is always a single node
  • The set of active nodes of next round is determined by the traversal policy (breath or depth-first), but is still a single node
  • It offers a callback function to determine whether this node should be included in the result set

Gremlin, on the other hand, provides an interactive graph traversal model where user can step through each iteration. It uses an XPath like syntax to express the navigation.

  • The node is expressed as Node(id, inE, outE, properties)
  • The arc is expressed as Arc(id, type, inV, outV, properties)

The article also covers how graph algorithms (like topological sort, minimum spanning tree, single source shortest path) can benefit (or not) from a mapreduce implementations:

For example, graph algorithms with a breath-first search nature fits better into parallel computing paradigm with those that has a depth-first search nature. On the other hand, perform search at all nodes fits better in parallel computing than perform search at a particular start node.


InfoGrid and NoSQL

Johannes Ernst uses these 4 NoSQL criteria to discuss how ☞ InfoGrid fits the NoSQL world:

  • InfoGrid does not use SQL-the-language.
  • InfoGrid uses a graph database model, not a tabular model, …
  • InfoGrid relaxes ACID. No truly distributed system that I know of has ever had ACID properties, nor wanted them. Too many things can go wrong.
  • InfoGrid uses the “small pieces loosely joined” paradigm, not a top-down paradigm.

I guess one could argue that Johannes came up with those factors to make sure that InfoGrid scores well, but I don’t think that was his intention.