Phoebus: All content tagged as Phoebus in NoSQL databases and polyglot persistence

6 Pregel-Inspired Frameworks

A quick overview of 6 Pregel-inspired frameworks (Apache Hama, GoldenOrb, Apache Giraph, Phoebus, Signal/Collect, and HipG):

So, to summarize, what Hama, GoldenOrb and Giraph have in common is the Java platform, the Apache License (and incubation), and BSP computation. Where they differ: Hama offers BSP primitives rather than a graph processing API (so it sits at a lower level); GoldenOrb provides Pregel’s API but requires deploying additional software on your existing Hadoop infrastructure; Giraph provides Pregel’s API (and is fairly complete in its current state) and doesn’t require additional infrastructure.

Original title and link: 6 Pregel-Inspired Frameworks (NoSQL databases © myNoSQL)


Phoebus: Erlang-based Implementation of Google’s Pregel

Chad DePue on Phoebus, the first (?) open source implementation of Google’s Pregel algorithm:

Essentially, Phoebus makes calculating data for each vertex and edge in parallel possible on a cluster of nodes. Makes me wish I had a massively large graph to test it with.

Developed by Arun Suresh (Yahoo!), the project page includes a bulleted description of the Pregel computational model:

  • A Graph is partitioned into groups of Records.
  • A Record consists of a Vertex and its outgoing Edges (An Edge is a Tuple consisting of the edge weight and the target vertex name).
  • A User specifies a ‘Compute’ function that is applied to each Record.
  • Computation on the graph happens in a sequence of incremental Super Steps.
  • At each Super step, the Compute function is applied to all ‘active’ vertices of the graph.
  • Vertices communicate with each other via Message Passing.
  • The Compute function is provided with the Vertex record and all Messages sent to the Vertex in the previous SuperStep.
  • A Compute function can:
    • Mutate the value associated to a vertex
    • Add/Remove outgoing edges.
    • Mutate Edge weight
    • Send a Message to any other vertex in the graph.
    • Change state of the vertex from ‘active’ to ‘hold’.
  • At the beginning of each SuperStep, if there are no more active vertices and there are no messages to be sent to any vertex, the algorithm terminates.
  • A User may additionally specify a ‘MaxSteps’ to stop the algorithm after a given number of super steps.
  • A User may additionally specify a ‘Combine’ function that is applied to all the Messages targeted at a Vertex before the Compute function is applied to it.

While it sounds similar to MapReduce, Pregel is optimized for graph operations: it reduces I/O, ensures data locality, and preserves processing state between phases.
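
To make the computational model listed above more concrete, here is a minimal single-process sketch in Python. It is not Phoebus's Erlang API: the names `run_pregel` and `shortest_path`, the `compute` signature, and the default of 30 for `max_steps` are all hypothetical choices made for illustration. The sketch also assumes that a vertex on 'hold' is processed again when a message arrives for it, and it ignores partitioning and distribution entirely.

```python
from collections import defaultdict

INF = float("inf")


def run_pregel(graph, values, compute, combine=None, max_steps=30):
    """Run super steps until no vertex is active and no message is pending.

    graph:  {vertex: {target_vertex: edge_weight}}  (one Record per vertex)
    values: {vertex: value}, updated in place
    """
    active = set(graph)   # every vertex starts out 'active'
    inbox = {}            # messages sent during the previous super step
    step = 0
    while step < max_steps and (active or inbox):
        outbox = defaultdict(list)

        def send(target, message):
            outbox[target].append(message)

        still_active = set()
        # Assumption: a vertex on 'hold' runs again once it receives a message.
        for vertex in sorted(active | set(inbox)):
            messages = inbox.get(vertex, [])
            if combine and messages:
                messages = [combine(messages)]
            # compute may also mutate graph[vertex] (add/remove edges, change weights)
            values[vertex], stay = compute(
                vertex, values[vertex], graph[vertex], messages, send, step)
            if stay:
                still_active.add(vertex)
        active, inbox = still_active, dict(outbox)
        step += 1
    return values, step


def shortest_path(vertex, value, edges, messages, send, step):
    """Hypothetical Compute function: single-source shortest paths."""
    best = min(messages, default=INF)
    improved = best < value
    if improved:
        value = best
    # The source (value 0) announces itself in the first super step;
    # afterwards a vertex only sends when its distance improved.
    if improved or (step == 0 and value < INF):
        for target, weight in edges.items():
            send(target, value + weight)
    return value, False   # always switch to 'hold'; messages wake the vertex up


graph = {"a": {"b": 1, "c": 4}, "b": {"c": 2}, "c": {"d": 1}, "d": {}}
values = {"a": 0, "b": INF, "c": INF, "d": INF}
distances, steps = run_pregel(graph, values, shortest_path, combine=min)
print(distances, steps)
```

Passing `combine=min` plays the role of the ‘Combine’ function: each vertex sees at most one message per super step, which is enough for shortest paths because only the smallest candidate distance matters. On this small graph the run ends after four super steps with distances 0, 1, 3 and 4 for `a`, `b`, `c` and `d`.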

Original title and link: Phoebus: Erlang-based Implementation of Google’s Pregel (NoSQL databases © myNoSQL)