Chad DePue about Phoebus, the first (?) open source implementation of Google’s Pregel algorithm:
Essentially, Phoebus makes calculating data for each vertex and edge in parallel possible on a cluster of nodes. Makes me wish I had a massively large graph to test it with.
Developed by Arun Suresh (Yahoo!), the project ☞ page includes a bullet description of the Pregel computational model:
- A Graph is partitioned into a groups of Records.
- A Record consists of a Vertex and its outgoing Edges (An Edge is a Tuple consisting of the edge weight and the target vertex name).
- A User specifies a ‘Compute’ function that is applied to each Record.
- Computation on the graph happens in a sequence of incremental Super Steps.
- At each Super step, the Compute function is applied to all ‘active’ vertices of the graph.
- Vertices communicate with each other via Message Passing.
- The Compute function is provided with the Vertex record and all Messages sent to the Vertex in the previous SuperStep.
- A Compute funtion can:
- Mutate the value associated to a vertex
- Add/Remove outgoing edges.
- Mutate Edge weight
- Send a Message to any other vertex in the graph.
- Change state of the vertex from ‘active’ to ‘hold’.
- At the begining of each SuperStep, if there are no more active vertices -and- if there are no messages to be sent to any vertex, the algorithm terminates.
- A User may additionally specify a ‘MaxSteps’ to stop the algorithm after a some number of super steps.
- A User may additionally specify a ‘Combine’ funtion that is applied to the all the Messages targetted at a Vertex before the Compute function is applied to it.
While it sounds similar to mapreduce, Pregel is optimized for graph operations, by reducing I/O, ensuring data locality, but also preserving processing state between phases.
Original title and link: Phoebus: Erlang-based Implementation of Google’s Pregel (NoSQL databases © myNoSQL)