Following his post on graph processing, Ricky Ho explains the major difference between Pregel and MapReduce applied to graph processing:
Since the Pregel model retains worker state (the same worker is responsible for the same set of nodes) across iterations, the graph can be loaded in memory once and reused across iterations. This reduces I/O overhead, as there is no need to read from and write to disk at each iteration. For fault resilience, there is a periodic checkpoint where every worker writes its in-memory state to disk.
Also, Pregel (with its stateful characteristic) only sends locally computed results (but not the graph structure) over the network, which implies minimal bandwidth consumption.
If you need to summarize that even further, it basically comes down to:
- reducing I/O as much as possible
- ensuring data locality
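
To make the two points above concrete, here is a minimal single-process sketch of a Pregel-style superstep loop, using the familiar "propagate the maximum value" example. It is only an illustration under assumptions: the function name `pregel_max` and its parameters are hypothetical, there are no real workers, network, or checkpoints, and this is not Pregel's actual API. What it tries to show is that the graph and vertex values are loaded once and stay in memory, while only small messages (the computed values) move between supersteps.

```python
from collections import defaultdict


def pregel_max(graph, values, max_supersteps=30):
    """Propagate the maximum vertex value through the graph.

    graph:  dict mapping each vertex to a list of its out-neighbors
            (hypothetical input format; every vertex must appear as a key).
    values: dict mapping each vertex to its initial value.
    """
    # Vertex state is loaded once and kept in memory across supersteps.
    state = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(state[vertex])
    supersteps = 1

    # Later supersteps: only vertices that received messages do any work,
    # and only values (never the graph structure) are passed along.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > state[vertex]:
                state[vertex] = best
                for neighbor in graph.get(vertex, ()):
                    outbox[neighbor].append(best)
        inbox = outbox
        supersteps += 1

    return state, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    values = {"a": 3, "b": 6, "c": 2}
    final_state, steps = pregel_max(graph, values)
    print(final_state, steps)
```

In a MapReduce formulation of the same computation, each iteration would typically be a separate job that reads the graph from disk and writes it back out, which is exactly the I/O this loop avoids by keeping `state` and `graph` resident.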