For mathematically inclined MapReduce/Hadoop researchers, a paper (PDF) by Gero Greiner and Riko Jacob (Institute of Theoretical Computer Science):
In this, we present upper and lower bounds on the parallel I/O-complexity of the shuffle step that match up to constant factors. The shuffle step is the single communication phase where all information of one MapReduce invocation gets transferred from map workers to reduce workers.
[…] we can show that current implementations of the MapReduce model as a framework are almost optimal in the sense of worst-case asymptotic parallel I/O-complexity. This further yields a simple method to consider the external memory performance of an algorithm expressed in MapReduce.
On the one hand, this shows how much complexity can be “hidden” for an algorithm expressed in MapReduce compared to PEM. On the other hand, our results bound the worst-case performance loss of the MapReduce approach in terms of I/O-efficiency.
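To make the object of the paper's analysis concrete, the shuffle step can be pictured as a minimal in-memory sketch: map workers emit (key, value) pairs, and the shuffle groups each key's values and routes them to one reducer. Hash partitioning and the function names here are illustrative assumptions; real frameworks perform this grouping across disk and network, which is exactly where the I/O cost the paper bounds arises.

```python
from collections import defaultdict

def shuffle(map_outputs, num_reducers):
    """Sketch of a shuffle: group (key, value) pairs emitted by map
    workers and route each key's values to a reducer chosen by hash
    partitioning (an illustrative choice, not the paper's model)."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions

# Word-count style pairs, as if emitted by two hypothetical map workers.
pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1)]
parts = shuffle(pairs, num_reducers=2)

# Each key lands in exactly one partition, with all its values together.
merged = {k: v for p in parts for k, v in p.items()}
```

In an actual MapReduce implementation the `partitions` structure does not fit in one worker's memory, so this grouping is realized with external-memory sorting and network transfer; the paper's contribution is matching upper and lower bounds on that parallel I/O cost.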
Original title and link: Paper: The Efficiency of MapReduce in Parallel External Memory (©myNoSQL)