Another must read from Ricky Ho:
Note that Map/Reduce is suited to “data parallelism”, which is different from “task parallelism”. Here is a description of the difference, along with a general parallel processing design methodology.
There is no formal definition of the Map/Reduce model. Based on the Hadoop implementation, we can think of it as a “distributed merge-sort engine”. The general processing flow is as follows:
- Input data is “split” across multiple mapper processes, which execute in parallel
- The output of each mapper is partitioned by key and locally sorted
- Mapper outputs with the same key land on the same reducer, where they are consolidated
- A merge sort happens at the reducer, so all keys arriving at a given reducer are in sorted order
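The steps above can be sketched in a single process. This is a minimal illustration, not Hadoop's actual API: the `mapper` and `partition` functions are hypothetical helpers, and word count stands in for an arbitrary Map/Reduce job.

```python
from collections import defaultdict

def mapper(line):
    # Emit (key, value) pairs; here, (word, 1) for each word in the split.
    for word in line.split():
        yield word, 1

def partition(key, num_reducers):
    # Hash partitioning: all pairs with the same key land on the same reducer.
    return hash(key) % num_reducers

def map_reduce(splits, num_reducers=2):
    # Map phase: each input split feeds one mapper; mapper output is
    # partitioned by key (in Hadoop it would also be locally sorted).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in mapper(split):
            partitions[partition(key, num_reducers)][key].append(value)

    # Reduce phase: each reducer iterates its keys in sorted order
    # (standing in for the merge sort) and consolidates the values.
    result = {}
    for part in partitions:
        for key in sorted(part):
            result[key] = sum(part[key])
    return result
```

For example, `map_reduce(["a b a", "b c"])` consolidates the pairs from both splits into `{"a": 2, "b": 2, "c": 1}`.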
Original title and link: Designing algorithms for Map Reduce (NoSQL databases © myNoSQL)