From scripting to MapReduce:
Next, let us assume we have a much larger dataset that consists of movies in all languages, along with their details. Running the same program on the new file, we find that it takes a long time, and we want it to run faster. There are two ways to go about this. One, we hack the hell out of Perl. Although this sounds promising, it might not be the best path. Or, we can try to run parts of the program in parallel, thereby speeding up the execution. But going this route, we might open up a Pandora’s box of issues that accompany parallel processing.
How many processes are we going to run? How do we coordinate the processes? How are we going to combine the output of all the processes? What if processes fail? How are we going to control access to a shared resource? What if the dataset outgrows our single machine? These are some of the questions that need answering before we start writing parallel programs. Although we can get the program to run faster, doing so is often messy in real life.
What if there were a programming model that abstracts away these details and lets us concentrate only on writing our business logic, a model that automatically takes care of the finer details of parallelism so that we do not have to think about them? It turns out that such a model is available.
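To make the idea concrete, here is a minimal sketch of what "writing only the business logic" could look like. It is written in Python rather than Perl purely for illustration, and everything in it is an assumption: the `(title, language)` record layout, the sample data, and the names `map_movie`, `reduce_counts`, and `run_mapreduce` are hypothetical. The driver is a toy that runs in a single process; in a real framework that part is what would handle splitting the input, scheduling, and failures.

```python
from collections import defaultdict

# Hypothetical records: (title, language). The fields of the real dataset
# are not given in the text, so this layout is an assumption.
MOVIES = [
    ("Movie A", "English"),
    ("Movie B", "Hindi"),
    ("Movie C", "English"),
    ("Movie D", "Tamil"),
    ("Movie E", "Hindi"),
    ("Movie F", "English"),
]


def map_movie(record):
    """Business logic, part 1: emit (key, value) pairs for one record."""
    _title, language = record
    yield language, 1


def reduce_counts(language, counts):
    """Business logic, part 2: combine all values emitted for one key."""
    return language, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy single-process driver standing in for the framework.

    It applies the mapper to every record, groups the emitted values by
    key, and then calls the reducer once per key.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


result = run_mapreduce(MOVIES, map_movie, reduce_counts)
print(result)
```

Note that `map_movie` and `reduce_counts` say nothing about processes, coordination, or machines. Because each record is mapped independently and each key is reduced independently, a framework is free to spread that work out without the two functions changing.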
Original title and link: Parallel Processing Using the Map Reduce Programming Model (©myNoSQL)