Hadoop Implementers, Take Some Advice From My Grandmother
Paige Roberts on Pervasive blog:
The Hadoop distributed computing concept is inherently parallel and, therefore, should be friendly to better utilization models. But parallel programming, beyond the basic data level, the embarrassingly parallel level, requires different habits. MapReduce is already heading us in the wrong direction. Most Hadoop data centers aren’t doing any better when it comes to usage levels than traditional data centers. There’s still a tremendous amount of energy and compute power going to waste.
YARN gives us the option to use other compute models in Hadoop clusters; better, more efficient compute models, if we can create them.
People running Hadoop at scale always want to optimize power consumption. As the first example that comes to my mind, in November, Facebook, which most probably runs the largest Hadoop cluster, open sourced their work on improving MapReduce jobs scheduling in a project named Corona which was meant to increase the efficiency of using the resources available in their Hadoop clusters:
In heavy workloads during our testing, the utilization in the Hadoop MapReduce system topped out at 70%. Corona was able to reach more than 95%.
Original title and link: Hadoop Implementers, Take Some Advice From My Grandmother (©myNoSQL)