Paige Roberts on the Pervasive blog:
The Hadoop distributed computing concept is inherently parallel and,
therefore, should be friendly to better utilization models. But
parallel programming beyond the basic data level (the
embarrassingly parallel level) requires different habits. MapReduce
is already heading us in the wrong direction. Most Hadoop data
centers aren’t doing any better when it comes to usage levels than
traditional data centers. There’s still a tremendous amount of
energy and compute power going to waste.
YARN gives us the option to use other compute models in Hadoop
clusters: better, more efficient compute models, if we can create
them.
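To make the YARN point concrete, here is a minimal sketch of what "other compute models" means in practice: any application can ask the ResourceManager for containers through the YarnClient API, not just MapReduce jobs. This assumes Hadoop 2.x client libraries on the classpath; `com.example.MyApplicationMaster` is a hypothetical ApplicationMaster class standing in for whatever compute model you implement.

```java
// Minimal sketch: submitting a non-MapReduce application to YARN
// via the YarnClient API (Hadoop 2.x). The AM class is a placeholder.
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

import java.util.Collections;

public class YarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("custom-compute-model");

        // Describe how to launch the ApplicationMaster container.
        // The command is a placeholder for your own compute model's AM.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "$JAVA_HOME/bin/java com.example.MyApplicationMaster"
                + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
                + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
        appContext.setAMContainerSpec(amContainer);

        // Resources requested for the ApplicationMaster: 512 MB, 1 vcore.
        appContext.setResource(Resource.newInstance(512, 1));

        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
    }
}
```

Once granted containers, the ApplicationMaster can schedule work however it likes, which is exactly the opening for compute models more resource-efficient than MapReduce's slot-based execution.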
People running Hadoop at scale always want to optimize resource usage and power consumption. The first example that comes to mind: in November, Facebook, which most probably runs the largest Hadoop cluster, open sourced Corona, a project that improves MapReduce job scheduling to make more efficient use of the resources available in its Hadoop clusters:
In heavy workloads during our testing, the utilization in the Hadoop MapReduce system topped out at 70%. Corona was able to reach more than 95%.
Original title and link: Hadoop Implementers, Take Some Advice From My Grandmother (©myNoSQL)