hRaven: the beginning of smart Hadoop schedulers?

While going through the slide deck “A Bird’s-Eye View of Pig and Scalding with hRaven”, I started to wonder whether hRaven might represent the beginning of smart Hadoop schedulers. Hadoop has a pluggable scheduler framework, and YARN will feature some improvements in the fair scheduler, but I don’t think these are yet on par with the resource allocation management solutions available in MPP systems.

In a way, slide #29, titled “Current uses”, hints at something similar:

Current uses of hRaven

  • Pig reducer optimizations
  • Cluster utilization/capacity planning
  • Application performance trending over time
  • Identifying common job anti-patterns
  • Ad-hoc analysis for troubleshooting cluster problems

What do you think?

