While going through the slide deck A Bird’s-Eye View of Pig and Scalding with hRaven, I started to wonder whether hRaven might actually represent the beginning of smart Hadoop schedulers. Hadoop has a pluggable scheduler framework, and YARN will feature some improvements to the fair scheduler, but I don’t think these are yet on par with the resource allocation management solutions available in MPP systems.
In a way, slide #29, titled Current uses, hinted at something similar:
Current uses of hRaven
- Pig reducer optimizations
- Cluster utilization/capacity planning
- Application performance trending over time
- Identifying common job anti-patterns
- Ad-hoc analysis troubleshooting cluster problems
What do you think?
Original title and link: hRaven: the beginning of smart Hadoop schedulers? (©myNoSQL)