If you feel like starting to hack on Hadoop, here is a list of interesting Hadoop improvements from Dhruba Borthakur. Just a couple to whet your appetite:
- Ability to make the Hadoop scheduler resource aware, especially of CPU, memory, and IO resources. The current implementation is based on statically configured slots.
- Ability to dynamically increase replicas of data in HDFS based on access patterns. This is needed to handle hot spots of data.
- Make MapReduce jobs work across data centers. In many cases, a single Hadoop cluster cannot fit into a single data center, and a user has to partition the dataset into two Hadoop clusters in two different data centers.
- High availability of the JobTracker. In the current implementation, if the JobTracker machine dies, then all currently running jobs fail.
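To make the first item a bit more concrete, here is a toy sketch of the difference between slot-based and resource-aware placement. This is not Hadoop code: `Node`, `Task`, `slot_based_pick`, and `resource_aware_pick` are hypothetical names invented for illustration, and the "most free memory wins" rule is just one possible policy. IO is left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float     # free cores right now (hypothetical metric)
    free_mem_mb: int    # free memory right now (hypothetical metric)
    free_slots: int     # statically configured slots still unused


@dataclass
class Task:
    cpu: float          # cores the task is expected to need
    mem_mb: int         # memory the task is expected to need


def slot_based_pick(nodes: List[Node], task: Task) -> Optional[Node]:
    """Static-slot policy: the first node with a free slot wins,
    regardless of how much CPU or memory it actually has left."""
    for node in nodes:
        if node.free_slots > 0:
            return node
    return None


def resource_aware_pick(nodes: List[Node], task: Task) -> Optional[Node]:
    """Resource-aware policy (one possible rule): consider only nodes that
    can fit the task, then prefer the one with the most free memory,
    breaking ties by free CPU."""
    fitting = [
        n for n in nodes
        if n.free_cpu >= task.cpu and n.free_mem_mb >= task.mem_mb
    ]
    if not fitting:
        return None
    return max(fitting, key=lambda n: (n.free_mem_mb, n.free_cpu))


# Node "a" still has free slots but is nearly out of CPU and memory;
# node "b" has only one slot left but plenty of real capacity.
nodes = [
    Node("a", free_cpu=0.5, free_mem_mb=512, free_slots=2),
    Node("b", free_cpu=4.0, free_mem_mb=8192, free_slots=1),
]
task = Task(cpu=1.0, mem_mb=2048)

print(slot_based_pick(nodes, task).name)
print(resource_aware_pick(nodes, task).name)
```

In this toy setup the slot-based policy sends the task to the overloaded node simply because it has a slot free, while the resource-aware one skips it. A real scheduler would also need fresh load reports from each node, which is where most of the work would be.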
Original title and link: Hadoop Research Topics (NoSQL databases © myNoSQL)