Hadoop Research Topics
If you feel like starting to hack on Hadoop, here is a list of interesting Hadoop improvements from Dhruba Borthakur[1]. Just a couple to whet your appetite:
- Ability to make Hadoop scheduler resource aware, especially CPU, memory and IO resources. The current implementation is based on statically configured slots.
- Ability to dynamically increase replicas of data in HDFS based on access patterns. This is needed to handle hot-spots of data.
- Make map-reduce jobs work across data centers. In many cases, a single Hadoop cluster cannot fit into a single data center and a user has to partition the dataset into two Hadoop clusters in two different data centers.
- High Availability of the JobTracker. In the current implementation, if the JobTracker machine dies, then all currently running jobs fail.
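To make the first item concrete, here is a minimal sketch of the difference between slot-based and resource-aware task placement. It is not Hadoop code: the `Node` and `Task` types, their fields, and both check functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a worker node (not a Hadoop class)."""
    slots_free: int     # statically configured task slots still open
    cpu_free: float     # CPU cores currently unused
    mem_free_mb: int    # memory currently unused, in MB


@dataclass
class Task:
    """Hypothetical task with declared resource needs."""
    cpu: float
    mem_mb: int


def fits_by_slot(node: Node, task: Task) -> bool:
    # Slot-based check: any free slot accepts any task,
    # regardless of what the task actually needs.
    return node.slots_free > 0


def fits_by_resources(node: Node, task: Task) -> bool:
    # Resource-aware check: compare the task's needs
    # against what the node really has available.
    return node.cpu_free >= task.cpu and node.mem_free_mb >= task.mem_mb


node = Node(slots_free=2, cpu_free=1.0, mem_free_mb=512)
heavy = Task(cpu=2.0, mem_mb=4096)
light = Task(cpu=0.5, mem_mb=256)

print(fits_by_slot(node, heavy))       # slot check ignores resource needs
print(fits_by_resources(node, heavy))  # resource check rejects the heavy task
print(fits_by_resources(node, light))  # resource check accepts the light task
```

With static slots the heavy task is placed on a node that cannot really serve it; a resource-aware scheduler would keep it waiting for a better fit.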
[1] Dhruba Borthakur: Hadoop Engineer at Facebook
Original title and link: Hadoop Research Topics (NoSQL databases © myNoSQL)
via: http://hadoopblog.blogspot.com/2010/11/hadoop-research-topics.html