Given the following Hadoop NameNode problem:
the problem is, if the Namenode crashes, the entire file system becomes inoperable because clients and Datanodes still need the metadata to do anything useful. Furthermore, since the Namenode maintains all the metadata only in memory, the number of files you can store on the filesystem is directly proportional to the amount of RAM the Namenode has. As if that’s not enough, the Namenode will be completely saturated under write intensive workloads, and will be unable to respond to even simple client side queries like ls. Have a look at Shvachko’s paper which describes these problems at great length and depth, on which we’ve based our work.
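To make the "files are proportional to RAM" point concrete, here is a rough back-of-the-envelope sketch. The figures are assumptions for illustration only, not numbers taken from the quote: roughly 200 bytes of heap per metadata object (a file inode or a block) and an average of 1.5 blocks per file. The function name and parameters are hypothetical.

```python
def max_files(heap_bytes, bytes_per_object=200, blocks_per_file=1.5):
    """Estimate how many files a single Namenode heap can track.

    Assumptions (illustrative, not measured): every file costs one inode
    object plus `blocks_per_file` block objects, and each object costs
    about `bytes_per_object` bytes of Namenode memory.
    """
    objects_per_file = 1 + blocks_per_file
    return int(heap_bytes // (bytes_per_object * objects_per_file))


GIB = 2 ** 30

# Capacity grows linearly with heap size: doubling RAM doubles the file count.
for heap_gib in (16, 32, 64):
    print(f"{heap_gib} GiB heap -> about {max_files(heap_gib * GIB):,} files")
```

Whatever the exact per-object cost turns out to be, the shape of the result is the same: the only way to store more files is to give the one Namenode more memory.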
Lalith Suresh has worked for the last couple of months on the following solution:
“Move all of the Namenode’s metadata storage into an in-memory, replicated, share-nothing distributed database.”
[…] We chose MySQL Cluster as our database because of its wide spread use and stability. So for the filesystem to scale to a larger number of files, one needs to add more MySQL Cluster Datanodes, thus moving the bottleneck from the Namenode’s RAM to the DB’s storage capacity. For the filesystem to handle heavier workloads, one needs to add only more Namenode machines and divide the load amongst them. Another interesting aspect is that if a single Namenode machine has to reboot, it needn’t fetch any state into memory and will be ready for action within a few seconds (although it still has to sync with Datanodes). Another advantage of our design is that the modifications will not affect the clients or Datanodes in anyway, except that we might need to find a way to divide the load among the Namenodes.
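The design described in the quote can be sketched in a few lines. This is only a toy model under stated assumptions, not Lalith's implementation: MetadataStore is a hypothetical stand-in for the replicated database (a plain dictionary here, not MySQL Cluster), and the round-robin client is just one possible way to divide load, which the quote leaves open.

```python
import itertools


class MetadataStore:
    """Hypothetical stand-in for the shared, replicated metadata database."""

    def __init__(self):
        self._inodes = {}

    def put(self, path, meta):
        self._inodes[path] = meta

    def get(self, path):
        return self._inodes.get(path)


class StatelessNamenode:
    """A Namenode that keeps no metadata of its own; every call hits the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def create(self, path, size):
        self.store.put(path, {"size": size, "created_by": self.name})

    def stat(self, path):
        return self.store.get(path)


class RoundRobinClient:
    """Spreads requests across Namenodes in turn (an assumed policy)."""

    def __init__(self, namenodes):
        self._cycle = itertools.cycle(namenodes)

    def create(self, path, size):
        next(self._cycle).create(path, size)

    def stat(self, path):
        return next(self._cycle).stat(path)


store = MetadataStore()
client = RoundRobinClient([StatelessNamenode("nn1", store),
                           StatelessNamenode("nn2", store)])

client.create("/logs/a.txt", 1024)   # handled by nn1
info = client.stat("/logs/a.txt")    # handled by nn2, same answer

# A "rebooted" Namenode is just a fresh object: there is no state to reload.
restarted = StatelessNamenode("nn1", store)
after_restart = restarted.stat("/logs/a.txt")
```

Because the Namenodes hold nothing themselves, adding one is a matter of pointing another instance at the same store, which is the scaling argument the quote makes.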
His post covers the how, as well as the pros and cons of his solution. The result is available on GitHub.
Update: Hortonworks is already working on the next generation of Apache Hadoop MapReduce, which focuses on reliability, availability, scalability, and predictable latency. But this doesn’t make Lalith’s work any less interesting.
Original title and link: MySQL Cluster Used to Implement a Highly Available and Scalable Hadoop NodeName (©myNoSQL)