The story of migrating 30 PB of HDFS-stored data:
The scale of this migration exceeded all previous ones, and we considered a couple of different migration strategies. One was a physical move of the machines to the new data center. We could have moved all the machines within a few days with enough hands at the job. However, this was not a viable option as our users and analysts depend on the data 24/7, and the downtime would be too long.
Facebook could have convinced some of its 750+ million users to help carry over the physical machines.
Luckily, HDFS has replication built in, and Facebook engineers paired it with their own improvements:
Another approach was to set up a replication system that mirrors changes from the old cluster to the new, larger cluster. Then at the switchover time, we could simply redirect everything to the new cluster. This approach is more complex as the source is a live file system, with files being created and deleted continuously. Due to the unprecedented cluster size, a new replication system that could handle the load would need to be developed. However, because replication minimizes downtime, it was the approach that we decided to use for this massive migration.
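The quoted approach boils down to three steps: bulk-copy what exists, journal every live change on the source, and replay the journal on the destination until switchover. A minimal sketch of that pattern over plain file trees, purely illustrative (the `record`/`bulk_copy`/`replay` names and the journal format are my assumptions, not Facebook's actual replication system, which operated at HDFS scale):

```python
# Sketch of journal-based mirroring between two file trees.
# Illustrative only -- names and journal format are assumptions,
# not the replication system described in the Facebook post.
import shutil
from pathlib import Path

def record(journal, op, relpath):
    """Append a (op, relative-path) entry to the change journal."""
    journal.append((op, relpath))

def bulk_copy(src_root, dst_root):
    """Initial one-time copy of everything that exists at start."""
    shutil.copytree(src_root, dst_root, dirs_exist_ok=True)

def replay(journal, src_root, dst_root):
    """Apply journaled changes so the mirror catches up to the live source."""
    for op, rel in journal:
        src, dst = Path(src_root, rel), Path(dst_root, rel)
        if op == "create":
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        elif op == "delete":
            dst.unlink(missing_ok=True)
```

The point of the journal is that the bulk copy can take days while the source stays live; at switchover you only replay the (much smaller) tail of changes, which is what keeps downtime short.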
Another bit worth emphasizing in this post:
In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress!
Until now I thought Yahoo! had the largest Hadoop deployment, at least in terms of the number of machines.
Original title and link: Moving an Elephant: Large Scale Hadoop Data Migration at Facebook (©myNoSQL)