Tanuki: A 30000 Cores AWS Cluster
Sometimes the only valid comment is wow.
We have now launched a cluster 3 times the size of Tanuki, or 30,000 cores, which cost $1279/hour to operate for a Top 5 Pharma. It performed genuine scientific work — in this case molecular modeling — and a ton of it. The complexity of this environment did not necessarily scale linearly with the cores.
In fact, we had to implement a triad of features within CycleCloud to make it a reality:
- MultiRegion support: To achieve the mind-boggling core count of this cluster, we launched in three distinct AWS regions simultaneously, including Europe.
- Massive Spot instance support: This was a requirement given the potential savings at this scale by going through the spot market. Besides, our scheduling environment and the workload had no issues with the possibility of early termination and rescheduling.
- Massive CycleServer monitoring & Grill GUI app for Chef monitoring: There is no way that any mere human could keep track of all of the moving parts on a cluster of this scale.
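To put the quoted price in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures from the quote (30,000 cores, $1279/hour). The 8-hour run length is a made-up illustration, not something the announcement states, and the function names are hypothetical.

```python
# Figures taken from the quoted announcement.
TOTAL_CORES = 30_000
COST_PER_HOUR_USD = 1279.0


def cost_per_core_hour(cost_per_hour: float, cores: int) -> float:
    """Average hourly cost of a single core across the whole cluster."""
    return cost_per_hour / cores


def run_cost(cost_per_hour: float, hours: float) -> float:
    """Total cost of keeping the full cluster up for `hours` hours."""
    return cost_per_hour * hours


per_core = cost_per_core_hour(COST_PER_HOUR_USD, TOTAL_CORES)
print(f"Cost per core-hour: ${per_core:.4f}")

# ASSUMPTION: an 8-hour run, chosen purely for illustration.
EXAMPLE_HOURS = 8
print(f"Cost of a {EXAMPLE_HOURS}-hour run: ${run_cost(COST_PER_HOUR_USD, EXAMPLE_HOURS):,.2f}")
```

That works out to a bit over four cents per core-hour on average, which is consistent with the announcement's point that the spot market was a requirement at this scale.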
Facebook runs a 30PB Hadoop analytic data warehouse, and Yahoo! has a Hadoop cluster with 100,000 cores across 40,000 machines. I’m wondering what the largest Amazon Elastic MapReduce jobs ever run are. Any ideas?
Original title and link: Tanuki: A 30000 Cores AWS Cluster (©myNoSQL)