We have now launched a cluster three times the size of Tanuki, or 30,000 cores, which cost $1,279/hour to operate for a Top 5 pharmaceutical company. It performed genuine scientific work, in this case molecular modeling, and a ton of it. The complexity of this environment did not necessarily scale linearly with the number of cores.
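A quick back-of-the-envelope check of the quoted figures. It assumes the $1,279/hour covers the whole cluster and averages it evenly over every core; the post does not break the cost down by instance type, region, or spot versus on-demand pricing, so this is an average, not an actual per-instance price.

```python
# Figures quoted in the post.
TOTAL_COST_PER_HOUR = 1279.0  # USD for the whole 30,000-core cluster
CORES = 30_000
TANUKI_MULTIPLE = 3  # "three times the size of Tanuki"


def cost_per_core_hour(total_per_hour: float, cores: int) -> float:
    """Average hourly cost of one core, assuming cost is spread evenly."""
    return total_per_hour / cores


implied_tanuki_cores = CORES // TANUKI_MULTIPLE
per_core = cost_per_core_hour(TOTAL_COST_PER_HOUR, CORES)

print(f"Implied Tanuki size: {implied_tanuki_cores} cores")
print(f"Average cost: ${per_core:.4f} per core-hour")
```

At roughly four cents per core-hour on average, the scale of the bill comes from the core count rather than from any single expensive machine.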
To make a cluster of this size a reality, we had to implement three features within CycleCloud:
MultiRegion support: To reach the mind-boggling core count of this cluster, we launched in three distinct AWS regions simultaneously, including one in Europe.
Massive Spot Instance support: Given the potential savings at this scale, buying capacity on the spot market was a requirement. Besides, neither our scheduling environment nor the workload had any problem with instances being terminated early and jobs being rescheduled.
Massive CycleServer monitoring and the Grill GUI app for Chef monitoring: No mere human could keep track of all the moving parts on a cluster of this scale.
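The post does not describe how the scheduler handles reclaimed Spot Instances. As a hedged illustration of why a workload of independent jobs tolerates early termination, here is a toy in-memory queue. Every class and method name (`Scheduler`, `dispatch`, `on_spot_termination`, the node and job names) is hypothetical; this is not CycleCloud's or any real scheduler's API. Jobs running on a reclaimed node are put back on the queue and rerun elsewhere, so a termination costs only the work lost on those jobs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set


@dataclass
class Scheduler:
    """Toy job queue (hypothetical, for illustration only)."""
    pending: Deque[str] = field(default_factory=deque)
    running: Dict[str, List[str]] = field(default_factory=dict)  # node -> job ids
    completed: Set[str] = field(default_factory=set)

    def submit(self, job_id: str) -> None:
        self.pending.append(job_id)

    def dispatch(self, node: str) -> Optional[str]:
        """Hand the next pending job to a node, if any job is waiting."""
        if not self.pending:
            return None
        job_id = self.pending.popleft()
        self.running.setdefault(node, []).append(job_id)
        return job_id

    def finish(self, node: str, job_id: str) -> None:
        self.running[node].remove(job_id)
        self.completed.add(job_id)

    def on_spot_termination(self, node: str) -> List[str]:
        """Requeue every job lost with a reclaimed node, at the front of the queue."""
        lost = self.running.pop(node, [])
        self.pending.extendleft(reversed(lost))  # preserves their original order
        return lost


sched = Scheduler()
for j in range(4):
    sched.submit(f"job-{j}")
sched.dispatch("spot-a")  # runs job-0
sched.dispatch("spot-b")  # runs job-1
sched.finish("spot-a", "job-0")
lost = sched.on_spot_termination("spot-b")  # spot-b is reclaimed mid-job
print(lost, list(sched.pending))  # ['job-1'] ['job-1', 'job-2', 'job-3']
```

Requeueing interrupted jobs at the front rather than the back is one design choice among several; it keeps them from waiting behind work submitted later.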
This blog is called myNoSQL and it is written by me, Alex Popescu, a software architect with a passion for open source and communities.
It records my readings, learnings, and opinions on NoSQL databases, polyglot persistence, and distributed systems, subjects I'm passionate about.
The opinions expressed here are my own, and no other party necessarily agrees with them.
If you feel I'm biased, I probably am.