Curt Monash quoting Omer Trajman (Cloudera) in a post counting petabyte-scale Hadoop deployments:
> The number of Petabyte+ Hadoop clusters expanded dramatically over the past year, with our recent count reaching 22 in production (in addition to the well-known clusters at Yahoo! and Facebook). Just as our poll back at Hadoop World 2010 showed the average cluster size at just over 60 nodes, today it tops 200. While mean is not the same as median (most clusters are under 30 nodes), there are some beefy ones pulling up that average. Outside of the well-known large clusters at Yahoo and Facebook, we count today 16 organizations running PB+ clusters running CDH across a diverse number of industries including online advertising, retail, government, financial services, online publishing, web analytics and academic research. We expect to see many more in the coming years, as Hadoop gets easier to use and more accessible to a wide variety of enterprise organizations.
The first questions that popped into my head after reading it:
- How many deployments does DataStax’s Brisk have? How many are close to or over a petabyte?
- How many clients run EMC Greenplum HD, and how many are close to this scale?
- The same question about NetApp Hadoopler clients.
- The same question for MapR.
Answering these questions would give us a good overview of the Hadoop ecosystem.
Original title and link: Petabyte-Scale Hadoop Clusters (©myNoSQL)