Roberto V. Zicari interviewing Joe Russell about Hadoop virtualization with Serengeti:
A common misconception when virtualizing Hadoop clusters is that we decouple
the data nodes from the physical infrastructure. This is not necessarily
true. When users virtualize a Hadoop cluster using Project Serengeti, they
separate data from compute while preserving data locality. By preserving
data locality, we ensure that performance isn’t negatively impacted, or
essentially making the infrastructure appear as static. Additionally, it
creates true multi-tenancy within more layers of the Hadoop stack, not just
the name node.
I’m not 100% sure I get this, but the only way I can make sense of it is that HDFS lives directly on the physical hardware and only the compute part is virtualized. Is that what he means?
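To make that reading concrete, here is a toy sketch of what "separate data from compute while preserving data locality" could look like under my interpretation: the data stays pinned to physical hosts, and each compute VM is started on a host that already stores the block it needs. Everything here (`Host`, `place_compute_vm`, `schedule`, the block ids) is hypothetical and made up for illustration; it is not Serengeti's API or its actual placement logic.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical host (hypothetical model, not a Serengeti type)."""
    name: str
    blocks: set = field(default_factory=set)         # block ids stored on this host
    compute_vms: list = field(default_factory=list)  # compute VMs running here


def place_compute_vm(hosts, block_id):
    """Pick a host for a compute VM that will read `block_id`.

    Prefer a host that already stores the block (a local read).
    Otherwise fall back to the host with the fewest compute VMs,
    which implies a remote read.
    """
    for host in hosts:
        if block_id in host.blocks:
            return host, True
    return min(hosts, key=lambda h: len(h.compute_vms)), False


def schedule(hosts, block_ids):
    """Start one compute VM per block; return {block_id: (host_name, is_local)}."""
    placements = {}
    for i, block_id in enumerate(block_ids):
        host, is_local = place_compute_vm(hosts, block_id)
        host.compute_vms.append(f"compute-vm-{i}")
        placements[block_id] = (host.name, is_local)
    return placements


if __name__ == "__main__":
    hosts = [Host("host-a", {"blk-1", "blk-2"}), Host("host-b", {"blk-3"})]
    for block_id, (host_name, is_local) in schedule(hosts, ["blk-1", "blk-3"]).items():
        print(block_id, host_name, "local" if is_local else "remote")
```

In this model the compute VMs can come and go without the blocks ever moving, which is how I understand the "infrastructure appears static" part of the quote.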
Original title and link: Hadoop Virtualization (©myNoSQL)