What’s New and Upcoming in HDFS

A great retrospective by Todd Lipcon of the improvements added to HDFS in 2012 and of what is planned for this year, with many architectural details.

For a quick overview:

  • 2012: HDFS 2.0
    • High availability (HA), delivered in two phases
    • Performance improvements:
      • for Impala: faster libhdfs, APIs for spindle-based scheduling
      • for HBase and Accumulo: direct reads from block files in secure environments, application-level checksums, and IOPS elimination
    • on-the-wire encryption
    • rolling upgrades and wire compatibility
  • 2013 (planned):
    • HDFS snapshots
    • better storage density and file formats
    • caching and hierarchical storage management

