Hadoop Versions Take 3… Can you follow it?

I’ve just read Hortonworks’s post about the improvements in Hadoop.Next, and I jumped up and screamed “Super!”:

  • Federation for Scaling HDFS – HDFS has undergone a transformation to separate Namespace management from the Block (storage) management to allow for significant scaling of the filesystem. In previous architectures, they were intertwined in the NameNode.
  • NextGen MapReduce (aka YARN) – MapReduce has undergone a complete overhaul in hadoop-0.23, including a fundamental change to split up the major functionalities of the JobTracker, resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. Thus, Hadoop becomes a general purpose data-processing platform that can support MapReduce as well as other application execution frameworks such as MPI, Graph processing, Iterative processing etc.
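
To make the split described in the second bullet a bit more concrete, here is a minimal toy sketch in Python. It is not the YARN API: the class names only borrow the terms from the quote, and the `allocate`, `release`, and `run` methods, the container counts, and the task names are all assumptions made up for illustration. The point is only that one global object owns the resources while each application gets its own master that schedules and monitors its own work.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy global resource manager: it only tracks free container slots."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        """Grant as many containers as are available, up to the request."""
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        """Return containers to the shared pool."""
        self.free_containers += count


@dataclass
class ApplicationMaster:
    """Toy per-application master: it schedules and monitors its own tasks."""
    name: str
    pending_tasks: list
    finished_tasks: list = field(default_factory=list)

    def run(self, rm: ResourceManager) -> None:
        while self.pending_tasks:
            granted = rm.allocate(len(self.pending_tasks))
            if granted == 0:
                raise RuntimeError("no containers available")
            batch = self.pending_tasks[:granted]
            self.pending_tasks = self.pending_tasks[granted:]
            # Pretend every task in the batch ran to completion.
            self.finished_tasks.extend(batch)
            rm.release(granted)


# One shared ResourceManager, one ApplicationMaster per application.
# The second application is deliberately not a MapReduce job.
rm = ResourceManager(free_containers=2)
mapreduce_job = ApplicationMaster("mapreduce-job", ["map-0", "map-1", "reduce-0"])
graph_job = ApplicationMaster("graph-job", ["superstep-0"])
for app in (mapreduce_job, graph_job):
    app.run(rm)
```

In this sketch the ResourceManager knows nothing about maps, reduces, or supersteps, which is roughly why the quote can claim that other execution frameworks fit on the same platform.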

But then my eyes stopped on this part:

We are pleased to report that almost all of the benchmarks perform significantly better on Hadoop.Next (0.23.1) compared to the current stable hadoop-1.0 release.

With the image of the Hadoop versions in mind, I asked myself, and then Twitter, what the plan is for the Hadoop 1.0 and Hadoop 0.23 branches. Will they get unified in a next version? Will they continue in parallel? As you’d expect, I was hoping to hear something like “once we finalize the major changes we will focus on clarifying “.

What I heard instead from Arun C. Murthy¹ is that:

  • 0.23 is the next major production ready version
  • 1.0 will become the “old” deprecated version
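
A small Python sketch of why this is hard to follow: compare the two version strings numerically, then compare them by the succession described in the two bullets above. The `parse_version` helper and the `succession` list are my own illustration, not anything Hadoop ships.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '0.23' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


stable = "1.0"
next_major = "0.23"

# What any numeric comparison (and most people) would conclude.
numerically_newer = max(stable, next_major, key=parse_version)

# What the two bullets above say: 1.0 is the line being superseded by 0.23.
succession = [stable, next_major]  # oldest first, per the tweets
lineage_newer = succession[-1]

print(f"numerically newer: {numerically_newer}")
print(f"newer by lineage:  {lineage_newer}")
```

The two answers disagree, which is exactly the problem.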

Are you still with me?

I’m starting to wonder if this is some sort of strategy to get everyone confused. If it’s not, then I really hope someone will do something to clarify this mess.

Update: The conversation with Arun C. Murthy trying to clarify the future direction of Hadoop continued over a series of tweets. As he posted here too, the conclusion is:

Hadoop-0.23 will soon be Hadoop-Y where Y > 1. Thus Hadoop 1.0 is currently stable release, and Hadoop-Y will be next major release continuing lots of new features etc.


  1. Arun C. Murthy is Founder and Architect at Hortonworks, Hadoop PMC 

Original title and link: Hadoop Versions Take 3… Can you follow it? (NoSQL database©myNoSQL)