


Hadoop Versions Take 2: What You Wanted to Know About Hadoop, but Were Too Afraid to Ask: Genealogy of Elephants

Another great diagram explaining the complicated tree of Hadoop versions.

Apache Hadoop Versions

Credit: Konstantin I. Boudnik & Cos

Compared with the other diagram of Apache Hadoop versions, this one includes some very interesting details about the Hadoop versions used by third-party distributions such as EMC, IBM, MapR, and even Azure.

The diagram also clearly shows a few important gaps in the other commercial offerings:

  • none of them (EMC, IBM, and MapR) supports Kerberos security
  • HBase is unavailable because their systems lack HDFS append (EMC, IBM). In MapR's case, you end up using a custom HBase distributed by MapR. I won't speculate about the latter in this article.

If I were in a position to choose which version of Hadoop to use for a project, here is where I'd start:

  1. if the project had a budget for prototyping and experimentation, my first choice would be the latest official Apache release. This would give access to the latest and greatest features (not always bug-free), but more importantly it would let the team tap into the Hadoop community's know-how.
  2. if the project required getting up to speed as fast as possible (and I could get some budget for training), I'd start my investigation with Cloudera's Distribution of Hadoop (CDH). Even with no budget for Cloudera support, the advantage would be having everything well packaged together.

Original title and link: Hadoop Versions Take 2: What You Wanted to Know About Hadoop, but Were Too Afraid to Ask: Genealogy of Elephants (NoSQL database©myNoSQL)