Michael Hausenblas (MapR) weighs in on the topic of the day, “Hadoop distributions”, about which I’ve already linked to Steve Loughran’s If There Is a Problem in the Hadoop JARs, How Are You Going to Fix It?, Merv Adrian’s Open Source “Purity”, Hadoop, and Market Realities, and Matthew Aslett’s What It Means to Be “all In” on Hadoop:
One aspect I’d like to highlight is the importance of ‘standard’ interfaces,
defined through community consensus, and enforced by the Apaches and the
likes. I think it makes perfect sense to offer a commercial implementation
that is superior to the implementation you get ‘for free’ — as long as
you’re 100% compatible with the community-defined standard.
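To make that claim concrete, here is a minimal sketch of what implementing against a community-defined interface could look like: the contract is Apache Hadoop’s org.apache.hadoop.fs.FileSystem, and a distribution ships its own implementation behind it. The VendorFileSystem class and the vendorfs scheme are hypothetical names used only for illustration; only the Hadoop API types are real.

```java
// Hypothetical vendor file system plugging into the standard
// org.apache.hadoop.fs.FileSystem contract. All names containing
// "Vendor"/"vendorfs" are invented for illustration.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class VendorFileSystem extends FileSystem {

    private URI uri;
    private Path workingDir;

    @Override
    public void initialize(URI name, Configuration conf) throws IOException {
        super.initialize(name, conf);
        this.uri = name;
        this.workingDir = new Path("/");
        // Connect to the proprietary storage layer here.
    }

    @Override
    public URI getUri() { return uri; }

    @Override
    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        // Return a stream backed by the vendor's storage engine.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public FSDataOutputStream create(Path f, FsPermission permission,
            boolean overwrite, int bufferSize, short replication,
            long blockSize, Progressable progress) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public FSDataOutputStream append(Path f, int bufferSize,
            Progressable progress) throws IOException {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public boolean rename(Path src, Path dst) throws IOException { return false; }

    @Override
    public boolean delete(Path f, boolean recursive) throws IOException { return false; }

    @Override
    public FileStatus[] listStatus(Path f) throws IOException { return new FileStatus[0]; }

    @Override
    public void setWorkingDirectory(Path newDir) { this.workingDir = newDir; }

    @Override
    public Path getWorkingDirectory() { return workingDir; }

    @Override
    public boolean mkdirs(Path f, FsPermission permission) throws IOException { return false; }

    @Override
    public FileStatus getFileStatus(Path f) throws IOException {
        throw new IOException("sketch only");
    }
}
```

Assuming the sketch above, the implementation would be wired in through standard configuration (e.g. mapping the hypothetical scheme to the class via fs.vendorfs.impl in core-site.xml), so applications written against the Apache API never reference the vendor class directly.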
Here’s something I don’t understand about the above. The “Defining Hadoop” wiki page dedicates a complete paragraph to compatibility. The most relevant part of it reads:
Other entities may claim that other products (including derivative works)
are compatible with Apache Hadoop. The Apache Hadoop development team is not
a standards body, and cannot confirm or deny such assertions. All that we
can say is “there is no official certification that a product is compatible
with Hadoop, other than when a release of the Apache source tree is declared
a new release of Apache Hadoop itself”.
Going back to MapR’s post, my question is: if the Apache Hadoop project doesn’t offer a certification toolkit and the project team doesn’t validate compatibility claims, what exactly does it mean to be “100% compatible” with something that can change at any time and is completely out of your control?
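To see what the claim looks like from the other side, here is a minimal sketch, assuming placeholder paths and an hdfs:/// URI, of application code written purely against the Apache FileSystem API. This is the kind of code a “100% compatible” distribution promises to run unchanged, yet nothing in it pins down which release of the interfaces that promise is measured against.

```java
// Application code that depends only on the Apache Hadoop API.
// Paths and the URI scheme are placeholders for illustration.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListAndRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Whichever implementation is registered for the scheme is resolved
        // at runtime; the application never names it.
        FileSystem fs = FileSystem.get(URI.create("hdfs:///"), conf);

        // List a directory through the standard interface.
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.println(status.getPath());
        }

        // Read the first line of a file through the same interface.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/sample.txt")),
                        StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```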
Original title and link: Hadoop: What Matters Are Open and Standardized Interfaces (©myNoSQL)