


Pivotal HD: All content tagged as Pivotal HD in NoSQL databases and polyglot persistence

Proprietary Hadoop Is a Losing Strategy

Matt Asay (10gen), writing for ReadWrite, adds to the long discussion around EMC’s Pivotal HD announcement:

EMC has seemingly bottomless resources to throw at Hadoop, and every incentive to do so. It’s a smart, highly successful company and no doubt will prove successful with Pivotal HD. However, I can’t see it ever dominating an open-source infrastructure market with a proprietary distribution. Open source is the foundation for today’s most interesting markets, from Big Data to mobile to cloud computing. It’s unlikely that EMC will somehow stem this tide with a proprietary product, no matter its short-term performance or functionality advantages.

While I’ve linked to different perspectives about this topic, I’m not sure anyone outside our bubble actually came to a conclusion.

What I know, though, is that EMC is benefiting from this. A lot. Three weeks ago, I wasn’t reading anything about EMC and Hadoop. Today all major websites have at least a couple of articles about it.

Original title and link: Proprietary Hadoop Is a Losing Strategy (NoSQL database©myNoSQL)


What It Means to Be “all In” on Hadoop

Another post about Pivotal HD and the accompanying statements, this time from Matthew Aslett:

Pivotal HD is not Hadoop
Neither is Cloudera’s Distribution, including Apache Hadoop.
Nor the Hortonworks Data Platform.
Nor the MapR Distribution.
Nor IBM’s InfoSphere BigInsights.
Nor the WANdisco Distro.
Nor Intel’s Distribution for Apache Hadoop.



How Many Hadoops?

The short answer is there is only one Apache Hadoop distribution.

The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.

The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).

The 100% open source: Hortonworks Data Platform.

The proprietary: MapR.

The blue one: IBM InfoSphere BigInsights.

The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.

There’s also the version Facebook runs on its own cluster, which includes Facebook Corona, a different approach to job scheduling and resource management.

But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:

  1. Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
  2. NetApp’s Hadooplers
  3. EMC Greenplum DCA
  4. Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
  5. Data Direct Networks (DDN)

I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?

[1] For now, I have left aside Hadoop-as-a-Service.
