Greenplum: All content tagged as Greenplum in NoSQL databases and polyglot persistence
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or claim compatibility with it.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
There’s also the version Facebook runs on its own cluster, which includes Facebook Corona, a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- DataDirect Networks (DDN)
I hope I didn’t miss any important ones [1]. To conclude this list, my question is: who actually benefits from all these distributions?
[1] I have left Hadoop-as-a-Service aside for now.
Original title and link: How Many Hadoops? ( ©myNoSQL)
My list of the 8 most interesting companies for the future of Hadoop didn’t try to include everyone with a product that has the word Hadoop in it. The list from InformationWeek does. To save you 15 clicks, here’s their list:
- Amazon Elastic MapReduce
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
Original title and link: 12 Hadoop Vendors to Watch in 2012 ( ©myNoSQL)
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for its Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market ( ©myNoSQL)
- The ability to orchestrate execution of Hadoop-related tasks (e.g., executing a Hive Query, Pig Script, or M/R job) as part of a broader IT workflow.
- The ability to set up dependencies, so that if a step fails the job can branch down a recovery path or send a notification, and if it succeeds it goes on to the subsequent dependent tasks. Likewise, it supports initiating several tasks in parallel.
- New integration for Pig — so that developers have the ability to execute a Pig job from a PDI Job flow, integrate the execution of Pig jobs in broader IT workflows through PDI Jobs, take advantage of our out of the box scheduler, and so on.
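To make the dependency and branching idea concrete, here is a minimal Python sketch of such a workflow. It is not Pentaho’s API: `Step`, `run`, and the task names (`load_data`, `pig_script`, `hive_query`, `notify`) are all made up for illustration, and the actions are placeholders rather than real Hive or Pig calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One task in the workflow (hypothetical; not a PDI class)."""
    name: str
    action: Callable[[], None]
    on_success: List["Step"] = field(default_factory=list)
    on_failure: Optional["Step"] = None


def run(step: Step, log: List[str]) -> None:
    """Run a step, then follow its success or failure branch."""
    try:
        step.action()
    except Exception as exc:
        # Failure: record it and branch down the recovery path, if any.
        log.append(f"{step.name}: failed ({exc})")
        if step.on_failure is not None:
            run(step.on_failure, log)
        return
    log.append(f"{step.name}: ok")
    if step.on_success:
        # Success: start all dependent steps in parallel and wait for them.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(run, nxt, log) for nxt in step.on_success]
            for future in futures:
                future.result()


def failing_hive_query() -> None:
    # Placeholder for a Hive query that goes wrong.
    raise RuntimeError("table not found")


def build_workflow() -> Step:
    notify = Step("notify", lambda: None)
    pig = Step("pig_script", lambda: None)
    hive = Step("hive_query", failing_hive_query, on_failure=notify)
    # Both the Pig script and the Hive query depend on the load step.
    return Step("load_data", lambda: None, on_success=[pig, hive])


log: List[str] = []
run(build_workflow(), log)
```

After the run, `log` holds one entry per executed step. The load step always comes first and the notification always follows the failed Hive query, while the relative order of the two parallel branches is not guaranteed.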
The list of tools Pentaho 4 integrates with is quite long:
- a long list of traditional RDBMS
- analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
- NoSQL databases (MongoDB, HBase, etc.)
- Hadoop variants
- LexisNexis HPCC
This is the world of polyglot persistence and hybrid data storage.
Original title and link: BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases ( ©myNoSQL)