mapreduce: All content tagged as mapreduce in NoSQL databases and polyglot persistence
Based on ESG’s modeling of a medium-sized Hadoop-oriented big data project, the preconfigured Oracle Big Data Appliance is 39% less costly than a “build” equivalent do-it-yourself infrastructure. And using Oracle Big Data Appliance will cut the project length by about one-third. For most enterprises planning to take big data beyond experimentation and proof-of- concept, ESG suggests skipping the idea of in-house development, on-going management, and expansion of your own big data infrastructure, to instead look to purpose-built infrastructure solutions such as Oracle Big Data Appliance.
This is an extract from Oracle’s whitepaper “Getting Real about Big Data: Build Versus Buy“. It’s a nice reading excercise to better understand how the database leader is positioning their Oracle Big Data Appliance compared to Hadoop’s commodity-hardware cluster.
I’d love seeing the equivalent paper from Hortonworks1.
The only reason I’m referring directly to Hortonworks and not also Cloudera is that the Hadoop part of Oracle Big Data Appliance is offered by Cloudera. ↩
Original title and link: Oracle Paper: The Cost of Do-It-Yourself Hadoop vs Oracle Big Data Appliance ( ©myNoSQL)
Very timely infographic from GigaOm about the Hadoop ecosystem augmenting my earlier How many Hadoops? post:
Original title and link: The Hadoop Ecosystem Infographic ( ©myNoSQL)
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)
The 100% open source: Hortonworks Data Platform.
The prioprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- Netapp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring Hortonworks’s Hadoop Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones1. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
I left aside for now Hadoop-as-a-Service. ↩
Original title and link: How Many Hadoops? ( ©myNoSQL)