Cloudera: All content tagged as Cloudera in NoSQL databases and polyglot persistence
Under the terms of the joint development and licensing agreement, the two companies will deliver open standards-based reference architectures that simplify management and accelerate deployment of Hadoop cluster environments. Clients can purchase the Cloudera Enterprise platform and future Cloudera products either directly from HP or bundled in HP AppSystem for Apache Hadoop.
The new HP reference architecture for Apache Hadoop for Cloudera and HP AppSystem for Apache Hadoop—Cloudera are based on HP Converged Infrastructure. They include the Cloudera Enterprise platform and HP Insight Cluster Manager Utility (CMU) software.
Original title and link: Cloudera and HP Partnership to Simplify Hadoop Deployments ( ©myNoSQL)
A couple of links covering various aspects of choosing between Cloudera and MapR as a Hadoop distribution:
- Quora thread covering this subject
- Joe Stein’s Hadoop distribution bake-off and my experience with Cloudera and MapR
- How I’d choose a Hadoop distribution
- MapR claims title as de facto standard for Hadoop
If you have other good references on which Hadoop distribution to choose, please leave a comment.
Original title and link: Cloudera or MapR for Hadoop Distribution? ( ©myNoSQL)
Found the following bits in a post on The Register by Timothy Prickett Morgan:
While Cloudera and MapR are charging $4,000 per node for their enterprise-class Hadoop distributions (including their proprietary extensions and tech support), Hortonworks doesn’t have any proprietary extensions and is living off of the support contracts for the HDP 1.0 stack. […] Hortonworks is not providing its full list price, but for a starter ten-node cluster, you can get a standard support contract for $12,000 per year.
Hortonworks’s pricing looks a bit aggressive, but this could be explained by the fact that Hortonworks Data Platform 1.0 was made available only this week.
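To put the quoted figures side by side, here is a minimal sketch of the arithmetic. It uses only the numbers from the excerpt above. The excerpt does not state the billing period for the $4,000 per-node price, so treating it as annual is an assumption, and the variable and function names are illustrative only. Keep in mind that the Cloudera and MapR figure includes their proprietary extensions, so this is not a like-for-like comparison.

```python
# Figures quoted in the excerpt from The Register.
# Assumption: the $4,000 per-node price is annual (the excerpt does not say).
CLOUDERA_MAPR_PER_NODE = 4_000          # USD per node
HORTONWORKS_STARTER_CONTRACT = 12_000   # USD per year, standard support
STARTER_NODES = 10                      # size of the starter cluster


def per_node_cost(total: float, nodes: int) -> float:
    """Spread a flat contract price evenly across the nodes of a cluster."""
    return total / nodes


def cluster_cost(per_node: float, nodes: int) -> float:
    """Total price of a cluster billed per node."""
    return per_node * nodes


hortonworks_per_node = per_node_cost(HORTONWORKS_STARTER_CONTRACT, STARTER_NODES)
cloudera_mapr_cluster = cluster_cost(CLOUDERA_MAPR_PER_NODE, STARTER_NODES)
ratio = cloudera_mapr_cluster / HORTONWORKS_STARTER_CONTRACT

print(f"Hortonworks starter contract, per node: ${hortonworks_per_node:,.0f}")
print(f"Cloudera/MapR, ten-node cluster:        ${cloudera_mapr_cluster:,.0f}")
print(f"Price ratio for ten nodes:              {ratio:.1f}x")
```

Under these assumptions the Hortonworks starter contract works out to $1,200 per node, roughly a third of the per-node price quoted for Cloudera and MapR.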
For running Hadoop in the cloud, there’s also Amazon Elastic MapReduce, whose pricing has always been clear. Amazon has also recently announced support for the MapR Hadoop distribution on Elastic MapReduce.
Original title and link: Pricing for Hadoop Support: Cloudera, Hortonworks, MapR ( ©myNoSQL)
What Are the Pros and Cons of Running Cloudera’s Distribution for Hadoop vs Amazon Elastic MapReduce Service?
Old Quora question, but still very relevant. Top response from Jeff Hammerbacher:
Elastic MapReduce Pros:
- Dynamic MapReduce cluster sizing.
- Ease of use for simple jobs via their proprietary web console.
- Great documentation.
- Integrates nicely with other Amazon Web Services.
Cloudera Distribution for Hadoop Pros:
- CDH is open source; you have access to the source code and can inspect it for debugging purposes and make modifications as required.
- CDH can be run on a number of public or private clouds using an open source framework, Whirr, so you’re not tied to a single cloud provider.
- With CDH, you can move your cluster to dedicated hardware with little disruption when the economics make sense. Most non-trivial applications will benefit from this move.
- CDH packages a number of open source projects that are not included with EMR: Sqoop, Flume, HBase, Oozie, ZooKeeper, Avro, and Hue. You have access to the complete platform composed of data collection, storage, and processing tools.
- CDH packages a number of critical bug fixes and features and the most recent stable releases, so you’re usually using a more stable and feature-rich product.
- You can purchase support and management tools for CDH via Cloudera Enterprise.
- CDH uses the open source Oozie framework for workflow management. EMR implemented a proprietary “job flow” system before major Hadoop users standardized on Oozie for workload management.
- CDH uses the open source Hue framework for its user interface. If you require new features from your web interface, you can easily implement them using the Hue SDK.
- CDH includes a number of integrations with other software components of the data management stack, including Talend, Informatica, Netezza, Teradata, Greenplum, Microstrategy, and others. […]
- CDH has been designed and deployed in common Linux environments and you can use standard tools to debug your programs. […]
Make sure you also read Hadoop in the Cloud: Pros and Cons which addresses (almost) the same question.
A Twitter-style answer to this question would be: “Control and customization vs Automated and Managed Service”. 80 characters left to add your own perspective.
Original title and link: What Are the Pros and Cons of Running Cloudera’s Distribution for Hadoop vs Amazon Elastic MapReduce Service? ( ©myNoSQL)