Cloudera: All content tagged as Cloudera in NoSQL databases and polyglot persistence
Wednesday, 16 January 2013
Video Interview With Cloudera’s Jeff Hammerbacher on Building Big Data Systems
I wasn’t expecting to see this on TechCrunch… so it took me a while to decide to link to it. I did it for Jeff Hammerbacher.
Original title and link: Video Interview With Cloudera’s Jeff Hammerbacher on Building Big Data Systems (©myNoSQL)
Monday, 29 October 2012
Overview of Dremel-Like Solutions: Moving Beyond Hadoop for Big Data Needs
Until I learn more about the recently announced Cloudera Impala and Druid from Metamarkets, this article by Jaikumar Vijayan should offer (with some inherent mistakes [1]) a good overview of the solutions aiming to offer alternatives to the batch-processing nature of Hadoop:
- Google Dremel (BigQuery)
- Cloudera Impala
- Metamarkets Druid
- Nodeable StreamReduce
- SAP HANA integrated with Hadoop, etc.
[1] Just an example: “If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies”. Then “The technology [nb Google Dremel] can run queries over trillion-row data tables in seconds…”
Maybe just one more: consider the title “Moving beyond Hadoop” and then the quote from Google’s Ju-kay Kwek: “Google uses Dremel in conjunction with MapReduce. […] Hadoop and Dremel are distributed computing technologies, but each was built to address very different problems.”
Tuesday, 2 October 2012
Cloudera Distribution of Hadoop 4.1 Released
The yearly major release of CDH is out.
via: http://www.cloudera.com/blog/2012/10/cdh4-1-now-released/
Monday, 6 August 2012
HttpFS: Another Hadoop File System Over HTTP
Just a new HTTP interface for the Hadoop file system. The main differences between HttpFS and WebHDFS are that this one is created by Cloudera, not Hortonworks (on top of their previous Hoop library), and:
HttpFs is a proxy so, unlike WebHDFS, it does not require clients be able to access every machine in the cluster. This allows clients to access a cluster that is behind a firewall via the WebHDFS REST API.
Question is: if they are API compatible and both open source, why not unify them?
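To make the "API compatible" point concrete, here is a minimal sketch of how the same WebHDFS-style request could be addressed either to the NameNode directly or to an HttpFS proxy. The host names are hypothetical, the helper `webhdfs_url` is made up for illustration, and the ports (50070 for the NameNode web port, 14000 for HttpFS) are assumed to be the commonly used defaults; a real cluster may be configured differently.

```python
from urllib.parse import quote, urlencode

# REST prefix shared by WebHDFS and HttpFS
WEBHDFS_PREFIX = "/webhdfs/v1"


def webhdfs_url(host, port, path, op, user=None):
    """Build a WebHDFS-style REST URL (hypothetical helper, not a library API)."""
    params = {"op": op}
    if user is not None:
        # Simple (pseudo) authentication passes the user as a query parameter
        params["user.name"] = user
    return f"http://{host}:{port}{WEBHDFS_PREFIX}{quote(path)}?{urlencode(params)}"


# Direct WebHDFS: the client talks to the NameNode and may then be redirected
# to individual DataNodes, so it needs network access to the cluster machines.
direct = webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS", user="alice")

# HttpFS: the same request goes to a single gateway host, which proxies
# the traffic on behalf of the client.
proxied = webhdfs_url("httpfs-gateway.example.com", 14000, "/user/alice", "LISTSTATUS", user="alice")

print(direct)
print(proxied)
```

Only the host and port differ between the two URLs, which is what lets a client behind a firewall reach the cluster through one gateway machine.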
via: http://www.cloudera.com/blog/2012/08/httpfs-for-cdh3-the-hadoop-filesystem-over-http/
Thursday, 26 July 2012
Cloudera and HP Partnership to Simplify Hadoop Deployments
As I was expecting after the series of announcements coming from MapR, Cloudera is announcing its partnership with HP:
Under the terms of the joint development and licensing agreement, the two companies will deliver open standards-based reference architectures that simplify management and accelerate deployment of Hadoop Cluster environments. Clients can purchase the Cloudera Enterprise platform and future Cloudera products either directly from HP or bundled in HP AppSystem for Apache Hadoop.
The new HP reference architecture for Apache Hadoop for Cloudera and HP AppSystem for Apache Hadoop—Cloudera are based on HP Converged Infrastructure. They include the Cloudera Enterprise platform and HP Insight Cluster Manager Utility (CMU) software.
Friday, 13 July 2012
MapR Claims Title as De Facto Standard for Hadoop
Maureen O’Gara:
The champagne has been flowing over at MapR since Google announced the integration of its Distribution for Hadoop with Google Compute Engine, the start-up’s second big win in a row.
Indeed, MapR on Amazon Elastic MapReduce and Google Compute Engine are two very important events in the life of MapR and for the Hadoop ecosystem in general. But there’s still a long way from these to being a de facto standard.
Cloudera or MapR for Hadoop Distribution?
A couple of links covering various aspects of this question:
- Quora thread covering this subject
- Joe Stein’s Hadoop distribution bake-off and my experience with Cloudera and MapR
- How I’d choose a Hadoop distribution
- MapR claims title as de facto standard for Hadoop
If you have other good references answering the question of which Hadoop distribution to choose, please leave a comment.
Wednesday, 11 July 2012
The Hadoop Ecosystem Relationships
Excellent infographic about the relationships in the Hadoop market created with Datameer:
A while ago I created a Google Spreadsheet in which I tried to track all these relationships, but going through PR announcements wasn’t really my thing. Now there’s a CSV file with all this data.
via: http://www.cloudera.com/blog/2012/07/the-hadoop-ecosystem-visualized-in-datameer/
Friday, 15 June 2012
Pricing for Hadoop Support: Cloudera, Hortonworks, MapR
Found the following bits in a post on The Register by Timothy Prickett Morgan:
While Cloudera and MapR are charging $4,000 per node for their enterprise-class Hadoop distributions (including their proprietary extensions and tech support), Hortonworks doesn’t have any proprietary extensions and is living off of the support contracts for the HDP 1.0 stack. […] Hortonworks is not providing its full list price, but for a starter ten-node cluster, you can get a standard support contract for $12,000 per year.
Hortonworks’s pricing looks a bit aggressive, but this could be explained by the fact that Hortonworks Data Platform 1.0 was made available only this week.
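A quick back-of-the-envelope comparison of the figures quoted above. This is only a sketch: it assumes the $4,000 per node is an annual price (the quote does not say so) and that the ten-node starter price scales evenly per node; the function name is made up for illustration.

```python
def per_node_price(cluster_price, nodes):
    """Spread a flat cluster price evenly across its nodes."""
    return cluster_price / nodes


# Figures from the quoted Register post
cloudera_mapr_per_node = 4_000   # assumed to be per node, per year
hortonworks_starter = 12_000     # ten-node cluster, per year
hortonworks_nodes = 10

hortonworks_per_node = per_node_price(hortonworks_starter, hortonworks_nodes)
ratio = cloudera_mapr_per_node / hortonworks_per_node

print(f"Hortonworks: ${hortonworks_per_node:,.0f} per node per year")
print(f"Cloudera/MapR list price is about {ratio:.1f}x higher per node")
```

Under those assumptions the starter contract works out to $1,200 per node, roughly a third of the Cloudera and MapR figure, although the latter also covers proprietary extensions.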
For running Hadoop in the cloud, there’s also Amazon Elastic MapReduce whose pricing was always clear. And Amazon has recently announced support for MapR Hadoop distribution on Elastic MapReduce.
Thursday, 7 June 2012
Looking to Stay Ahead of Hortonworks and MapR in the Hadoop Market, Cloudera Delivers High Availability, Better Security, and Easier System Management
Compare the title, which is the subtitle of the InformationWeek post, with this paragraph which reflects the reality:
Both Cloudera and Hortonworks will be distributing open source software from Apache’s Hadoop 2.0 release, which includes upgrades aimed at high-availability and improved security. The release includes a hot-failover for the NameNode (metadata server) of the Hadoop Distributed File System (HDFS), which has long been a single point of failure.
Cloudera is indeed one of the biggest Hadoop contributors and a company that has helped a lot in proving, and thus popularizing, Hadoop through its packaging of open source Hadoop ecosystem components paired with its management tool (Cloudera Manager). But NameNode high availability and security improvements are part of the Apache Hadoop source code.
via: http://www.informationweek.com/news/software/info_management/240001574
Tuesday, 24 April 2012
Notes on the Hadoop and HBase Markets
Curt Monash shares what he heard from his customers:
- Over half of Cloudera’s customers (nb 100 subscription customers) use HBase
- Hortonworks thinks a typical enterprise Hadoop cluster has 20-50 nodes, with 50-100 already being on the large side.
- There are huge amounts of Elastic MapReduce/Hadoop processing in the Amazon cloud. Some estimates say it’s the majority of all Amazon Web Services processing.
via: http://www.dbms2.com/2012/04/24/notes-on-the-hadoop-and-hbase-markets/
Friday, 6 April 2012
What Are the Pros and Cons of Running Cloudera’s Distribution for Hadoop vs Amazon Elastic MapReduce Service?
Old Quora question, but still very relevant. Top response from Jeff Hammerbacher:
Elastic MapReduce Pros:
- Dynamic MapReduce cluster sizing.
- Ease of use for simple jobs via their proprietary web console.
- Great documentation.
- Integrates nicely with other Amazon Web Services.
Cloudera Distribution for Hadoop:
- CDH is open source; you have access to the source code and can inspect it for debugging purposes and make modifications as required.
- CDH can be run on a number of public or private clouds using an open source framework, Whirr, so you’re not tied to a single cloud provider.
- With CDH, you can move your cluster to dedicated hardware with little disruption when the economics make sense. Most non-trivial applications will benefit from this move.
- CDH packages a number of open source projects that are not included with EMR: Sqoop, Flume, HBase, Oozie, ZooKeeper, Avro, and Hue. You have access to the complete platform composed of data collection, storage, and processing tools.
- CDH packages a number of critical bug fixes and features and the most recent stable releases, so you’re usually using a more stable and feature-rich product.
- You can purchase support and management tools for CDH via Cloudera Enterprise.
- CDH uses the open source Oozie framework for workflow management. EMR implemented a proprietary “job flow” system before major Hadoop users standardized on Oozie for workload management.
- CDH uses the open source Hue framework for its user interface. If you require new features from your web interface, you can easily implement them using the Hue SDK.
- CDH includes a number of integrations with other software components of the data management stack, including Talend, Informatica, Netezza, Teradata, Greenplum, Microstrategy, and others. […]
- CDH has been designed and deployed in common Linux environments and you can use standard tools to debug your programs. […]
Make sure you also read Hadoop in the Cloud: Pros and Cons which addresses (almost) the same question.
A Twitter-style answer to this question would be: “Control and customization vs Automated and Managed Service”. 80 characters left to add your own perspective.
