
Cloudera: All content tagged as Cloudera in NoSQL databases and polyglot persistence

Oracle Big Data Appliance Released Features Cloudera Distribution of Hadoop: What You Need to Know

Oracle Big Data Appliance hardware specification

Klint Finley for ServicesANGLE:

18 Oracle Sun servers with a total of:

  • 864 GB main memory;
  • 216 CPU cores;
  • 648 TB of raw disk storage;
  • 40 Gb/s InfiniBand connectivity between nodes and other Oracle engineered systems; and,
  • 10 Gb/s Ethernet data center connectivity.
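For a rough sense of scale, the rack totals above can be divided across the 18 servers. The sketch below assumes the servers are identically configured, which the quoted specification does not state; the constant and function names are illustrative only.

```python
# Rack totals as quoted above (one full rack of 18 servers).
SERVERS = 18
TOTAL_MEMORY_GB = 864
TOTAL_CPU_CORES = 216
TOTAL_RAW_DISK_TB = 648


def per_server(total, servers=SERVERS):
    """Average share of a rack-wide total per server (assumes identical servers)."""
    return total / servers


memory_gb = per_server(TOTAL_MEMORY_GB)
cpu_cores = per_server(TOTAL_CPU_CORES)
raw_disk_tb = per_server(TOTAL_RAW_DISK_TB)

print(f"{memory_gb:g} GB memory, {cpu_cores:g} cores, {raw_disk_tb:g} TB raw disk per server")
```

Under that assumption each server averages 48 GB of memory, 12 cores, and 36 TB of raw disk.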

Joab Jackson for PCWorld Business Center:

The package includes 40Gb/s InfiniBand connectivity among the nodes, a rarity among Hadoop deployments, many of which use Ethernet to connect the nodes. Lumpkin said InfiniBand would speed data transfers within the system. Multiple racks can be tethered together in a cluster configuration. There is no theoretical limit to how many racks can be clustered together, though configurations of more than eight racks would require additional switches, Lumpkin said.

Oracle Big Data Appliance software specification

  • Cloudera’s Distribution including Apache Hadoop
  • Cloudera Manager
  • Open source distribution of R
  • Oracle NoSQL Database Community Edition
  • Oracle Big Data Connectors
  • Oracle Linux

Joab Jackson for PCWorld Business Center:

Along with the release, Oracle also released Oracle Big Data Connectors, a set of drivers for exchanging data between the Big Data Appliance and other Oracle products, such as the Oracle Database 11g, the Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud and Oracle Exalytics In-Memory Machine.

Derrick Harris for GigaOm:

However, Oracle isn’t blind to the fact that not everyone will be gung ho about buying an appliance. Its custom-built Big Data Connectors are available as separate products for those customers wanting to connect existing Hadoop clusters to Oracle database environments or R statistical-analysis environments.

Klint Finley for ServicesANGLE:

According to Oracle’s announcement: “The integrated Oracle and Cloudera architecture has been fully tested and validated by Oracle, who will also collaborate with Cloudera to provide support for Oracle Big Data Appliance.”

Oracle Big Data Appliance Services

George Lumpkin, Oracle’s vice president of data warehousing product management:

Oracle will provide first-line support for the appliance and all software (including the Hadoop distribution and Cloudera Manager) through its case-tracking support infrastructure. But when particularly tough support cases arise, Oracle will tap Cloudera’s expertise.

What’s more, Oracle will refer customers to Cloudera for Hadoop training and consulting engagements.

Oracle Big Data Appliance Positioning

George Lumpkin, Oracle’s vice president of data warehousing product management:

We are positioning this as something that runs alongside other Oracle-based systems. Big data is more than just a cluster of hardware running Hadoop. It is an overall information architecture for enabling companies to analyze data and make decisions.

Doug Henschen for InformationWeek:

Oracle highlighted the Big Data Appliance as a complement to a growing family of “engineered systems” that now includes Exadata, Exalogic, and the Exalytics In-Memory Machine.

Merv Adrian (Gartner analyst), cited by InformationWeek:

But what’s more remarkable is the fact that Oracle is finally looking beyond its core database. Oracle’s TimesTen and Essbase databases, which were recently upgraded for use in the Exalytics appliance, and BerkeleyDB, which was Oracle’s development starting point for the new NoSQL database, are examples of that shift.

Oracle is suddenly beginning to act as a data-management portfolio company, not just a company with a big brother and a bunch of starving siblings.

Joab Jackson for PCWorld Business Center:

Oracle is positioning the appliance for managing and analyzing large sets of data that may be too large, or otherwise unsuitable for keeping in databases, such as telemetry data, click-stream data or other log data. “You may not want to keep the data in a database, but you do want to store it and analyze it,” Lumpkin said. The appliance is intended for those organizations that want to undertake Big Data-style analysis but may not have the in-house expertise to assemble large Hadoop or NoSQL-based systems.

Pricing

Kurt Dunn, Cloudera’s chief operating officer, told InformationWeek:

Oracle has put together a very comprehensive product that is priced very well.

Brian Proffitt for ITworld:

The cost of the Big Data Appliance is what will really stand out. At $500,000, this may not seem like a bargain, but in reality it is. Typically, commoditized Hadoop systems run at about $4,000 a node. To get this much data storage capacity and power, you would need about 385 nodes… which puts the price tag at around $1.54 million—three times the price of Oracle’s Cloudera-based offering (which, I should add, excludes things like support costs and power).

Doug Henschen for InformationWeek:

The hardware and software combined will sell for $450,000, with an annual support fee for both hardware and software of 12%. That’s highly competitive, working out to less than $700 per terabyte and being in line with the low costs big data practitioners expect from deployments built on commodity hardware.
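The pricing figures quoted above are easy to sanity-check. The sketch below reproduces the ITworld commodity-cluster estimate and the InformationWeek per-terabyte figure. All constants come from the quotes and the hardware specification; the variable names are illustrative, and computing the per-terabyte cost on raw disk capacity is an assumption.

```python
# Figures taken from the quotes above; variable names are illustrative.
COMMODITY_COST_PER_NODE = 4_000    # USD per node, ITworld estimate
COMMODITY_NODES_NEEDED = 385       # nodes, ITworld estimate
APPLIANCE_PRICE_ITWORLD = 500_000  # USD, as quoted by ITworld
APPLIANCE_PRICE_IW = 450_000       # USD, as quoted by InformationWeek
ANNUAL_SUPPORT_RATE = 0.12         # 12% per year, hardware and software
RAW_DISK_TB = 648                  # from the hardware specification

# Commodity cluster total and how it compares with the appliance price.
commodity_total = COMMODITY_COST_PER_NODE * COMMODITY_NODES_NEEDED
price_ratio = commodity_total / APPLIANCE_PRICE_ITWORLD

# Assumption: the per-terabyte figure is computed on raw disk capacity.
cost_per_tb = APPLIANCE_PRICE_IW / RAW_DISK_TB
annual_support = APPLIANCE_PRICE_IW * ANNUAL_SUPPORT_RATE

print(f"Commodity cluster: ${commodity_total:,} ({price_ratio:.2f}x the appliance)")
print(f"Appliance: ${cost_per_tb:,.0f} per raw TB, ${annual_support:,.0f} support per year")
```

The numbers hold together: 385 nodes at $4,000 comes to $1.54 million, roughly three times $500,000, and $450,000 spread over 648 TB is a little under $700 per terabyte. Note that the two articles quote different appliance prices.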

Oracle - Cloudera Partnership

I wrote earlier about what this partnership means for both Oracle and Cloudera.

Doug Henschen for InformationWeek:

But by releasing the product early in the year in partnership with Cloudera, which has more customers and years in the market than any other Hadoop software and services provider, Oracle has made it clear that it is wasting no time and taking no chances with unproven technology.

“Cloudera brings us a couple of very important missing pieces, including its management software and assistance for a deeper second- and third-tier level of support,” said George Lumpkin, Oracle’s vice president of product management, data warehousing.

Speculations about the future of the Oracle - Cloudera partnership

Brian Proffitt for ITworld:

Students of Linux history will well remember that’s exactly what happened when Oracle partnered with Red Hat to introduce commoditized Oracle offerings… and then Larry Ellison and crew decided to roll their own Oracle Enterprise Linux in 2006 when they decided to cut Red Hat out of the stack.

This is strong historical evidence that Oracle will do the same with Cloudera, because frankly the big data market is too big for Oracle not to want to own. Big Data Appliance customers should note this, and be very prepared that future versions may not be tied to Cloudera at all, but rather Oracle’s version of Hadoop.

A few people suggested on Twitter that this partnership is a sign of a possible acquisition of Cloudera by Oracle. TechCrunch’s Leena Rao links to an old post by Matt Asay suggesting this acquisition.


Original title and link: Oracle Big Data Appliance Released Features Cloudera Distribution of Hadoop: What You Need to Know (NoSQL database©myNoSQL)


Cloudera Distribution of Hadoop Powers Oracle’s Big Data Appliance

The announcement of the Oracle Big Data Appliance has been out for only a couple of hours and has already hit all the media sites. Before looking at the details of the announcement, let’s try to understand what it means for the parties involved.

What does it mean for Oracle?

  • Oracle enters a very busy Hadoop market associated with the best known company in the Hadoop ecosystem
  • With this partnership, Oracle didn’t have to make a huge investment in software development or services
  • Not having to build its own distribution of Hadoop, Oracle could focus on developing the Oracle Big Data Connectors
  • Oracle will delegate everything Hadoop to Cloudera thus it won’t have to deal with a very fast evolving open source project that might see some interesting events due to the
  • Oracle seems to have changed the message about Hadoop being used only for basic ETL.

What does it mean for Cloudera?

  • Cloudera gets access to a pool of customers (many of them possibly very large customers)
  • Cloudera will not need a big sales force to reach these potential customers. Even if Cloudera knew about them, Oracle’s sales force will do the job
  • If Oracle spells Cloudera’s name in every sales pitch, Cloudera will see a huge publicity bump that will sooner or later lead to more customers

Truth is I was expecting yet another distribution of Hadoop. And even if Oracle’s Big Data Appliance doesn’t feature the official Apache Hadoop distribution, I think that by choosing an existing distribution, Oracle did the right thing. For them and for their customers.

Original title and link: Cloudera Distribution of Hadoop Powers Oracle’s Big Data Appliance (NoSQL database©myNoSQL)


8 Most Interesting Companies for Hadoop’s Future

Filtering and augmenting a Q&A on Quora:

  1. Cloudera: Hadoop distribution, Cloudera Enterprise, Services, Training
  2. Hortonworks: Apache Hadoop major contributions, Services, Training
  3. MapR: Hadoop distribution, Services, Training
  4. HPCC Systems: massive parallel-processing computing platform
  5. HStreaming: real-time data processing and analytics capabilities on top of Hadoop
  6. DataStax: DataStax Enterprise, Apache Cassandra based platform accepting real-time input from online applications, while offering analytic operations, powered by Hadoop
  7. Zettaset: Enterprise Data Analytics Suite built on Hadoop
  8. Hadapt: analytic platform based on Apache Hadoop and relational DBMS technology

I’ve left aside names like IBM, EMC, Informatica, which are doing a lot of integration work.

Original title and link: 8 Most Interesting Companies for Hadoop’s Future (NoSQL database©myNoSQL)


Hadoop Market Competition: comScore From Cloudera to MapR

Mike Brown (comScore CTO):

We could capitalize the purchase [of MapR] with an annual maintenance charge versus a yearly cost per node. NFS allowed our enterprise systems to easily access the data in the cluster.

Some interesting bits:

  • comScore runs a 1000+ self-hosted Hadoop cluster
  • comScore migrated from Cloudera to MapR in 2 days
    • the migration was accomplished by copying and reloading data
    • depending on the size of the stored data, a better approach would be a rolling migration
  • comScore uses MapR’s Direct Access NFS feature, which exposes Hadoop Distributed File System (HDFS) data as NFS files that can then be easily mounted, modified, or overwritten
  • comScore will continue to use Cloudera for training purposes
    • Question: what is the advantage of paying two providers and maintaining two different clusters?
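The practical appeal of exposing cluster data over NFS is that ordinary file APIs work against it, with no Hadoop client library involved. The sketch below illustrates the idea with plain Python file I/O. The mount path in the comment is hypothetical, the helper name is illustrative, and a temporary directory stands in for the mounted export so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def count_lines(directory):
    """Count lines across all regular files directly under `directory`."""
    total = 0
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open("r", encoding="utf-8", errors="replace") as handle:
                total += sum(1 for _ in handle)
    return total


# Hypothetical usage against a real mount: count_lines("/mnt/cluster/logs").
# Here a temporary directory stands in for the NFS-mounted cluster data.
with tempfile.TemporaryDirectory() as stand_in_mount:
    Path(stand_in_mount, "part-00000").write_text("a\nb\n", encoding="utf-8")
    Path(stand_in_mount, "part-00001").write_text("c\n", encoding="utf-8")
    demo_total = count_lines(stand_in_mount)

print(demo_total)
```

Any existing enterprise tool that reads from a file system path could be pointed at the mount in the same way, which is presumably what made the feature attractive to comScore.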

As previewed by the Cloudera-Hortonworks exchanges, the competition in the Hadoop market is becoming fierce. But at least this story involves companies that are actively innovating and improving Hadoop, not those that just want to monetize it.

Original title and link: Hadoop Market Competition: comScore From Cloudera to MapR (NoSQL database©myNoSQL)

via: http://searchdatamanagement.techtarget.com/news/2240112247/ComScore-moves-big-data-analytics-environment-from-Cloudera-to-MapR


Cloudera Enterprise: Cloudera Manager and Cloudera support

Cloudera Enterprise is what Cloudera sells in addition to their Cloudera Hadoop Distribution (CDH):

  • Cloudera Manager and Cloudera support
  • Cloudera Manager: end-to-end management application for Apache Hadoop
    • Deploy: automated installation
    • Discover: service health and monitoring, including events and alerts
    • Diagnose
      • Job analytics
      • Log search
      • Configuration recommendations
    • Act
      • Service and configuration management
      • Security management
    • Optimize
      • Resource and quota management
  • Free and Enterprise editions
  • Free edition: up to 50 nodes
  • Enterprise edition: no available pricing
  • Feature comparison


Hadoop Market: Hortonworks’ Positioning

Eric Baldeschwieler in a recent briefing (transcript by Bert Latamore over at Wikibon):

We’re really committed to building out Apache Hadoop and doing it in the Open Source community, so what really differentiates us is being really committed, besides shipping 100% pure Apache Hadoop code, which nobody else does, to taking a very partnering ecosystem-centric approach.[…] We’re the only ones committed to shipping Apache Hadoop code. We’ve been the drivers behind every major release of Apache Hadoop since its inception. Other companies are packaging and distributing Hadoop, but when they do that they add lots of their own custom stuff, both as patches to the Apache Hadoop distribution and also as independent products. A lot of that work is going into Apache, and since we committed to the Open Source model we’ve seen a lot more third party code going into Apache, which is obviously a win for the community. But to date no other company is actually taking releases from Apache & supporting them. They create their own versions that are slightly different from what comes from Apache, and try to build a business around that.

The political message from both Cloudera and Hortonworks is “we compete as businesses, but collaborate for the good of Hadoop”. But behind the scenes, both are preparing the big guns.

Original title and link: Hadoop Market: Hortonworks’ Positioning (NoSQL database©myNoSQL)


Why Is Cloudera Packing Mahout With Hadoop?

Machine learning is an entire field devoted to Information Retrieval, Statistics, Linear Algebra, Analysis of Algorithms, and many other subjects. This field allows us to examine things such as recommendation engines involving new friends, love interests, and new products. We can do incredibly advanced analysis around genetic sequencing and examination, distributed search and frequency pattern matching, as well as mathematical analysis with vectors, matrices, and singular value decomposition (SVD).

All these fields have deep connections in the big data space.

Original title and link: Why Is Cloudera Packing Mahout With Hadoop? (NoSQL database©myNoSQL)

via: http://www.cloudera.com/blog/2011/11/cdh3u2-apache-mahout-integration/


Hortonworks Data Platform: Hortonworks’ Hadoop Distribution

Announcement came out today[1]:

Hortonworks Data Platform, powered by Apache Hadoop — As we began to interact with enterprises and ecosystem partners, the one constant was the need for a base distribution of Apache Hadoop that is 100% open source and that contains the essential components used with every Hadoop installation.  A distribution was needed to provide an easy to install, tightly integrated and well tested set of servers and tools. As we interacted with potential partners, we also heard the message loud and clear that they wanted open and secure APIs to easily integrate and extend Hadoop. We believe we have succeeded on both fronts. The Hortonworks Data Platform is such an open source distribution.  It is powered by Apache Hadoop and includes the essential Hadoop components, plus some that make it more manageable, open and extensible. Our distribution is based on Hadoop 0.20.205, the first Apache Hadoop release that supports security and HBase.  It also includes some new APIs, such as WebHDFS and those in Ambari and HCatalog, which will make it easy for our partners to integrate their products with Apache Hadoop. For those new to Ambari, it is an open source Apache project that will bring improved installation and management to Hadoop. HCatalog is a metadata management service for simplifying the sharing of data between Hadoop and other data systems. We are releasing Hortonworks Data Platform initially as a limited technology preview with plans to open it up to the public in early 2012.

The fight is on, even if for now the tone is still polite. And if we add MapR and LexisNexis’ HPCC to the mix, not to mention the armies of marketers and salespeople coming from Oracle, IBM, EMC, NetApp, etc., this actually smells like war.

Edward Ribeiro aptly commented: “This reminds me of Linux distros war circa 2001”.


  1. The emphasis in the text is mine to underline the most important aspects of the announcement.  

Original title and link: Hortonworks Data Platform: Hortonworks’ Hadoop Distribution (NoSQL database©myNoSQL)


Hadoop, Hortonworks, Cloudera: A Page of History

At a time when everyone is reading, writing, or talking about Steve Jobs’s biography, Wired has published a long article looking at the history of Hadoop (Yahoo-era), the Hortonworks spin-off, and Cloudera. While the article doesn’t cover the late rush into the Hadoop world by giants like Oracle, IBM, EMC, and others, all of which want a piece, it gives an interesting overview of the Hadoop ecosystem dynamics:

The initial result is an amusingly heated rivalry between Cloudera and Hortonworks — the kind of rivalry you only see in the open source world. […] But ultimately, this Hadoop civil war shows just how vibrant the platform is.

“Additional investment in the platform and more people concentrating on the open source distro is good for community and good for Cloudera,” Olson says. It’s the sort of thing you always hear from a competitor when a new company enters a market. But in this case, there’s a truth to it. Bearden and Baldeschwieler’s efforts to expand the open source project can only help Cloudera — and the rest of the market.

Original title and link: Hadoop, Hortonworks, Cloudera: A Page of History (NoSQL database©myNoSQL)

via: http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/all/1


Datameer Is the First BI/Analytics Platform Built Natively on Hadoop

Brian Smith (Datameer Regional Director of Sales):

DAS is an open book at every stage of the data pipeline, with plug and play support at each phase – integration, analysis and visualization. Under the covers, DAS generates Java/MapReduce code that runs natively on the Hadoop cluster. All current Hadoop distros are supported – we’re Switzerland when it comes to platform support for Apache, Cloudera, MapR, IBM and the rest, we run all of it in a browser on Windows, Mac and Linux.

As always I won’t comment on statements referring to “first” or “best”. But I find Brian Smith’s assessment of the Hadoop economics very accurate:

The economics are compelling: Hadoop is moving out costly analytic databases and warehouses, driving IT to re-look at ADBMS sales cycles, shifting IT dollars and vendor roadmaps, and generally wreaking havoc in the traditional vendor community. We’ve gone from one or two distributions to nine in the last year! And, literally every vendor in the BI/DBMS space has a Hadoop connector, the latest being the recent Oracle announcement. Everybody is on board this train. All this based upon the premise of unlimited scale and data variety at a fraction of traditional costs. Technical challenges exist, but it’s clear that there’s a sea change.

Original title and link: Datameer Is the First BI/Analytics Platform Built Natively on Hadoop (NoSQL database©myNoSQL)

via: http://datameer.com/blog/uncategorized/why-i-am-at-datameer.html


Mine Is Bigger Than Yours: Hadoop Code Contributions

Who’s bigger? Hortonworks’ The Yahoo! Effect or Cloudera’s The Community Effect?

This is ugly and should never happen to an open source project.

Still, Joe Brockmeier (RWW) describes this as a superb win-win situation:

It might seem unhealthy for companies to be clamoring for credit in open source projects, but it’s a sign of health for projects. If companies position themselves to be top contributors, and care about their standing, the projects win. Users win too. Developers in the ecosystem also win – since it’s far easier to hire existing contributors than trying to push outsiders in to a project.

But there’s just a minor thing missing. Who gets the cheese?

Original title and link: Mine Is Bigger Than Yours: Hadoop Code Contributions (NoSQL database©myNoSQL)


R and Hadoop: Revolution Analytics and Cloudera Partnership Announced

In the series of big announcements coming out this month, Cloudera and Revolution Analytics, the enterprise provider of R software, have announced a partnership to integrate Cloudera’s Hadoop distribution with the Revolution R Enterprise platform. The integration offers R developers direct access to Hadoop data stores and the ability to write MapReduce jobs directly in R.

The integration packages, named RevoConnectR for Apache Hadoop, are already freely available on GitHub, and they will also get commercial support with Revolution R Enterprise 5.0 Server for Linux.


Original title and link: R and Hadoop: Revolution Analytics and Cloudera Partnership Announced (NoSQL database©myNoSQL)