Hortonworks: All content tagged as Hortonworks in NoSQL databases and polyglot persistence
Hortonworks, eBay and Scaled Risk have been collaborating on improving the mean time to recovery in HBase. After extensive testing performed at eBay, some results are now available for 2 scenarios:
- Node/RegionServer failures while writing
- Node/RegionServer failures while reading
Original title and link: Results of collaboration on improving the Mean Time to Recovery in HBase ( ©myNoSQL)
In 4 years of writing this blog I haven’t seen such a prolific month:
- Apache Hadoop 2.2.0 (more links here)
- Apache HBase 0.96 (here and here)
- Apache Hive 0.12 (more links here)
- Apache Ambari 1.4.1
- Apache Pig 0.12
- Apache Oozie 4.0.0
- Plus Presto.
Actually I don’t think I’ve ever seen an ecosystem like the one created around Hadoop.
Original title and link: A prolific season for Hadoop and its ecosystem ( ©myNoSQL)
I’m catching up with the news these days and this rumor about Hortonworks from Curt Monash’s post sounds pretty big:
There’s a widespread belief that Hortonworks is being shopped. Numerous folks — including me — believe the rumor of an Intel offer for $700 million. Higher figures and alternate buyers aren’t as widely believed.
First of all, I don’t know anything about this—and just to be clear that means I really don’t know anything. But if it turns out to be true:
- it’s huge news for the Hadoop market
- it’s big news for the open source world, as I think it would represent the 2nd largest acquisition of a pure open source company after MySQL, achieved in a fifth of the time
- this could make things simpler or much more complicated for Cloudera, depending on how the acquirer decides to operate the business
- this could be good news or pretty bad news for the Hadoop community and ecosystem considering the contributions Hortonworks made over time
If someone decides to drop me an “anonymous” email I promise I won’t hear anything.
Original title and link: Rumors about a Hortonworks Acquisition ( ©myNoSQL)
Eric Baldeschwieler’s keynote from Hadoop Summit has been published on YouTube. It’s mainly about the goals and effort behind Hadoop 2.0 and the new tools in the Hadoop ecosystem meant to simplify different aspects of a Hadoop deployment (HCatalog, Ambari, Tez, Stinger Initiative).
✚ Datanami has published a summary of the keynote here
Original title and link: Hadoop Now, Next and Beyond - Keynote by Eric Baldeschwieler ( ©myNoSQL)
Where is MapR today?
- MapR raised a total of $59mil.
- According to John Schroeder (CEO) “92% of MapR customers pay primarily for licenses and not for ancillary services and support”.
- According to Wikibon, MapR had $23mil. revenue in 2012, 49% of which came from services (nb: this seems to contradict the above point)
- Support for MapR installations is offered by Accenture and Booz Allen Hamilton
How will MapR use the new capital?
With the new funding, the company plans to invest in research & development, and expand into Asia.
How does MapR see its competitors?
John Schroeder (CEO):
“Our competitors’ model is very cash intensive and you have to wonder whether or not they’ll ever be cash-flow positive”.
Cloudera has raised $141mil so far:
- Series A: $5mil
- Series B: $6mil
- Series C: $25mil
- Series D: $40mil
- Series E: $65mil
According to this, Cloudera raised $36mil in the first 3 rounds. I couldn’t find any official data about the capital raised by Hortonworks, but the number I’ve seen in a couple of places is $50mil. So far MapR raised $59mil.
Sources for these bits:
- VentureBeat: MapR gets $30M to push Hadoop deeper into the enterprise
- AllThingsD: MapR Lands $30 Million Series C Led by Mayfield Fund - Arik Hesseldahl - Enterprise - AllThingsD
- CrunchBase: Cloudera | CrunchBase Profile
- Wikibon: Big Data Vendor Revenue And Market Forecast 2012-2017 - Wikibon
Original title and link: MapR Raises $30mil in Series C ( ©myNoSQL)
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
[1] I left aside Hadoop-as-a-Service for now.
Original title and link: How Many Hadoops? ( ©myNoSQL)