


HortonWorks: All content tagged as HortonWorks in NoSQL databases and polyglot persistence

Hortonworks wishing Eric Baldeschwieler well

Three sentences in a five-paragraph post from Hortonworks’s CEO about Eric Baldeschwieler’s departure:

I’d like to start off first by thanking Eric for his contributions to the Hadoop community since its inception over 7 years ago, and I’d like to express my personal appreciation for his help in getting Hortonworks off the ground.

This smells like a not so friendly breakup.

✚ The first to notice this change was Derrick Harris on GigaOm.

✚ The new Hortonworks CTO is Ari Zilka (previously founder and CTO of IMDG Terracotta, but at the time already working at Hortonworks)

Original title and link: Hortonworks wishing Eric Baldeschwieler well (NoSQL database©myNoSQL)


Instead of an acquisition, Hortonworks announces $50 million in new financing

We are delighted to announce a new round of funding led by new investors Tenaya Capital and Dragoneer Investment Group, with participation from our existing investors Benchmark Capital, Index Ventures and Yahoo!.

I guess the rumors about a possible acquisition of Hortonworks aren’t true.

✚ If you are interested in the funding history of the three major Hadoop players, here’s a summary:

Cloudera: $141M

  • 2009: $5M
  • 2009: $6M
  • 2010: $25M
  • 2011: $40M
  • 2012: $65M

Hortonworks: $70M

  • 2011: $20M
  • 2013: $50M

MapR: $52M

  • 2011: $20M
  • 2013: $32M

Note: the figures I had in the earlier post, “MapR Raises $30mil in Series C”, differ a bit from the data I’ve collected today.
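
✚ A quick sanity check of the totals in this summary, as a minimal Python sketch. The variable names are my own; the amounts are the ones listed in this post, in millions of USD, and the script assumes nothing beyond them.

```python
# Funding rounds per company, in millions of USD, as listed in this post.
rounds = {
    "Cloudera": [5, 6, 25, 40, 65],
    "Hortonworks": [20, 50],
    "MapR": [20, 32],
}

# Sum the rounds of each company to get its total raised.
totals = {company: sum(amounts) for company, amounts in rounds.items()}

for company, total in totals.items():
    print(f"{company}: ${total}M")
```

The sums match the headline figures given next to each company name.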



Rumors about a Hortonworks Acquisition

I’m catching up with the news these days and this rumor about Hortonworks from Curt Monash’s post sounds pretty big:

There’s a widespread belief that Hortonworks is being shopped. Numerous folks — including me — believe the rumor of an Intel offer for $700 million. Higher figures and alternate buyers aren’t as widely believed.

First of all, I don’t know anything about this—and just to be clear that means I really don’t know anything. But if it turns out to be true:

  1. it’s huge news for the Hadoop market
  2. it’s big news for the open source world, as I think it would be the 2nd largest acquisition of a pure open source company after MySQL, achieved in a fifth of the time
  3. this could make things simpler or much more complicated for Cloudera, depending on how the acquirer decides to operate the business
  4. this could be good news or pretty bad news for the Hadoop community and ecosystem considering the contributions Hortonworks made over time

If someone decides to drop me an “anonymous” email I promise I won’t hear anything.


Project Savanna: Hadoop and OpenStack

Timothy Prickett Morgan for The Register about Project Savanna, a collaboration between Mirantis, Hortonworks, and Red Hat:

Batman and Robin. Peanut butter and chocolate. OpenStack and Hadoop. These are things that go together, with the latter pairing being something that commercial OpenStack distie Mirantis, commercial Hadoop distie Hortonworks, and commercial KVM and Linux distie (and soon to be OpenStack commercializer) Red Hat are putting together under a new OpenStack effort dubbed Project Savanna.

Hadoop is at the age where everyone tries to package it and claims they’ll be the Red Hat of the Hadoop ecosystem. I cannot really dot the i’s and cross the t’s, but my gut feeling is that right now all these efforts are more similar to the attempts to bring Linux to the desktop.

We know how successful these have been so far.



Project Falcon: Tackling Hadoop Data Lifecycle Management

Venkatesh Seetharam announcing a new Apache incubating project in the Hadoop ecosystem open sourced by InMobi and Hortonworks:

Today we are excited to see another example of the power of community at work as we highlight the newly approved Apache Software Foundation incubator project named Falcon. This incubation project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop as it simplifies data management through the introduction of a data lifecycle management framework.

I think this diagram describes Project Falcon best:

[Diagram: Project Falcon at a Glance]

✚ Was there any other project addressing this space?



Hadoop Now, Next and Beyond - Keynote by Eric Baldeschwieler

Eric Baldeschwieler’s keynote from Hadoop Summit has been published on YouTube. It’s mainly about the goals and effort behind Hadoop 2.0 and the new tools in the Hadoop ecosystem meant to simplify different aspects of a Hadoop deployment (HCatalog, Ambari, Tez, Stinger Initiative).

✚ Datanami has published a summary of the keynote.


Halo 4: A Success Case Study of HDInsight, Microsoft's Hadoop on Azure

Apart from a few too many business buzzwords, this is a nice story of using HDInsight, the Hadoop solution for Windows developed by Microsoft and Hortonworks:

Behind the scenes, a powerful new Microsoft technology platform called HDInsight was capturing data from the cloud and feeding daily game statistics to the tournament’s operator, Virgin Gaming. Virgin not only used the data to update online leaderboards each day; it also relied on the data to detect cheaters, removing them from the boards to ensure that the right gamers got the chance to win.

But this new technology didn’t just support the Infinity Challenge. From day one, the Xbox 360 game has been using the Hadoop open source framework to gain deep insights into players. The Halo 4 development team at 343 Industries is taking these insights and updating the game almost weekly, using direct player feedback to tweak the game. In the process, the game’s multiplayer ecosystem continues to evolve with the community as the title matures in the marketplace.



MapR Raises $30mil in Series C

Where is MapR today?

  1. MapR raised a total of $59mil.
  2. According to John Schroeder (CEO), “92% of MapR customers pay primarily for licenses and not for ancillary services and support”.
  3. According to Wikibon, MapR had $23mil in revenue in 2012, 49% of which came from services (nb: this seems to contradict the point above).
  4. Support for MapR installations is offered by Accenture and Booz Allen Hamilton

How will MapR use the new capital?

With the new funding, the company plans to invest in research & development, and expand into Asia.

How does MapR see its competitors?

John Schroeder (CEO):

“Our competitors’ model is very cash intensive and you have to wonder whether or not they’ll ever be cash-flow positive”.

Cloudera has raised $141mil so far:

  1. Series A: $5mil
  2. Series B: $6mil
  3. Series C: $25mil
  4. Series D: $40mil
  5. Series E: $65mil

According to this, Cloudera raised $36mil in its first 3 rounds. I couldn’t find any official data about the capital raised by Hortonworks, but the number I’ve seen in a couple of places is $50mil. So far MapR has raised $59mil.



How Many Hadoops?

The short answer is there is only one Apache Hadoop distribution.

The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.

The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)

The 100% open source: Hortonworks Data Platform.

The proprietary: MapR.

The blue one: IBM InfoSphere BigInsights.

The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.

There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.

But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:

  1. Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
  2. NetApp’s Hadooplers
  3. EMC Greenplum DCA
  4. Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
  5. Data Direct Networks (DDN)

I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?

  [1] I left Hadoop-as-a-Service aside for now.


Hortonworks: The Fastest Path to Innovation: Community Driven Open Source

Shaun Connolly for the Hortonworks blog:

we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.

In support of our approach, this week we’ve announced the submission of two new incubation projects to the Apache Software foundation together with the launch of the “Stinger Initiative”, all aimed at enhancing the security and performance of Hadoop applications.

I’m forced, but extremely happy, to take back what I said.

  • Stinger: an initiative to speed up Apache Hive for interactive queries.
  • Knox Gateway: a solution for authentication and security in Hadoop.
  • Tez framework: a new Hadoop YARN-based runtime for improved latency and throughput.

Hortonworks believes in open source.



Hortonworks and Community Driven Hadoop

First, “We Believe… in community driven Enterprise Apache Hadoop” and then the next day “Announcing Apache Hadoop 2.0.3 Release and Roadmap”. These two posts, published within 2 days on Hortonworks’s blog, don’t entirely support each other. At least not without a slightly different formulation and a link to the announcement sent to the Hadoop mailing list.


VMware Sues Hortonworks

Stay calm. Hadoop is safe.

The Register:

VMware has taken Hortonworks to court along with four ex-VMers who now work at the startup - and among them is VMWare’s former global sales chief.
