Mapr: All content tagged as Mapr in NoSQL databases and polyglot persistence

Three questions about MapR and their products.

There are three things that I’d really appreciate some help understanding:

  1. MapR says it is an Apache Hadoop distribution. Does any of the MapR products include the

    While I know there’s no definition of such a thing, as far as I know self-claimed API compatibility is by no means the same thing as Apache Hadoop.

    I’m also not aware of any action from ASF on this matter.

  2. MapR says it’s the most complete distribution of Hadoop. The matrix below, from Kirill Grigorchuk’s summary of Altoros’s Hadoop Distributions: Cloudera vs. Hortonworks vs. MapR paper, doesn’t seem to confirm this.

    Hadoop distros compared: Cloudera vs Hortonworks vs MapR

  3. MapR says it is committed to open source. I’ve checked the list of committers for Apache Hadoop, Apache HBase, Apache Pig, and Apache ZooKeeper and, except for Ted Dunning’s PMC role in Apache ZooKeeper, I couldn’t find any MapR employee listed.

Original title and link: Three questions about MapR and their products. (NoSQL database©myNoSQL)

The Forrester Wave for Hadoop market

Update: I’d like to thank the people that pointed out in the comment thread that I’ve messed up quite a few aspects in my comments about the report. I don’t believe in taking down posts that have been out for a while, so please be warned that basically this article can be ignored.

Thank you and my apologies for those comments that were a misinterpretation of the report.

This is the Q1 2014 Forrester Wave for Hadoop:

Forrester wave for Hadoop

A couple of thoughts:

  1. Cloudera, Hortonworks, MapR are positioned very (very) close.

    1. Hortonworks is positioned closer to the top right, meaning they report more customers and a larger install base
    2. MapR is higher on the vertical axis meaning that MapR’s strategy is slightly better.

      For me, MapR’s strategy can be briefly summarized as:

      1. address some of the limitations in the Hadoop ecosystem
      2. provide API-compatible products for major components of the Hadoop ecosystem
      3. use the (trademarked) names of these Apache products to advertise their own products

      I think the 1st point above explains the better positioning of MapR’s current offering.

    3. Even though Cloudera was the first pure-play Hadoop distribution, it’s positioned behind both Hortonworks and MapR.

  2. IBM has the largest market presence. That’s a big surprise as I very rarely hear clear messages from IBM.

  3. IBM and Pivotal Software are considered to have the strongest strategy. That’s another interesting point in Forrester’s report. Except for the fact that IBM has a ton of data products and that Pivotal Software is offering more than Hadoop, I don’t know what exactly explains this position.

    The Strategy positioning in the Forrester report is based on quantifying the following categories: Licensing and pricing, Ability to execute, Product road map, Customer support. IBM and Pivotal are ranked first in all these categories (with maximum marks for the last 3). As a comparison, Hortonworks has 3/5 for Ability to execute (this must be related only to budget); Cloudera has 3/5 for both Ability to execute and Customer support.

    Pivotal is the 3rd to last in terms of current offering. I guess my hypothesis for ranking Pivotal as 1st in terms of strategy is wrong.

  4. Microsoft, which through its collaboration with Hortonworks came up with HDInsight (basically enabling Hadoop for Excel and its data warehouse offering), is positioned 2nd to last on all 3 axes.

    No one seems to love Microsoft anymore.

  5. While not a pure Hadoop player, DataStax has been offering the DataStax Enterprise platform, which includes support for analytics through Hadoop and search through Solr, for at least 2 years. That’s actually way before anyone else from the group of companies in Forrester’s report had anything similar¹.

    This report focuses only on “general-purpose Hadoop solutions based on a differentiated, commercial Hadoop distribution”.

You can download the report after registering on Hortonworks’s site: here.

  1. DataStax is my employer. But what I wrote is a pure fact. 

Original title and link: The Forrester Wave for Hadoop market (NoSQL database©myNoSQL)

MapR product strategy

Maria Deutscher (SiliconAngle) quoting MapR CMO Jack Norris:

The MapR strategy centers on what chief marketing officer Jack Norris described in an interview as a “proven business model of really focusing on a product, selling a product, making a product enterprise grade, utilizing the innovations of the community but providing some [additional] advantages so customers can be even more successful.”

I thought that part of a proven business model is innovating on the product, and less so utilizing the innovations of the community. Or at least finding some way to pay back for those community innovations.

Original title and link: MapR product strategy (NoSQL database©myNoSQL)


Hadoop Buyer's Guide

Alan Gardner reads a piece of marketing material about Hadoop choices:

…this guide is specifically designed to be incorporated into your RFP when it comes to evaluating Hadoop platforms. - Hadoop Buyer’s Guide, page 1

The Guide makes some bold promises right from page one. Not only will it literally write your RFP, but it will also explain “… why selecting a Hadoop platform is so vital”. Ostensibly the alternative, a Hadoop quantum superposition, is difficult and costly to maintain at room temperature.

I have always wondered who the target audience of these pseudo-technical marketing materials is. Moreover, I’ve always wondered if there’s a single person who has made a decision based on such a thing¹.

  1. I really cannot call this a (white)paper

Original title and link: Hadoop Buyer’s Guide (NoSQL database©myNoSQL)


NoSQL and Full Text Indexing: Two Trends

On one side:

  1. DataStax with Solr
  2. MapR with LucidWorks Search (nb: Solr)

and on the other side:

  1. Riak Search: Solr-like but custom proprietary implementation
  2. MongoDB text search: custom proprietary implementation

I’m not going to argue about the pros and cons of each of these approaches, but I’m sure you already know which of these approaches I’m in favor of.

Original title and link: NoSQL and Full Text Indexing: Two Trends (NoSQL database©myNoSQL)

Hadoop and Canonical Bring MapR to Ubuntu

Some announcements from MapR about “MapR and Canonical bringing Hadoop Support to Ubuntu”:

First, MapR is partnering with Canonical, the organization behind the Ubuntu operating system, to package and make available for download an integrated offering of MapR Distribution with Ubuntu. The free MapR M3 Edition includes HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume and other Hadoop support tools. MapR is the only distribution that enables Linux applications and commands to access data directly in the cluster via the NFS interface that is available with all MapR Editions.

As far as I know, Apache Hadoop works just fine on Ubuntu. And there was already a partnership between Cloudera and Canonical to bring Hadoop to Ubuntu. So, I guess my title might be more accurate.
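
To make the NFS claim in the announcement a bit more concrete: if cluster storage is exported over NFS and mounted on a client, ordinary file APIs should work against it without any Hadoop client library. What follows is only a minimal sketch under that assumption; the mount point `/mnt/cluster`, the file name, and the helper `count_lines` are all hypothetical and not taken from the MapR announcement.

```python
from pathlib import Path

# Hypothetical mount point; the real path depends on how the cluster
# export is mounted on the client machine.
CLUSTER_MOUNT = Path("/mnt/cluster")


def count_lines(path: Path) -> int:
    """Count lines in a file using ordinary file I/O, no Hadoop client."""
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    sample = CLUSTER_MOUNT / "logs" / "events.txt"  # hypothetical file
    if sample.exists():
        print(count_lines(sample))
```

The same idea is what would let tools like `grep` or `wc` run directly on cluster data: to the application it is just a path on a mounted filesystem.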

Original title and link: Hadoop and Canonical Bring MapR to Ubuntu (NoSQL database©myNoSQL)


MapR Raises $30mil in Series C

Where is MapR today?

  1. MapR raised a total of $59mil.
  2. According to John Schroeder (CEO) “92% of MapR customers pay primarily for licenses and not for ancillary services and support”.
  3. According to Wikibon, MapR had $23mil. revenue in 2012, 49% of it coming from services (nb: this seems to contradict the point above)
  4. Support for MapR installations is offered by Accenture and Booz Allen Hamilton

How will MapR use the new capital?

With the new funding, the company plans to invest in research & development, and expand into Asia.

How does MapR see its competitors?

John Schroeder (CEO):

“Our competitors’ model is very cash intensive and you have to wonder whether or not they’ll ever be cash-flow positive”.

So far Cloudera has raised $141mil:

  1. Series A: $5mil
  2. Series B: $6mil
  3. Series C: $25mil
  4. Series D: $40mil
  5. Series E: $65mil

According to this, Cloudera raised $36mil in the first 3 rounds. I couldn’t find any official data about the capital raised by Hortonworks, but the number I’ve seen in a couple of places is $50mil. So far MapR has raised $59mil.
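
For anyone who wants to double-check the sums, here is a quick sketch using only the figures quoted in this post (in millions of USD). The variable names are mine, and the Hortonworks number is left out because it is unofficial.

```python
# Funding figures as quoted in this post, in millions of USD.
cloudera_rounds = {"A": 5, "B": 6, "C": 25, "D": 40, "E": 65}
mapr_total = 59  # MapR total after its Series C

cloudera_total = sum(cloudera_rounds.values())
cloudera_first_three = sum(cloudera_rounds[r] for r in ("A", "B", "C"))

print(f"Cloudera total: ${cloudera_total}mil")
print(f"Cloudera through Series C: ${cloudera_first_three}mil")
print(f"MapR through Series C: ${mapr_total}mil")
```

So at the same funding stage (Series C), MapR had raised more than Cloudera had, even though Cloudera's overall total is far larger.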

Sources for these bits:

Original title and link: MapR Raises $30mil in Series C (NoSQL database©myNoSQL)

How Does MapR Compare to Cloudera?

Staying in MapR land, the question of comparing MapR to Cloudera is answered by people from all sides (MapR, Cloudera and Hortonworks). My summary: “cool proprietary technology addressing some of the current limitations of Hadoop, but also missing some of the features the Hadoop community has come up with”.

Original title and link: How Does MapR Compare to Cloudera? (NoSQL database©myNoSQL)


Hadoop: What Matters Are Open and Standardized Interfaces

Michael Hausenblas (MapR) about the topic of the day: “Hadoop distributions”, about which I’ve already linked to Steve Loughran’s If There Is a Problem in the Hadoop JARs, How Are You Going to Fix It?, Merv Adrian’s Open Source “Purity”, Hadoop, and Market Realities and Matthew Aslett’s What It Means to Be “all In” on Hadoop:

One aspect I’d like to highlight is the importance of ‘standard’ interfaces, defined through community consensus, and enforced by the Apaches and the likes. I think it makes perfect sense to offer a commercial implementation that is superior to the implementation you get ‘for free’, as long as you’re 100% compatible with the community-defined standard.

Here’s something I don’t understand about the above. The “Defining Hadoop wiki page” dedicates a complete paragraph to compatibility. The most important and relevant part of it is:

Other entities may claim that other products (including derivative works) are compatible with Apache Hadoop. The Apache Hadoop development team is not a standards body, and cannot confirm or deny such assertions. All that we can say is “there is no official certification that a product is compatible with Hadoop, other than when a release of the Apache source tree is declared a new release of Apache Hadoop itself”.

Going back to MapR’s post my question is: if the Apache Hadoop project doesn’t offer a certification toolkit and the project team doesn’t validate the compatibility, what exactly does it mean to be “100% compatible” with something that can change any time and is completely out of your control?

Original title and link: Hadoop: What Matters Are Open and Standardized Interfaces (NoSQL database©myNoSQL)


How Many Hadoops?

The short answer is there is only one Apache Hadoop distribution.

The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.

The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)

The 100% open source: Hortonworks Data Platform.

The proprietary: MapR.

The blue one: IBM InfoSphere BigInsights.

The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.

There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.

But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:

  1. Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
  2. NetApp’s Hadooplers
  3. EMC Greenplum DCA
  4. Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
  5. Data Direct Networks (DDN)

I hope I didn’t miss any important ones¹. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?

  1. I left aside for now Hadoop-as-a-Service.  

Original title and link: How Many Hadoops? (NoSQL database©myNoSQL)

Hadoop Business Ecosystem as of January 2013

As I was hoping and expecting, Datameer updated the chart visualizing Hadoop’s business side ecosystem:


It shouldn’t be a surprise to anyone that the most connected companies in the Hadoop space are Cloudera and Hortonworks. They outrank the IT industry mammoths: IBM, HP, Microsoft, Oracle, SAP, etc.

Original title and link: Hadoop Business Ecosystem as of January 2013 (NoSQL database©myNoSQL)


MapR’s New Partnership With Drawn to Scale

MapR is definitely up to some interesting partnerships. Last year it announced a partnership with EMC for Greenplum HD Enterprise Edition, then this year MapR became available on Amazon Elastic MapReduce and Google Compute Engine. And today MapR and Drawn to Scale, creator of Spire, the real-time database for Hadoop, are announcing a new partnership.

Bradford Stephens (CEO, Drawn to Scale):

MapR provides the fastest, most reliable Hadoop for our customers. We are thrilled to work with MapR to deliver M3 as part of Spire as the first real-time database for Hadoop.

Jack Norris (VP of marketing, MapR Technologies):

Real-time SQL on Hadoop is a big gap in the market that is addressed by Spire. Spire is a complementary solution to our products and it made sense to work with Drawn to Scale to make it easier for customers to deploy M3, pre-integrated with Spire, for real-time SQL-based workloads.

It might sound strange coming from me, but MapR is taking some rather big steps towards becoming the de facto standard for Hadoop. I’m looking forward to seeing the reactions from Cloudera and Hortonworks.

Original title and link: MapR’s New Partnership With Drawn to Scale (NoSQL database©myNoSQL)