Pivotal: All content tagged as Pivotal in NoSQL databases and polyglot persistence

The Forrester Wave for Hadoop market

Update: I’d like to thank the people who pointed out in the comment thread that I’ve messed up quite a few aspects in my comments about the report. I don’t believe in taking down posts that have been out for a while, so please be warned that this article can basically be ignored.

Thank you, and my apologies for those comments that were a misinterpretation of the report.


This is the Q1 2014 Forrester Wave for Hadoop:

[Figure: Forrester Wave for Hadoop, Q1 2014]

A couple of thoughts:

  1. Cloudera, Hortonworks, MapR are positioned very (very) close.

    1. Hortonworks is positioned closer to the top right, meaning they report more customers/a larger install base.
    2. MapR is higher on the vertical axis meaning that MapR’s strategy is slightly better.

      For me, MapR’s strategy can be briefly summarized as:

      1. address some of the limitations in the Hadoop ecosystem
      2. provide API-compatible products for major components of the Hadoop ecosystem
      3. use the (trademarked) Apache product names to advertise their products

      I think the 1st point above explains the better positioning of MapR’s current offering.

    3. Even though Cloudera was the first pure-play Hadoop distribution, it’s positioned behind both Hortonworks and MapR.

  2. IBM has the largest market presence. That’s a big surprise, as I very rarely hear clear messages from IBM.

  3. IBM and Pivotal Software are considered to have the strongest strategy. That’s another interesting point in Forrester’s report. Apart from the fact that IBM has a ton of data products and that Pivotal Software offers more than Hadoop, I don’t know what exactly explains this position.

    The Strategy positioning in the Forrester report is based on quantifying the following categories: Licensing and pricing, Ability to execute, Product road map, Customer support. IBM and Pivotal are ranked first in all these categories (with maximum marks for the last 3). As a comparison, Hortonworks has 3/5 for Ability to execute (this must be related only to budget); Cloudera has 3/5 for both Ability to execute and Customer support.

    Pivotal is the 3rd last in terms of current offering. I guess my hypothesis for ranking Pivotal as 1st in terms of strategy is wrong.

  4. Microsoft, which through its collaboration with Hortonworks came up with HDInsight (basically enabling Hadoop for Excel and its data warehouse offering), is positioned 2nd to last on all 3 axes.

    No one seems to love Microsoft anymore.

  5. While not a pure Hadoop player, DataStax has been offering the DataStax Enterprise platform, which includes support for analytics through Hadoop and search through Solr, for at least 2 years. That’s actually way before anyone else from the group of companies in Forrester’s report had anything similar [1].

    This report focuses only on “general-purpose Hadoop solutions based on a differentiated, commercial Hadoop distribution”.

You can download the report after registering on Hortonworks’ site.


  1. DataStax is my employer. But what I wrote is a pure fact. 

Original title and link: The Forrester Wave for Hadoop market (NoSQL database©myNoSQL)


When should I use Greenplum Database versus HAWQ?

Jon Roberts about the use cases for Greenplum and HAWQ, both technologies offered by Pivotal:

Greenplum is a robust MPP database that works very well for Data Marts and Enterprise Data Warehouses that tackles historical Business Intelligence reporting as well as predictive analytical use cases. HAWQ provides the most robust SQL interface for Hadoop and can tackle data exploration and transformation in HDFS.

The first questions that popped into my mind:

  1. why isn’t HAWQ good for reporting?
  2. why isn’t HAWQ good for predictive analytics?

I don’t have a good answer for either of these. For the first, I assume that the implied answer is Hadoop’s latency. On the other hand, what I know is that Microsoft and Hortonworks are trying to bring Hadoop data into Excel with HDInsight. This is not traditional reporting, but if that’s acceptable from a latency point of view, I’m not sure why it wouldn’t work for reporting too.

For the second question, Hadoop and the tools built around it are well known for predictive analytics. So maybe this separation is due only to HAWQ. Another explanation could be product positioning.

This last part seems to be confirmed by the rest of the post, which makes the point that data stored in HDFS is temporary and that, once it is processed with HAWQ, it is moved into Greenplum.

[Figure: Greenplum and HAWQ]

In other words, HAWQ is just for ETL/ELT on Hadoop.

✚ I’m pretty sure that many traditional data warehouse companies that are forced to come up with coherent proposals for architectures based on their core products and Hadoop are facing the same product positioning problem: it’s difficult to admit in front of customers that Hadoop might be capable of replacing core functionality of the products you are selling.

What is the best answer to this positioning dilemma?

  1. Find a spot for Hadoop that is not hurting your core products. Let’s say ETL.
  2. Propose an architecture where your core products and Hadoop fully complement and interact with each other.

You already know my answer.

Original title and link: When should I use Greenplum Database versus HAWQ? (NoSQL database©myNoSQL)

via: http://www.pivotalguru.com/?p=642


Aster Data, HAWQ, GPDB and the First Hadoop Squeeze

Rob Klopp:

But there are three products, the Greenplum database (GPDB), HAWQ, and Aster Data, that will be squeezed more quickly as they are positioned either in between the EDW and Hadoop… or directly over Hadoop. In this post I’ll explain what I suspect Pivotal and Teradata are trying to do… why I believe their strategy will not work for long… and why readers of this blog should be careful moving forward.

This is a very interesting analysis of the enterprise data warehouse market. There’s also a nice visualization of this prediction:

[Figure: the first squeeze]

Here’s an alternative though. As shown in the picture above, the expansion of in-memory databases depends heavily on the evolution of the price of memory. It’s hard to argue against price predictions or Moore’s law. But accidents, even if rare, are still possible. Any significant change in the trend of memory costs, or in other hardware market conditions (e.g. an unpredicted decrease in the price of SSDs), could give Teradata and Pivotal the extra time and conditions to break into advanced hybrid storage solutions that would be slightly slower but also less expensive than their competitors’ in-memory databases.

Original title and link: Aster Data, HAWQ, GPDB and the First Hadoop Squeeze (NoSQL database©myNoSQL)

via: http://robklopp.wordpress.com/2013/12/11/aster-data-hawq-gpdb-and-the-first-hadoop-squeeze/


Pivotal People

I really like the people page on the recently announced Pivotal website, and in particular a couple of the individual pictures:

[Figure: Pivotal People]

✚ I did some digging but came up empty about the relationship between Pivotal and Sir Tim Berners-Lee and Professor Joseph M. Hellerstein (CEO of Trifacta). Update: according to this post, the page lists people who inspire the Pivotal team.

Original title and link: Pivotal People (NoSQL database©myNoSQL)