
IBM: All content tagged as IBM in NoSQL databases and polyglot persistence

The Forrester Wave for Hadoop market

Update: I’d like to thank the people who pointed out in the comment thread that I’ve messed up quite a few aspects in my comments about the report. I don’t believe in taking down posts that have been out for a while, so please be warned that this article can basically be ignored.

Thank you, and my apologies for those comments that were a misinterpretation of the report.


This is the Q1 2014 Forrester Wave for Hadoop:

Forrester wave for Hadoop

A couple of thoughts:

  1. Cloudera, Hortonworks, MapR are positioned very (very) close.

    1. Hortonworks is positioned closer to the top right, meaning they report more customers and a larger install base.
    2. MapR is higher on the vertical axis meaning that MapR’s strategy is slightly better.

      For me, MapR’s strategy can be briefly summarized as:

      1. address some of the limitations in the Hadoop ecosystem
      2. provide API-compatible products for major components of the Hadoop ecosystem
      3. use the (trademarked) Apache product names to advertise their products

      I think the 1st point above explains the better positioning of MapR’s current offering.

    3. Even though Cloudera was the first pure-play Hadoop distribution, it’s positioned behind both Hortonworks and MapR.

  2. IBM has the largest market presence. That’s a big surprise, as I very rarely hear clear messages from IBM.

  3. IBM and Pivotal Software are considered to have the strongest strategy. That’s another interesting point in Forrester’s report. Apart from the fact that IBM has a ton of data products and that Pivotal Software is offering more than Hadoop, I don’t know what exactly explains this position.

    The Forrester report’s Strategy positioning is based on quantifying the following categories: Licensing and pricing, Ability to execute, Product road map, Customer support. IBM and Pivotal are ranked first in all these categories (with maximum marks for the last 3). As a comparison, Hortonworks has 3/5 for Ability to execute (this must be related only to budget); Cloudera has 3/5 for both Ability to execute and Customer support.

    Pivotal is the 3rd last in terms of current offering. I guess my hypothesis for ranking Pivotal as 1st in terms of strategy is wrong.

  4. Microsoft, which through its collaboration with Hortonworks came up with HDInsight (which basically enabled Hadoop for Excel and its data warehouse offering), is positioned 2nd last on all 3 axes.

    No one seems to love Microsoft anymore.

  5. While not a pure Hadoop player, DataStax has been offering the DataStax Enterprise platform, which includes support for analytics through Hadoop and search through Solr, for at least 2 years. That’s actually way before any of the other companies in Forrester’s report had anything similar [1].

    This report focuses only on “general-purpose Hadoop solutions based on a differentiated, commercial Hadoop distribution”.

You can download the report after registering on Hortonworks’ site: here.


  1. DataStax is my employer. But what I wrote is a pure fact. 

Original title and link: The Forrester Wave for Hadoop market (NoSQL database©myNoSQL)


From IBM to… IBM: The short, but complicated history of CouchDB, Cloudant, and a lot of other companies and projects

Damien Katz created CouchDB after working at IBM on Lotus Notes: CouchDB and Me. CouchDB went the Apache way. Then things got complicated…

On the West coast, Damien Katz and a team of committers created Couchio, later renamed to CouchOne, later merged with Membase to become Couchbase, which finally dropped CouchDB. Damien Katz left Couchbase.

A confusing history with a very complicated genealogy of projects (don’t worry, this goes on) and companies. And this was only the West Coast.

On the East Coast, Cloudant took CouchDB and made it BigCouch. I thought that Cloudant would be the CouchDB company, and in a way it was. Cloudant put BigCouch on the cloud as a service and on GitHub as open source. BigCouch is supposed to get merged back into Apache CouchDB, but many months later this hasn’t materialized yet.

To complete the circle, today IBM announced signing an agreement to acquire Cloudant (news coverage on GigaOm, BostInno, TechCrunch). This probably makes sense considering Cloudant’s relationship with SoftLayer and IBM’s $1 billion Platform-as-a-Service Investment, but less so if you consider the collaboration between IBM and MongoDB (formerly 10gen).

Anyways, the future of Apache CouchDB is bright. Yep.



IBM and 10gen are collaborating on a standard that would make it easier to write applications that can access data from both MongoDB and relational systems such as IBM DB2

The details are pretty confusing [1]:

[…] the new standard — which encompasses the MongoDB API, data representation (BSON), query language and wire protocol — appears to be all about establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM’s popular DB2 database and its WebSphere eXtreme Scale data grid.

But the juicy part is in the comments; if you can ignore the pitches.


  1. If this is a new standard and it is all based on the already existing MongoDB API, BSON, and wire protocol, then 1) what’s new about it and 2) what exactly will make it a standard?


via: http://gigaom.com/2013/06/04/ibm-throws-its-weight-behind-mongodb-for-mobile-apps/


IBM Accelerates Its Big Data Portfolio

Jeff Kelly takes a look at IBM’s data solutions portfolio:

IBM has the broadest and deepest Big Data product and services portfolio in the industry, as well as the market leading revenue to show for it. But IBM’s greatest asset also lies at the heart of its biggest challenge. With such a diverse set of Big Data capabilities, IBM has struggled to unify them into distinct, compelling offerings. How IBM responds to the challenge of bringing together such a broad and deep set of technologies and services - many the result of $16 billion worth of analytics-related acquisitions since 2005 - into consumable and effective product offerings will largely determine the company’s success (or failure) in the Big Data space and will have major implications for enterprise CIOs.

There are two things that I’m not sure I understand:

  1. Is having a confusing portfolio of products a known strategy that leads to more sales?

    Basically, you offer so many products that the customer gets so confused that they have to hire your consultants to make the buying decision.

  2. When ranking companies by sales, wouldn’t it make more sense to compare revenue/employee than raw numbers?

    Which company is better? A company with 2 sales people generating $1mil in revenue or a company with 100 sales people and 100 consultants generating $20mil?
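To make the comparison above concrete, here is the arithmetic as a tiny Python sketch. The function name `revenue_per_employee` is just an illustration, and the headcounts count only the sales people and consultants mentioned in the question; real companies obviously have more staff than that.

```python
def revenue_per_employee(revenue_usd, headcount):
    """Return revenue divided by the number of people generating it."""
    return revenue_usd / headcount

# Company A: 2 sales people generating $1mil.
small = revenue_per_employee(1_000_000, 2)

# Company B: 100 sales people plus 100 consultants generating $20mil.
large = revenue_per_employee(20_000_000, 100 + 100)

print(f"small company: ${small:,.0f} per employee")
print(f"large company: ${large:,.0f} per employee")
```

By raw revenue the larger company wins 20 to 1; by revenue/employee the smaller one comes out ahead ($500,000 versus $100,000 per person).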


via: http://wikibon.org/wiki/v/IBM_Accelerates_Its_Big_Data_Portfolio


Paper: M3R - Increased Performance for In-Memory Hadoop Jobs

For the weekend reads, a paper authored by a research team from IBM:

Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged - including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets - while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.


How Many Hadoops?

The short answer is there is only one Apache Hadoop distribution.

The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.

The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).

The 100% open source: Hortonworks Data Platform.

The proprietary: MapR.

The blue one: IBM InfoSphere BigInsights.

The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.

There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.

But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:

  1. Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
  2. NetApp’s Hadooplers
  3. EMC Greenplum DCA
  4. Teradata Aster Discovery Platform featuring Hortonworks’s Hadoop Data Platform
  5. Data Direct Networks (DDN)

I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?


  1. I’ve left Hadoop-as-a-Service aside for now.



The Three Pillars of Data-Based Computing: SQL, Hadoop And

IBM’s Arvind Krishna in an interview for The Register:

Krishna said he sees the potential for three pillars of data-based computing: SQL – to give a language and syntax for programming; Hadoop – to provide a MapReduce semantic; and a third pillar which is yet to be decided upon. That could be a MongoDB or HBase, but the market will pick a winner. “There’s a whole set: one will survive,” Krishna said.

I’m pretty sure that last part (i.e. “that could be a MongoDB or HBase”) is a misquote, as the rest of what Krishna is saying makes a lot of sense:

“Wherever open source is mature I will leverage it; I won’t compete with it. To believe one can be monolithic, proprietary and closed and … succeed is a foolish proposition. One has to embrace open source and work with an ecosystem. Clients are looking to you to add value.”


via: http://www.theregister.co.uk/2012/04/05/ibm_arvind_krishna/


IBM: Behind the Buzz About NoSQL

Mature database management systems like DB2 also offer advantages like high availability and data compression that the newer NoSQL systems have not had time to develop.

Misinform your customers to save them the trouble of discovering alternative solutions.


via: http://ibmdatamag.com/2012/03/behind-the-buzz-about-nosql/


Netezza Query History Table

Using Netezza’s in-database analytics package FPGROWTH, database administrators can identify the most commonly used combination of tables and the performance of the queries that reference those sets of tables.

Nice feature. Sort of the rich man’s all-included version of the slow query log in MySQL. Do you know if other databases support a similar feature?
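To give a rough idea of what such an analysis computes, here is a minimal Python sketch. It is not Netezza’s FPGROWTH package and not the FP-Growth algorithm itself: it swaps in brute-force counting of table combinations, which is only reasonable for small inputs. The `QUERY_HISTORY` rows, the table names, and the `table_set_stats` function are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical query-history rows: (tables referenced, elapsed seconds).
QUERY_HISTORY = [
    ({"orders", "customers"}, 2.0),
    ({"orders", "customers", "items"}, 6.0),
    ({"orders", "items"}, 1.0),
    ({"customers"}, 0.5),
]

def table_set_stats(history, min_support=2, max_size=2):
    """Count how often each table combination is queried and its mean runtime.

    Brute-force enumeration of combinations up to max_size tables;
    a real FP-Growth implementation avoids enumerating every subset.
    """
    stats = defaultdict(lambda: [0, 0.0])  # combo -> [query count, total seconds]
    for tables, seconds in history:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(tables), size):
                stats[combo][0] += 1
                stats[combo][1] += seconds
    # Keep only combinations seen in at least min_support queries.
    frequent = {
        combo: (count, total / count)
        for combo, (count, total) in stats.items()
        if count >= min_support
    }
    # Most frequently queried combinations first, ties broken by name.
    return sorted(frequent.items(), key=lambda kv: (-kv[1][0], kv[0]))

RESULT = table_set_stats(QUERY_HISTORY)
for combo, (count, avg_seconds) in RESULT:
    print(combo, count, round(avg_seconds, 2))
```

Each output row pairs a set of tables with the number of queries touching all of them and the average runtime of those queries, which is the kind of summary the quoted post describes.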


via: http://netezzaadmin.wordpress.com/2012/03/16/ibm-netezza-analytics-to-analyze-query-history-table-usage/


IBM Debuts Netezza Customer Intelligence Appliance

A new motto could be “An appliance for every vertical”. IBM Netezza’s first is for retailers.



12 Hadoop Vendors to Watch in 2012

My list of the 8 most interesting companies for the future of Hadoop didn’t try to include every company that has a product with the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle



Partnerships in the Hadoop Market

Just a quick recap:

Amazon doesn’t partner with anyone for its Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
