


Teradata: All content tagged as Teradata in NoSQL databases and polyglot persistence

The Forrester Wave for Hadoop market

Update: I’d like to thank the people who pointed out in the comment thread that I got quite a few aspects of the report wrong in my comments. I don’t believe in taking down posts that have been public for a while, so be warned that this article can basically be ignored.

Thank you, and my apologies for those comments that misinterpreted the report.

This is the Q1 2014 Forrester Wave for Hadoop:

Forrester wave for Hadoop

A couple of thoughts:

  1. Cloudera, Hortonworks, MapR are positioned very (very) close.

    1. Hortonworks is positioned closer to the top right, meaning it reports more customers/a larger install base.
    2. MapR is higher on the vertical axis, meaning MapR’s strategy is rated slightly better.

      For me, MapR’s strategy can be briefly summarized as:

      1. address some of the limitations in the Hadoop ecosystem
      2. provide API-compatible products for major components of the Hadoop ecosystem
      3. use the (trademarked) Apache product names to advertise its own products

      I think the 1st point above explains the better positioning of MapR’s current offering.

    3. Even though Cloudera was the first pure-play Hadoop distribution, it’s positioned behind both Hortonworks and MapR.

  2. IBM has the largest market presence. That’s a big surprise, as I very rarely hear clear messages from IBM.

  3. IBM and Pivotal Software are considered to have the strongest strategy. That’s another interesting point in Forrester’s report. Aside from the fact that IBM has a ton of data products and that Pivotal Software offers more than Hadoop, I don’t know what exactly explains this position.

    The Forrester report’s Strategy positioning is based on quantifying the following categories: Licensing and pricing, Ability to execute, Product road map, and Customer support. IBM and Pivotal are ranked first in all these categories (with maximum marks for the last 3). As a comparison, Hortonworks has 3/5 for Ability to execute (this must be related only to budget); Cloudera has 3/5 for both Ability to execute and Customer support.

    Pivotal is 3rd from last in terms of current offering. I guess my hypothesis for its 1st-place strategy ranking is wrong.

  4. Microsoft, which through its collaboration with Hortonworks came up with HDInsight (basically enabling Hadoop for Excel and for its data warehouse offering), is positioned 2nd last on all 3 axes.

    No one seems to love Microsoft anymore.

  5. While not a pure Hadoop player, DataStax has been offering the DataStax Enterprise platform, which includes support for analytics through Hadoop and search through Solr, for at least 2 years. That’s actually way before anyone else from the group of companies in Forrester’s report had anything similar1.

    This report focuses only on “general-purpose Hadoop solutions based on a differentiated, commercial Hadoop distribution”.

You can download the report after registering on Hortonworks’ site: here.

  1. DataStax is my employer. But what I wrote is a pure fact. 

Original title and link: The Forrester Wave for Hadoop market (NoSQL database©myNoSQL)

Hadoop and Teradata’s business

Earlier today I posted about Teradata’s take on the evolution of databases. As expected, everything is safe and under control. Now this report from Larry Dignan for ZDNet about Teradata’s Q4 earnings call presents Teradata’s perspective on Hadoop:

Teradata’s fourth quarter earnings were solid, but analysts peppered management with questions about Hadoop as data warehouse revenue worries persist.

Teradata CEO Mike Koehler and CFO Steve Scheppmann talked Hadoop throughout the company’s conference call. Was Hadoop taking Teradata’s business away? What’s the revenue hit? Can Teradata co-exist?

Once again, everything is safe, with a bright future. Until it isn’t anymore and Hadoop eats the enterprise data warehouse space. In Teradata’s defense, they’ve been one of the first companies to look seriously at Hadoop and come up with a coherent positioning.

Original title and link: Hadoop and Teradata’s business (NoSQL database©myNoSQL)

Stream Processors, DBMS Persistence, and Trends of Memory Costs

Mike Hogan in a post about stream processors and the actual trends of DRAM costs:

Some might argue that the trend is their friend, because DRAM is getting cheaper. Well DRAM prices have dropped about 33% per year, until 2012 when they started flat-lining and actually increasing.


This could actually support my scenario about Teradata’s and Pivotal’s ascension in the enterprise data warehouse market compared to pure in-memory solutions.

Original title and link: Stream Processors, DBMS Persistence, and Trends of Memory Costs (NoSQL database©myNoSQL)


Aster Data, HAWQ, GPDB and the First Hadoop Squeeze

Rob Klopp:

But there are three products, the Greenplum database (GPDB), HAWQ, and Aster Data, that will be squeezed more quickly as they are positioned either in between the EDW and Hadoop… or directly over Hadoop. In this post I’ll explain what I suspect Pivotal and Teradata are trying to do… why I believe their strategy will not work for long… and why readers of this blog should be careful moving forward.

This is a very interesting analysis of the enterprise data warehouse market. There’s also a nice visualization of this prediction:


Here’s an alternative though. As shown in the picture above, the expansion of in-memory databases depends heavily on the evolution of the price of memory. It’s hard to argue against price predictions or Moore’s law. But accidents, even if rare, are still possible. Any significant change in the trend of memory costs, or in other hardware market conditions (e.g. an unpredicted decrease in the price of SSDs), could give Teradata and Pivotal the extra time/conditions to break into advanced hybrid storage solutions that would offer slightly slower but also less expensive products than their competitors’ in-memory databases.

Original title and link: Aster Data, HAWQ, GPDB and the First Hadoop Squeeze (NoSQL database©myNoSQL)


Hadoop will be made better through engineering

Dan Woods prefacing an interview with Scott Gnau of Teradata:

In this vision, because Hadoop can store unstructured and structured information, because it can scale massively, because it is open source, because it allows many forms of analysis, because it has a thriving ecosystem, it will become the one repository to rule them all.

In my view, the most extreme advocates for Hadoop need to sober up and right size both expectations and time frames. Hadoop is important but it won’t replace all other repositories. Hadoop will change the world of data, but not in the next 18 months. The open source core of Hadoop is a masterful accomplishment, but like many open source projects, it will be made better through engineering.

You have to agree: there’s no engineering behind Hadoop. Just a huge number of intoxicated… brogrammers.

Original title and link: Hadoop will be made better through engineering (NoSQL database©myNoSQL)


Teradata: Hadoop, big data technologies 'small factor' in our slowdown

Larry Dignan for ZDNet, reporting from Teradata’s quarterly earnings call:

Teradata on Thursday moved to shoot down the theory that Hadoop and open source big data technologies are putting the kibosh on data warehouse rollouts.

The explanation offered for the slowdown:

The major contributor to our reduced revenue guidance for 2013 was the number of data warehouse opportunities that have moved out into 2014 with a large amount of that happening in the US where the pent-up demand in our user base that we expected to see in the second half has not materialized yet.

I wonder what the real reason for not closing these deals is. Maybe, just maybe, it’s those customers deciding to spend a bit more time learning about new technologies before writing the big checks.

Original title and link: Teradata: Hadoop, big data technologies ‘small factor’ in our slowdown (NoSQL database©myNoSQL)


Hadoop and the EDW

Rob Klopp summarizes a whitepaper published by Cloudera and Teradata:

Simply put, Hadoop becomes the staging area for “raw data streams” while the EDW stores data from “operational systems”. Hadoop then analyzes the raw data and shares the results with the EDW. […] The paper then positions Hadoop as an active archive. I like this idea very much. Hadoop can store archived data that is only accessed once a month or once a quarter or less often.. and that data can be processed directly by Hadoop programs or shared with the EDW data using facilities such as Teradata’s SQL-H, or Greenplum’s External Hadoop tables (not by HAWQ, though… see here), or by other federation engines connected to HANA, SQL Server, Oracle, etc.

It’s an interesting positioning of Hadoop. And it’s very similar to the approach Linux took when penetrating the walls of the enterprise. Then it slowly replaced pretty much everything.

In the early days (and we are still in those days), the EDW vendors could still believe this story: Hadoop is complicated, meant for batch processing, and lacks the tools and refinements built over years into the EDW.

But the story is starting to change. Fast. Hadoop is becoming more of a platform (YARN), it gets support for (almost) real-time querying (Impala, Project Stinger, HAWQ, just to name a few), and Hadoop leaders are signing partnerships with challengers and incumbents of the big data market at a rate that I don’t think I’ve seen before.

In the end, guess who will become the pillar of the big data platforms: the solution storing all the data, or the tools that can process, admittedly very fast and with much control, only limited amounts of that data?

✚ The Cloudera-Teradata paper titled “Hadoop and the Data Warehouse: When to Use Which” can be found here.

Original title and link: Hadoop and the EDW (NoSQL database©myNoSQL)

Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL

Nokia’s big data ecosystem consists of a centralized, petabyte-scale Hadoop cluster that is interconnected with a 100-TB Teradata enterprise data warehouse (EDW), numerous Oracle and MySQL data marts, and visualization technologies that allow Nokia’s 60,000+ users around the world to tap into the massive data store. Multi-structured data is constantly being streamed into Hadoop from the relational systems, and hundreds of thousands of Scribe processes run every day to move data from, for example, servers in Singapore to a Hadoop cluster in the UK. Nokia is also a big user of Apache Sqoop and Apache HBase.

In the coming years you’ll hear more often stories—sales pitches—about single unified platforms solving all these problems at once. But platforms that will survive and thrive are those that will accomplish two things:

  1. keep the data gates open: in and out.
  2. work with other platforms to make this efficient for users

Original title and link: Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL (NoSQL database©myNoSQL)


Teradata Deployments: Apple, Walmart, eBay, Verizon, AT&T, BoA

Impressive roster for Teradata. I’d also love to see a list of deployments where Teradata and Hadoop are meeting.

Original title and link: Teradata Deployments: Apple, Walmart, eBay, Verizon, AT&T, BoA (NoSQL database©myNoSQL)


Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money

An interesting post on Teradata Aster blog which is indirectly emphasizing the weaknesses of the Hadoop platform:

  1. Make platform and tools to be easier to use to manage and curate data. Otherwise, garbage in = garbage out, and you will get garbage analytics.
  2. Provide rich analytics functions out of the box. Each line of programming cuts your reachable audience by 50%.
  3. Provide tools to update or delete data. Otherwise, data consistency will drift away from truth as history accumulates.
  4. Provide applications to leverage data and find answers relevant to business. Otherwise the cost of DIY applications is too high to influence business – and won’t be done.

It’s difficult to argue against these points, but they are not insurmountable. I’d even say that once the operational complexity of Hadoop deployments gets simpler (I think the Apache community, Cloudera, and Hortonworks are already working on these aspects), Hadoop will see even more adoption, and with that, contributions addressing points 2 to 4 will follow shortly.

Yet another interesting part of the post is the two “equations” describing the two environments:

big clusters = big administration = big programs = big friction = low influence (Hadoop)
big data = small clusters = easy administration = big analytics = big influence (ideal/Teradata Aster)

I think these reveal how Teradata Aster is positioning its solutions and where it sees itself making money in the Big Data market. It goes like this: “we can make a lot of money if we offer a platform with lower complexity and operational costs and higher productivity, leading to better business results”. This is a sound strategy, and the competitors from the Hadoop space had better focus on these same aspects, which are essential to wide adoption.

Original title and link: Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money (NoSQL database©myNoSQL)


Big Data for C-Suites: Teradata and Big Data the Best Decision Possible

In the series of Big Data for C-Suites, here’s a video from Teradata:

Notice how this one focuses on only two dimensions: keywords and Teradata. For now, Hortonworks’ Big Data and Hadoop for C-Suites resonates better with me.

Original title and link: Big Data for C-Suites: Teradata and Big Data the Best Decision Possible (NoSQL database©myNoSQL)

Big Data Implications for IT Architecture and Infrastructure

Teradata’s Martin Willcox:

From an IT architecture / infrastructure perspective, I think that the key thing to understand about all of this is that, at least for the foreseeable future, we’ll need at least two different types of “database” technology to efficiently manage and exploit the relational and non-relational data, respectively: an integrated data warehouse, built on an Massively Parallel Processing (MPP) DBMS platform for the relational data, and the relational meta-data that we generate by processing the non-relational data (for example, that a call was made at this date and time, by this customer, and that they were assessed as being stressed and agitated); and another platform for the processing of the non-relational data, that enables us to parallelise complex algorithms - and so bring them to bear on large data-sets - using the MapReduce programming model. Since the value of these data are much greater in combination than in isolation – and because we may be shipping very large volumes of data between the different platforms - considerations of how best to connect and integrate these two repositories become very important.

One of the few corporate blog posts that do not try to position Hadoop (and implicitly MapReduce) in a corner.

This sane perspective could be a validation of my thoughts about the Teradata and Hortonworks partnership.

Original title and link: Big Data Implications for IT Architecture and Infrastructure (NoSQL database©myNoSQL)