Simply put, Hadoop becomes the staging area for “raw data streams” while the EDW stores data from “operational systems”. Hadoop then analyzes the raw data and shares the results with the EDW. […] The paper then positions Hadoop as an active archive. I like this idea very much. Hadoop can store archived data that is accessed only once a month, once a quarter, or even less often, and that data can be processed directly by Hadoop programs or shared with the EDW using facilities such as Teradata’s SQL-H, Greenplum’s external Hadoop tables (not by HAWQ, though; see here), or other federation engines connected to HANA, SQL Server, Oracle, etc.
It’s an interesting positioning of Hadoop. And it’s very similar to the approach Linux took to penetrate the walls of enterprises: get in through a modest role, then slowly replace pretty much everything.
In the early days (and we are still in those days), the EDW vendors could still believe this story: Hadoop is complicated, meant for batch processing, and lacks the tools and refinements built over years into the EDW.
But the story is starting to change. Fast. Hadoop is becoming more of a platform (YARN), it is gaining support for (almost) real-time querying (Impala, the Stinger initiative, HAWQ, to name just a few), and Hadoop leaders are signing partnerships with both challengers and incumbents of the big data market at a rate I don’t think I’ve seen before.
In the end, guess which will become the pillar of the big data platforms: the solution that stores all the data, or the tools that can process, admittedly very fast and with fine control, only limited amounts of it?
✚ The Cloudera-Teradata paper titled “Hadoop and the Data Warehouse: When to Use Which” can be found here.
Original title and link: Hadoop and the EDW ( ©myNoSQL)
In the series of Big Data for C-Suites, here’s a video from Teradata:
Notice how this one focuses on only two dimensions: keywords and Teradata. For now, Hortonworks’ Big Data and Hadoop for C-Suites resonates better with me.
Original title and link: Big Data for C-Suites: Teradata and Big Data the Best Decision Possible ( ©myNoSQL)
The Cloudera deal from September 2010 provided a pipe from a Hadoop cluster into the Teradata data warehouses, while the Hortonworks partnership announced today is providing a pipe between Hadoop and Aster Data appliances.
Hortonworks and Teradata will do joint marketing and development, and are exploring ways to better integrate their respective software. This will specifically be done on Data Platform 1.0 from Hortonworks and Aster Database 5.0 from Teradata. Future engineering work could include running the Hortonworks and Aster Data programs side-by-side on the same physical clusters, although this is not the way customers tend to do it today, according to Argyros.
Original title and link: More Details About the Teradata and Hortonworks Partnership ( ©myNoSQL)
Teradata sells software, hardware, and services for data warehouses and analytic applications. Part of the Teradata portfolio is also the Teradata Aster MapReduce Platform, a massively parallel processing infrastructure with a software solution that embeds both SQL and MapReduce analytic processing for deeper analytic insights on multi-structured data and new analytic capabilities driven by data science.
Hortonworks offers services around the 100% Apache-licensed, open source Hortonworks Data Platform, an integrated solution built around Hadoop.
The interesting bits from the announcement and media coverage:
Teradata and Hortonworks will join forces to provide technologies and strategic guidance to help businesses build integrated, transparent, enterprise-class big data analytic solutions that leverage Apache Hadoop. The partnership will focus on enabling businesses to use Apache Hadoop to harness the value from new sources of data. Businesses will be able to quickly load and refine multi-structured data, some of which is being discarded today, for discovery and analytics. The resulting insights will enable analysts and front line users to make the best business decision possible.
For example, each day websites generate many terabytes of raw, complex data about customers’ viewing and buying habits. These web logs can be directly loaded into Teradata Aster or Apache Hadoop where they can be stored, transformed, and refined in preparation for analysis by the Teradata Aster MapReduce platform (nb: my emphasis).
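The “stored, transformed, and refined” step above is typically a parsing job over raw log lines. As a hedged illustration (not Teradata’s or Hortonworks’ actual tooling), here is a minimal Hadoop Streaming-style mapper in Python that turns Apache combined-format web logs into clean tab-separated records ready for loading into an analytic store; the log format and the field selection are my assumptions:

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper sketching the "refine" step:
raw web logs in on stdin, tab-separated records out on stdout."""
import re
import sys

# Apache combined log format: ip - user [timestamp] "request" status bytes ...
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def refine(line):
    """Parse one raw log line; return a clean TSV record, or None to discard."""
    m = LOG_RE.match(line)
    if not m:
        return None  # unparseable noise in the raw stream is dropped
    size = m.group("bytes")
    return "\t".join([
        m.group("ip"),
        m.group("ts"),
        m.group("method"),
        m.group("path"),
        m.group("status"),
        size if size.isdigit() else "0",  # "-" for zero-byte responses
    ])

if __name__ == "__main__":
    for raw in sys.stdin:
        record = refine(raw.rstrip("\n"))
        if record is not None:
            print(record)
```

A script like this would run as the mapper of a map-only streaming job, with the refined output landing in HDFS for downstream SQL or MapReduce analysis.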
The company [Teradata] has already worked with Hortonworks’ competitor Cloudera on a connector between the Teradata Database and Cloudera’s Hadoop distribution, but the Hortonworks deal appears a little deeper and more strategic.
The alliance between Teradata and Hortonworks means that companies can get strategic advice about how to get into the new analytics game from Teradata, and have practical help on running the systems from Hortonworks.
However, there are two important challenges that need to be addressed before broad enterprise adoption can occur:
- Understanding the right use cases in which to utilize Apache Hadoop.
- Integrating Apache Hadoop with existing data architectures in an appropriate manner to get better value from existing investments.
My sense of excitement about the Teradata/Hortonworks partnership is amplified by the fact that it addresses these two core challenges for Apache Hadoop:
- We will be rolling out a reference architecture that provides guidance to enterprises that want to understand the best use cases for which to apply Hadoop. As part of that, we will be helping Teradata customers use Hadoop in conjunction with their Teradata and Teradata Aster analytic data solutions investments.
- We will also be working closely with the Teradata engineering teams on jointly engineered solutions that optimize the integration points with Apache Hadoop.
From Hortonworks’ perspective this deal is weaker than the Oracle-Cloudera deal: new Teradata sales do not necessarily result in new Hortonworks Data Platform installations, while in the Oracle-Cloudera partnership every sale results in new business for Cloudera.
From Teradata’s perspective, this partnership gives them a perfect answer and solution for clients asking about unstructured data scenarios.
Depending on the level of integration the two teams pull together, this partnership might result in one of the most complete and powerful structured and unstructured data warehouse and analytics platforms.
I’m looking forward to seeing the proposed architecture blueprint once it’s finalized.
- teradata.com: Teradata-Hortonworks Partnership to Accelerate Business Value from Big Data Technologies
- hortonworks.com: The Importance of the Teradata & Hortonworks Partnership
- Aster Data Blog: Perspectives on Teradata-Hortonworks Partnership
- NYTimes.com Bits: Teradata and Hortonworks Join Forces for a Big Data Boost
- GigaOM: Teradata taps Hortonworks to improve Hadoop story
- ServicesANGLE: Hortonworks Announces Partnership with Teradata
Original title and link: Teradata and Hortonworks Partnership and What It Means ( ©myNoSQL)
Shawn Rogers has a short but compelling list of Big Data deployments in his article Big Data is Scaling BI and Analytics. This list also shows that even if there are some common components like Hadoop, there are no blueprints yet for dealing with Big Data.
Facebook: Hadoop analytic data warehouse, using HDFS to store more than 30 petabytes of data. Their Big Data stack is based only on open source solutions.
Quantcast: 3,000-core, 3,500-terabyte Hadoop deployment that processes more than a petabyte of raw data each day
University of Nebraska-Lincoln: Hadoop cluster holding 1.6 petabytes of physics data
Yahoo!: 100,000 CPUs in 40,000 computers, all running Hadoop. Also running a 12 terabyte MOLAP cube based on Tableau Software
eBay: has 3 separate analytics environments:
- 6PB data warehouse for structured data and SQL access
- 40PB deep analytics (Teradata)
- 20PB Hadoop system to support advanced analytic workload on unstructured data
Original title and link: Big Data Is Going Mainstream: Facebook, Yahoo!, eBay, Quantcast, and Many Others ( ©myNoSQL)