Teradata: All content tagged as Teradata in NoSQL databases and polyglot persistence
Wednesday, 22 May 2013
Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL
Nokia’s big data ecosystem consists of a centralized, petabyte-scale Hadoop cluster that is interconnected with a 100-TB Teradata enterprise data warehouse (EDW), numerous Oracle and MySQL data marts, and visualization technologies that allow Nokia’s 60,000+ users around the world to tap into the massive data store. Multi-structured data is constantly being streamed into Hadoop from the relational systems, and hundreds of thousands of Scribe processes run every day to move data from, for example, servers in Singapore to a Hadoop cluster in the UK. Nokia is also a big user of Apache Sqoop and Apache HBase.
In the coming years you’ll hear more and more stories (read: sales pitches) about single unified platforms solving all these problems at once. But the platforms that will survive and thrive are those that accomplish two things:
- keep the data gates open: in and out.
- work with other platforms to make this efficient for users.
Original title and link: Nokia’s Big Data Ecosystem: Hadoop, Teradata, Oracle, MySQL (©myNoSQL)
Thursday, 28 March 2013
Teradata Deployments: Apple, Walmart, eBay, Verizon, AT&T, BoA
Impressive roster for Teradata. I’d also love to see a list of deployments where Teradata and Hadoop are used together.
Original title and link: Teradata Deployments: Apple, Walmart, eBay, Verizon, AT&T, BoA (©myNoSQL)
Thursday, 17 May 2012
Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money
An interesting post on the Teradata Aster blog, which indirectly emphasizes the weaknesses of the Hadoop platform:
- Make the platform and tools easier to use for managing and curating data. Otherwise, garbage in = garbage out, and you will get garbage analytics.
- Provide rich analytics functions out of the box. Each line of programming cuts your reachable audience by 50%.
- Provide tools to update or delete data. Otherwise, data consistency will drift away from truth as history accumulates.
- Provide applications to leverage data and find answers relevant to business. Otherwise the cost of DIY applications is too high to influence business – and won’t be done.
It’s difficult to argue against these points, but they are not insurmountable. I’d even say that once Hadoop deployments become operationally simpler (I think the Apache community, Cloudera, and Hortonworks are already working on this), Hadoop will see even more adoption, and contributions addressing points 2 to 4 will follow shortly.
Another interesting part of the post is the pair of “equations” describing the two environments:
big clusters = big administration = big programs = big friction = low influence (Hadoop)
big data = small clusters = easy administration = big analytics = big influence (ideal/Teradata Aster)
I think these reveal how Teradata Aster is positioning its solutions and where it sees itself making money in the Big Data market. It goes like this: “we can make a lot of money if we offer a platform with lower complexity and operational costs and higher productivity leading to better business results”. This is a sound strategy, and competitors from the Hadoop space would do well to focus on these same aspects, which are essential to wide adoption.
Original title and link: Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money (©myNoSQL)
Wednesday, 4 April 2012
Big Data for C-Suites: Teradata and Big Data the Best Decision Possible
In the series of Big Data for C-Suites, here’s a video from Teradata:
Notice how this one focuses on two dimensions only: keywords and Teradata. For now Hortonworks’s Big Data and Hadoop for C-Suites resonates better with me.
Original title and link: Big Data for C-Suites: Teradata and Big Data the Best Decision Possible (©myNoSQL)
Wednesday, 14 March 2012
Big Data Implications for IT Architecture and Infrastructure
Teradata’s Martin Willcox:
From an IT architecture / infrastructure perspective, I think that the key thing to understand about all of this is that, at least for the foreseeable future, we’ll need at least two different types of “database” technology to efficiently manage and exploit the relational and non-relational data, respectively: an integrated data warehouse, built on a Massively Parallel Processing (MPP) DBMS platform for the relational data, and the relational meta-data that we generate by processing the non-relational data (for example, that a call was made at this date and time, by this customer, and that they were assessed as being stressed and agitated); and another platform for the processing of the non-relational data, that enables us to parallelise complex algorithms - and so bring them to bear on large data-sets - using the MapReduce programming model. Since the value of these data are much greater in combination than in isolation – and because we may be shipping very large volumes of data between the different platforms - considerations of how best to connect and integrate these two repositories become very important.
One of the few corporate blog posts that do not try to push Hadoop (and implicitly MapReduce) into a corner.
This sane perspective could be a validation of my thoughts about the Teradata and Hortonworks partnership.
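Willcox’s example of relational metadata derived from non-relational data (a call made at this date and time, by this customer, assessed as stressed) maps naturally onto the MapReduce model he mentions. The sketch below is a single-process Python illustration under loudly stated assumptions: the record fields (`customer_id`, `call_time`, `transcript`), the keyword-based `STRESS_WORDS` “model”, and the two-keyword threshold are all hypothetical stand-ins. In a real deployment the map and reduce phases would run on Hadoop, and the resulting rows would be loaded into the MPP warehouse.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a real speech/sentiment model: a call counts as
# "stressed" if its transcript contains at least two of these keywords.
STRESS_WORDS = {"angry", "unacceptable", "cancel", "frustrated"}


@dataclass(frozen=True)
class CallMetadata:
    """One relational row derived from a non-relational call record."""
    customer_id: str
    call_time: str
    stressed: bool


def map_call(record: dict) -> Iterator[Tuple[str, CallMetadata]]:
    """Map phase: derive a metadata row from a raw call and key it by customer."""
    words = {w.strip(".,!?") for w in record["transcript"].lower().split()}
    stressed = len(words & STRESS_WORDS) >= 2
    yield record["customer_id"], CallMetadata(
        record["customer_id"], record["call_time"], stressed
    )


def reduce_customer(customer_id: str, rows: List[CallMetadata]) -> Tuple[str, int, int]:
    """Reduce phase: total calls and stressed calls for one customer."""
    return customer_id, len(rows), sum(row.stressed for row in rows)


def run(records: Iterable[dict]) -> List[Tuple[str, int, int]]:
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups: Dict[str, List[CallMetadata]] = defaultdict(list)
    for record in records:
        for key, row in map_call(record):
            groups[key].append(row)
    return sorted(reduce_customer(key, rows) for key, rows in groups.items())


# Hypothetical sample input.
SAMPLE_CALLS = [
    {"customer_id": "c1", "call_time": "2012-03-14T09:15",
     "transcript": "This is unacceptable, I want to cancel!"},
    {"customer_id": "c1", "call_time": "2012-03-14T11:40",
     "transcript": "Thanks, that fixed it."},
    {"customer_id": "c2", "call_time": "2012-03-14T10:05",
     "transcript": "I am frustrated and angry."},
]

if __name__ == "__main__":
    print(run(SAMPLE_CALLS))
```

Running `run(SAMPLE_CALLS)` returns one summary row per customer (customer, number of calls, number of calls flagged as stressed). The per-call `CallMetadata` rows produced by the map phase are the kind of relational metadata that, in Willcox’s picture, would be shipped to the integrated data warehouse.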
Original title and link: Big Data Implications for IT Architecture and Infrastructure (©myNoSQL)
via: http://blogs.teradata.com/emea/What-is-meant-by-the-idea-of-big-data/
Wednesday, 22 February 2012
More Details About the Teradata and Hortonworks Partnership
Some more interesting bits about the Teradata and Hortonworks partnership in Timothy Prickett Morgan’s “Teradata grabs Hortonworks by trunk” on The Register:
The Cloudera deal from September 2010 provided a pipe from a Hadoop cluster into the Teradata data warehouses, while the Hortonworks partnership announced today is providing a pipe between Hadoop and Aster Data appliances.
Hortonworks and Teradata will do joint marketing and development, and are exploring ways to better integrate their respective software. This will specifically be done on Data Platform 1.0 from Hortonworks and Aster Database 5.0 from Teradata. Future engineering work could include running the Hortonworks and Aster Data programs on the same physical clusters, side-by-side, although this is not the way customers tend to do it today, according to Argyros.
Original title and link: More Details About the Teradata and Hortonworks Partnership (©myNoSQL)
Teradata and Hortonworks Partnership and What It Means
Context
Teradata sells software, hardware, and services for data warehouses and analytic applications. Part of the Teradata portfolio is also the Teradata Aster MapReduce Platform, a massively parallel processing infrastructure with a software solution that embeds both SQL and MapReduce analytic processing for deeper analytic insights on multi-structured data and new analytic capabilities driven by data science.
Hortonworks offers services around the 100% Apache-licensed, open source Hortonworks Data Platform, an integrated solution built around Hadoop.

Announcement
The interesting bits from the announcement and media coverage:
Teradata and Hortonworks will join forces to provide technologies and strategic guidance to help businesses build integrated, transparent, enterprise-class big data analytic solutions that leverage Apache Hadoop. The partnership will focus on enabling businesses to use Apache Hadoop to harness the value from new sources of data. Businesses will be able to quickly load and refine multi-structured data, some of which is being discarded today, for discovery and analytics. The resulting insights will enable analysts and front line users to make the best business decision possible.

For example, each day websites generate many terabytes of raw, complex data about customers’ viewing and buying habits. These web logs can be directly loaded into Teradata Aster or Apache Hadoop where they can be stored, transformed, and refined in preparation for analysis by the Teradata Aster MapReduce platform (nb: my emphasis).
The company [Teradata] has already worked with Hortonworks’ competitor Cloudera on a connector between the Teradata Database and Cloudera’s Hadoop distribution, but the Hortonworks deal appears a little deeper and more strategic.
The alliance between Teradata and Hortonworks means that companies can get strategic advice about how to get into the new analytics game from Teradata, and have practical help on running the systems from Hortonworks.
However, there are two important challenges that need to be addressed before broad enterprise adoption can occur:
- Understanding the right use cases in which to utilize Apache Hadoop.
- Integrating Apache Hadoop with existing data architectures in an appropriate manner to get better value from existing investments.
My sense of excitement about the Teradata/Hortonworks partnership is amplified by the fact that it addresses these two core challenges for Apache Hadoop:
- We will be rolling out a reference architecture that provides guidance to enterprises that want to understand the best use cases for which to apply Hadoop. As part of that, we will be helping Teradata customers use Hadoop in conjunction with their Teradata and Teradata Aster analytic data solutions investments.
- We will also be working closely with the Teradata engineering teams on jointly engineered solutions that optimize the integration points with Apache Hadoop.
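The announcement’s web-log example (raw logs loaded into Hadoop or Teradata Aster, then transformed and refined before analysis) can be sketched as a small parse, filter, and aggregate pipeline. This is a minimal single-process Python sketch, not Teradata’s or Hortonworks’ code; it assumes the logs are in Apache Common Log Format and that product pages live under a hypothetical `/product/` path. On a cluster, the same steps would run as distributed jobs.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Apache Common Log Format:
# host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse(line: str) -> Optional[Dict[str, str]]:
    """Transform: parse one raw log line, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def refine(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Refine: keep only successful GETs of product pages as (host, path) pairs."""
    for line in lines:
        record = parse(line)
        if (record is not None
                and record["method"] == "GET"
                and record["status"] == "200"
                and record["path"].startswith("/product/")):
            yield record["host"], record["path"]


def product_views(lines: Iterable[str]) -> Counter:
    """Total page views per product page."""
    return Counter(path for _, path in refine(lines))


def unique_visitors(lines: Iterable[str]) -> Dict[str, int]:
    """Distinct client hosts per product page."""
    hosts = defaultdict(set)
    for host, path in refine(lines):
        hosts[path].add(host)
    return {path: len(seen) for path, seen in hosts.items()}


# Hypothetical sample log, including lines the refine step should discard.
SAMPLE_LOG = [
    '10.0.0.1 - - [22/Feb/2012:10:00:01 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.2 - - [22/Feb/2012:10:00:05 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.1 - - [22/Feb/2012:10:00:09 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.1 - - [22/Feb/2012:10:01:00 +0000] "POST /cart HTTP/1.1" 302 0',
    '10.0.0.3 - - [22/Feb/2012:10:02:00 +0000] "GET /product/7 HTTP/1.1" 404 0',
    'garbage line that does not match the format',
]
```

Keeping the malformed-line and non-product filters in the refine step mirrors the “load raw, refine later” idea: everything is stored, and only the refined view is handed to the analytics layer.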
Commentary
- From Hortonworks’ perspective, this deal is weaker than the Oracle-Cloudera deal. In the Teradata case, new Teradata sales do not necessarily result in new Hortonworks Data Platform installations, while in the Oracle-Cloudera partnership every sale results in new business for Cloudera.
- From Teradata’s perspective, this partnership gives them a perfect answer for clients asking about unstructured data scenarios.
- The announcement slightly positions Hadoop as part of the ETL process, but is not as strict about this as other Hadoop integration architectures (see Netezza and Hadoop and Vertica and Hadoop).
- Depending on the level of integration the two teams achieve, this partnership might result in one of the most complete and powerful structured and unstructured data warehouse and analytics platforms.
I’m looking forward to seeing the proposed architecture blueprint once it’s finalized.
Links
- teradata.com: Teradata-Hortonworks Partnership to Accelerate Business Value from Big Data Technologies
- hortonworks.com: The Importance of the Teradata & Hortonworks Partnership
- The Data Blog: Aster Data Blog » Blog Archive » Perspectives on Teradata-Hortonworks Partnership
- Bits NYTimes.com: Teradata and Hortonworks Join Forces for a Big Data Boost
- GigaOM: Teradata taps Hortonworks to improve Hadoop story
- ServicesANGLE: Hortonworks Announces Partnership with Teradata
Original title and link: Teradata and Hortonworks Partnership and What It Means (©myNoSQL)
Wednesday, 30 November 2011
Explaining Hadoop to Your CEO
Dan Woods (Forbes):
The answer is, yes, Hadoop could be helpful, but there are other technologies as well. For example, technologies such as Splunk allow you to explore big data sets in a way that’s more interactive than most Hadoop implementations. Splunk not only lets you play with big data; you can also distill it and visualize it. Pervasive’s DataRush allows you to write parallel programs using a simplified programming model, and then process lots of data at scale. 1010data allows you to look at a spreadsheet that has a trillion rows, as well as handle time series data. EMC Greenplum and Teradata Aster Data and SAP HANA will also want a crack at your business. If you take any of these technologies and combine them with QlikView, Tableau, or TIBCO Spotfire, you can figure out what a big data set means to your business very quickly. So if your job is understanding the business value of the data, Hadoop is one of many things that you should analyze.
Translation:
Blah blah blah Big Data, blah blah blah list of vendors, blah blah blah Big Data
It might even work for a dummy CEO.
Original title and link: Explaining Hadoop to Your CEO (©myNoSQL)
via: http://www.forbes.com/sites/danwoods/2011/11/03/explaining-hadoop-to-your-ceo/
Wednesday, 5 October 2011
Hadoop: It's Still a Niche Technology
In an otherwise generic but interesting post about Hadoop and its integration with data analytics and data warehouse solutions, Jessica Twentyman writes:
It’s still a niche technology, but Hadoop’s profile received a serious boost over the past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, Infosphere, in May 2011.
Unfortunately she got this all wrong. It is the open source community, developers, data scientists, and Cloudera that helped popularize Hadoop.
These data analytics and data warehouse vendors are just capitalizing on Hadoop delivering results. They haven’t been knocking at doors asking: “Have you heard of Hadoop? Do you want to try it?”. They ran into Hadoop in most of the places they went, and that made them realize it was a business opportunity.
So, I’ll say it again: Hadoop is popular thanks to the open source community, developers, data scientists and Cloudera.
Original title and link: Hadoop: It’s Still a Niche Technology (©myNoSQL)
Thursday, 22 September 2011
Big Data Is Going Mainstream: Facebook, Yahoo!, eBay, Quantcast, and Many Others
Shawn Rogers has a short but compelling list of Big Data deployments in his article Big Data is Scaling BI and Analytics. This list also shows that even if there are some common components like Hadoop, there are no blueprints yet for dealing with Big Data.
- Facebook: Hadoop analytic data warehouse, using HDFS to store more than 30 petabytes of data. Their Big Data stack is based only on open source solutions.
- Quantcast: 3,000-core, 3,500-terabyte Hadoop deployment that processes more than a petabyte of raw data each day
- University of Nebraska-Lincoln: Hadoop cluster holding 1.6 petabytes of physics data
- Yahoo!: 100,000 CPUs in 40,000 computers, all running Hadoop. Also running a 12-terabyte MOLAP cube based on Tableau Software
- eBay: has 3 separate analytics environments:
  - 6PB data warehouse for structured data and SQL access
  - 40PB deep analytics (Teradata)
  - 20PB Hadoop system to support advanced analytic workloads on unstructured data
Original title and link: Big Data Is Going Mainstream: Facebook, Yahoo!, eBay, Quantcast, and Many Others (©myNoSQL)
Monday, 4 July 2011
Aster Data SQL-MapReduce Technology Patent
From a Teradata PR announcement:
SQL-MapReduce® is a framework which enables fast, investigative analysis of complex information by data scientists and business analysts. It enables procedural expressions in software languages (such as Java, C#, Python, C++, and R) to be parallelized across a group of linked computers (compute cluster) and then activated for use (invoked) with standard SQL.
The closest open source solution I can think of is Pig, created and open sourced by Yahoo! (PDF).
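SQL-MapReduce’s core idea, procedural code invoked from standard SQL, has a small single-node analogue in Python’s built-in `sqlite3` module, which lets a Python class be registered as an aggregate and called from a `GROUP BY` query. This is an analogy only: it does not use Aster’s syntax or API, and nothing here runs in parallel (in Aster, the per-group work would be distributed across the cluster). The sessionization function `session_count`, the 30-minute gap, and the `clicks` table are hypothetical choices made for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical inactivity threshold: a gap longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60


class SessionCount:
    """Procedural logic exposed to SQL: counts sessions in one user's clicks."""

    def __init__(self) -> None:
        self.timestamps: List[int] = []

    def step(self, ts: int) -> None:
        # Called once per row in the group; row order is not guaranteed,
        # so timestamps are sorted in finalize().
        self.timestamps.append(ts)

    def finalize(self) -> int:
        ts = sorted(self.timestamps)
        if not ts:
            return 0
        gaps = (later - earlier for earlier, later in zip(ts, ts[1:]))
        return 1 + sum(1 for gap in gaps if gap > SESSION_GAP_SECONDS)


def sessions_per_user(clicks: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Load clicks into an in-memory table and sessionize them with plain SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, ts INTEGER)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
        # Register the Python class so SQL can invoke it by name.
        conn.create_aggregate("session_count", 1, SessionCount)
        return conn.execute(
            "SELECT user_id, session_count(ts) FROM clicks "
            "GROUP BY user_id ORDER BY user_id"
        ).fetchall()
    finally:
        conn.close()


# Hypothetical sample data: (user_id, Unix timestamp in seconds).
SAMPLE_CLICKS = [("alice", 0), ("alice", 600), ("alice", 5000), ("bob", 100)]
```

With the sample data, `alice` has a 4,400-second gap between her second and third clicks, so her clicks split into two sessions, while `bob` has one.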
Original title and link: Aster Data SQL-MapReduce Technology Patent (©myNoSQL)
Friday, 24 June 2011
2 Ways to Tackle Really Big Data
So there you have the two approaches to handling machine-generated-data. If you have vast archives, EMC, IBM Netezza, and Teradata all have purpose-built appliances that scale into the petabytes. You also could use Hadoop, which promises much lower cost, but you’ll have to develop separate processes and applications for that environment. You’ll also have to establish or outsource expertise on Hadoop deployment, management, and data processing. For fast-query needs, EMC, IBM Netezza, and Teradata all have fast, standard appliances and faster, high-performance appliances (and companies including Kognitio and Oracle have similar configuration choices). Column-oriented database and appliance vendors including HP Vertica, InfoBright, ParAccel, and Sybase have speed advantages inherent in their database architectures.
I’m wondering why Hadoop is mentioned only in passing, considering how many large datasets it is already handling.
Original title and link: 2 Ways to Tackle Really Big Data (©myNoSQL)