NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



AsterData: All content tagged as AsterData in NoSQL databases and polyglot persistence

Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money

An interesting post on Teradata Aster blog which is indirectly emphasizing the weaknesses of the Hadoop platform:

  1. Make platform and tools to be easier to use to manage and curate data. Otherwise, garbage in = garbage out, and you will get garbage analytics.
  2. Provide rich analytics functions out of the box. Each line of programming cuts your reachable audience by 50%.
  3. Provide tools to update or delete data. Otherwise, data consistency will drift away from truth as history accumulates.
  4. Provide applications to leverage data and find answers relevant to business. Otherwise the cost of DIY applications is too high to influence business – and won’t be done.

It’s difficult to argue against these points, but they are not insurmountable. I’d even say that once the operational complexity of Hadoop deployments will get simpler—I think the Apache community, Cloudera, and Hortonworks are already working on these aspects—, Hadoop will see even more adoption and with that contributions addressing points 2 to 4 will follow shortly.

Yet another interesting part of the post is the two “equations” describing the two environments:

big clusters = big administration = big programs = big friction = low influence (Hadoop)
big data = small clusters = easy administration = big analytics = big influence (ideal/Teradata Aster)

I think these are revealing how Teradata Aster is positioning their solutions and where they see themselves making money in the Big Data market. It goes like this: “we can make a lot of money if we offer a platform with lower complexity and operational costs and higher productivity leading to better business results”. This is a sound strategy and the competitors from the Hadoop space should better focus on these same aspects which are essential to wide adoption.

Original title and link: Hadoop Weaknesses and Where Teradata Aster Sees the Big Data Money (NoSQL database©myNoSQL)


Big Data Implications for IT Architecture and Infrastructure

Teradata’s Martin Willcox:

From an IT architecture / infrastructure perspective, I think that the key thing to understand about all of this is that, at least for the foreseeable future, we’ll need at least two different types of “database” technology to efficiently manage and exploit the relational and non-relational data, respectively: an integrated data warehouse, built on an Massively Parallel Processing (MPP) DBMS platform for the relational data, and the relational meta-data that we generate by processing the non-relational data (for example, that a call was made at this date and time, by this customer, and that they were assessed as being stressed and agitated); and another platform for the processing of the non-relational data, that enables us to parallelise complex algorithms - and so bring them to bear on large data-sets - using the MapReduce programming model. Since the value of these data are much greater in combination than in isolation – and because we may be shipping very large volumes of data between the different platforms - considerations of how best to connect and integrate these two repositories become very important.

One of the few corporate blog posts that do not try to position Hadoop (and implicitely MapReduce) in a corner.

This sane perspective could be a validation of my thoughts about the Teradata and Hortwonworks partnership.

Original title and link: Big Data Implications for IT Architecture and Infrastructure (NoSQL database©myNoSQL)


Explaining Hadoop to Your CEO

Dan Woods (Forbes):

The answer is, yes, Hadoop could be helpful, but there are other technologies as well. For example, technologies such as Splunk allow you to explore big data sets in a way that’s more interactive than most Hadoop implementations. Splunk not only lets you play with big data; you can also distill it and visualize it. Pervasive’s DataRush allows you to write parallel programs using a simplified programming model, and then process lots of data at scale. 1010data allows you to look at a spreadsheet that has a trillion rows, as well as handle time series data. EMC Greenplum and Teradata Aster Data and SAP HANA will also want a crack at your business. If you take any of these technologies and combine them with QlikView, Tableau, or TIBCO Spotfire, you can figure out what a big data set means to your business very quickly. So if your job is understanding the business value of the data, Hadoop is one of many things that you should analyze.


Blah blah blah Big Data, blah blah blah list of vendors, blah blah blah Big Data

It might even work for a dummy CEO.

Original title and link: Explaining Hadoop to Your CEO (NoSQL database©myNoSQL)


R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today

David Smith (Revolution Analytics):

Of course, this isn’t the first time that R has been embedded into a data warehousing appliance. IBM Netezza’s iClass device integrates with Revolution R, and AsterData, the Teradata Data Warehouse Appliance, and Greenplum all provide connections to R as well. Here at Revolution Analytics, we think that such enterprise-level integrations with R serve to grow the R ecosystem and serve as validation of R as a key platform for advanced analytics. As CEO Norman Nie said to GigaOm this weekend, 

“Oracle’s announcement to embed R demonstrates validation for the leading statistics language and offers further evidence that R is a key weapon in advanced analytics today”

And let’s not leave aside the strategic partnership between Revolution Analytics and Cloudera to include RevoConnectR in the CDH.

Original title and link: R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today (NoSQL database©myNoSQL)


The Data Processing Platform for Tomorrow

In the blue corner we have IBM with Netezza as analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC with Greenplum and the partnership with SAS[1]. And in the open source corner we have Hadoop and R.

Update: there’s also another corner I don’t know how to color where Teradata and its recently acquired Aster Data partner with SAS.

Who is ready to bet on which of these platforms will be processing more data in the next years?

  1. GigaOm has a good article on this subject here  

Original title and link: The Data Processing Platform for Tomorrow (NoSQL databases © myNoSQL)

Types of Big Data Work

Mike Minelli: Working with big data can be classified into three basic categories […] One is information management, a second is business intelligence, and the third is advanced analytics

Information management captures and stores the information, BI analyzes data to see what has happened in the past, and advanced analytics is predictive, looking at what the data indicates for the future.

There’s also a list of tools for BigData: AsterData (acquired by Teradata), Datameer, Paraccel, IBM Netezza, Oracle Exadata, EMC Greenplum.

Original title and link: Types of Big Data Work (NoSQL databases © myNoSQL)


MySpace and Big Data

As you can imagine MySpace has to deal with huge amounts of data too, but they are doing it differently:

In its search for new data warehousing technology, the company evaluated technology from vendors including Teradata and Netezza. However, says Watters, “we didn’t think any of it could scale according to our needs.”

Instead it turned to Aster Data, whose “massively parallel” database technology is based on Google’s MapReduce distributed analytics engine[1]

  1. According to ☞ this: “Aster’s patent-pending analytics framework called SQL-MapReduce is unique to the Aster Data platform.”  ()