
netezza: All content tagged as netezza in NoSQL databases and polyglot persistence

Netezza Query History Table

Using the FPGROWTH function from Netezza’s in-database analytics package, database administrators can identify the most commonly used combinations of tables and the performance of the queries that reference those sets of tables.

Nice feature. Sort of a rich man’s all-inclusive version of MySQL’s slow query log. Do you know if other databases support a similar feature?
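For the curious, here is a minimal sketch of the kind of aggregation this enables, done client-side in Python rather than in-database. It only approximates what FPGROWTH computes inside the appliance, and the DSN, view, and column names are illustrative assumptions, not Netezza’s actual query-history catalog:

```python
# Approximates the "which tables are queried together" analysis client-side.
# The DSN, view, and column names below are hypothetical; check your own
# query-history configuration for the real schema.
from collections import Counter
from itertools import combinations

import pyodbc  # assumes an ODBC driver/DSN for the appliance is set up

conn = pyodbc.connect("DSN=NZSQL")  # hypothetical DSN name
rows = conn.cursor().execute(
    "SELECT query_id, table_name FROM query_table_history"  # hypothetical view
).fetchall()

# Group the tables referenced by each query.
tables_per_query = {}
for query_id, table_name in rows:
    tables_per_query.setdefault(query_id, set()).add(table_name)

# Count co-occurring table pairs across queries (frequent 2-itemsets).
pair_counts = Counter()
for tables in tables_per_query.values():
    for pair in combinations(sorted(tables), 2):
        pair_counts[pair] += 1

# The most frequently co-referenced table pairs are candidates for
# co-location, materialized joins, or targeted tuning.
for pair, count in pair_counts.most_common(10):
    print(pair, count)
```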

Original title and link: Netezza Query History Table (NoSQL database©myNoSQL)

via: http://netezzaadmin.wordpress.com/2012/03/16/ibm-netezza-analytics-to-analyze-query-history-table-usage/


IBM Debuts Netezza Customer Intelligence Appliance

A new motto could be “An appliance for every vertical”. IBM Netezza’s first is for retailers.

Original title and link: IBM Debuts Netezza Customer Intelligence Appliance (NoSQL database©myNoSQL)


R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today

David Smith (Revolution Analytics):

Of course, this isn’t the first time that R has been embedded into a data warehousing appliance. IBM Netezza’s iClass device integrates with Revolution R, and AsterData, the Teradata Data Warehouse Appliance, and Greenplum all provide connections to R as well. Here at Revolution Analytics, we think that such enterprise-level integrations with R serve to grow the R ecosystem and serve as validation of R as a key platform for advanced analytics. As CEO Norman Nie said to GigaOm this weekend, 

“Oracle’s announcement to embed R demonstrates validation for the leading statistics language and offers further evidence that R is a key weapon in advanced analytics today”

And let’s not overlook the strategic partnership between Revolution Analytics and Cloudera to include RevoConnectR in CDH.

Original title and link: R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today (NoSQL database©myNoSQL)

via: http://www.r-bloggers.com/oracles-big-data-appliance-to-include-r/


Hadoop and Netezza: Differences & Similarities

Most of the time, vendor videos emphasize the superiority of their own commercial platform. But this short video gives a fair overview of the similarities and differences between Hadoop and Netezza.

The video is 5 minutes long and well worth watching.


BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases

Dr. Dobb’s:

  • The ability to orchestrate execution of Hadoop related tasks (i.e., executing a Hive Query, Pig Script, or M/R job) as part of a broader IT workflow.
  • The ability to set up dependencies, so if a step fails the job can branch down a recovery path or send a notification, or if it’s a success it goes on to subsequent dependent tasks. Likewise it supports initiating several tasks in parallel.
  • New integration for Pig — so that developers have the ability to execute a Pig job from a PDI Job flow, integrate the execution of Pig jobs in broader IT workflows through PDI Jobs, take advantage of our out of the box scheduler, and so on.

The list of tools Pentaho 4 integrates with is quite long:

  • a long list of traditional RDBMS
  • analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
  • NoSQL databases (MongoDB, HBase, etc.)
  • Hadoop variants
  • LexisNexis HPCC

This is the world of polyglot persistence and hybrid data storage.
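To make the orchestration behavior Dr. Dobb’s describes a bit more concrete, here is a toy sketch of the failure-branching idea in plain Python. It is not Pentaho’s PDI API, and the task names are made up:

```python
# Toy illustration of the workflow semantics described above: run steps in
# order, branch to a recovery path on failure, continue on success.
# This is not the Pentaho PDI API; all task names are hypothetical.
from typing import Callable

def run_workflow(steps: list[Callable[[], None]],
                 on_failure: Callable[[Exception], None]) -> bool:
    """Run steps in sequence; on the first failure, branch to recovery."""
    for step in steps:
        try:
            step()
        except Exception as exc:   # a real engine would be more selective
            on_failure(exc)        # recovery path / notification
            return False
    return True

# Stand-ins for "run a Hive query", "run a Pig script", "run an M/R job".
def extract():   print("extracting source data")
def hive_step(): print("running Hive query")
def pig_step():  print("running Pig script")

def notify(exc: Exception) -> None:
    print(f"step failed, alerting operators: {exc}")

if __name__ == "__main__":
    ok = run_workflow([extract, hive_step, pig_step], on_failure=notify)
    print("workflow succeeded" if ok else "workflow aborted")
```

A real scheduler adds parallel branches, retries, and triggers on top of this, but the success/failure routing is the core of it.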

Original title and link: BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases (NoSQL database©myNoSQL)


Hadoop and IBM Netezza: Compete or Co-Exist?

I assume people on both sides of the data warehouse market (users and providers) are asking the same question. IBM Netezza and Cloudera seem to agree on the answer:

IBM Netezza had worked with Cloudera to put together a compelling demo to highlight the value of our combined solution of CDH/Hadoop and Netezza.  Through an interesting use case, the demo showed how businesses could have their “hot” data (most recent data) residing in Netezza, “warm” data (longer time range data) residing in HDFS, while leveraging the Cloudera Connector for Netezza and Oozie (workflow engine part of CDH) to provide deeper insights to business executives.

I would have liked more details about the use case, though. Just categorizing data as “hot” and “warm” is not enough to understand the advantages of each piece.
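For what it’s worth, the hot/warm split usually boils down to routing on a time predicate. Here is a rough conceptual sketch; the store classes are stand-ins, not the Cloudera Connector for Netezza or Oozie APIs, and the 90-day window is an assumption:

```python
# Conceptual sketch: recent ("hot") data is served from the warehouse,
# older ("warm") data from HDFS. The stores below are stand-ins, not the
# Cloudera Connector for Netezza or Oozie APIs; the window is an assumption.
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # assume the last ~90 days stay in Netezza

class WarehouseStore:
    """Stand-in for the Netezza side (hot, recent data)."""
    def query(self, since: datetime) -> list[str]:
        return [f"netezza rows since {since:%Y-%m-%d}"]

class HdfsArchive:
    """Stand-in for the HDFS side (warm, historical data)."""
    def scan(self, since: datetime, until: datetime) -> list[str]:
        return [f"hdfs rows {since:%Y-%m-%d} .. {until:%Y-%m-%d}"]

def query_range(since: datetime, warehouse: WarehouseStore,
                archive: HdfsArchive) -> list[str]:
    """Route a time-ranged query to the hot store, the warm store, or both."""
    hot_cutoff = datetime.utcnow() - HOT_WINDOW
    if since >= hot_cutoff:
        return warehouse.query(since)  # entirely within the hot window
    # Older slice comes from HDFS, recent slice from the warehouse.
    return archive.scan(since, hot_cutoff) + warehouse.query(hot_cutoff)

if __name__ == "__main__":
    year_ago = datetime.utcnow() - timedelta(days=365)
    print(query_range(year_ago, WarehouseStore(), HdfsArchive()))
```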

Original title and link: Hadoop and IBM Netezza: Compete or Co-Exist? (NoSQL database©myNoSQL)

via: http://www.cloudera.com/blog/2011/06/reflections-from-enzee-universe-2011/


2 Ways to Tackle Really Big Data

So there you have the two approaches to handling machine-generated data. If you have vast archives, EMC, IBM Netezza, and Teradata all have purpose-built appliances that scale into the petabytes. You also could use Hadoop, which promises much lower cost, but you’ll have to develop separate processes and applications for that environment. You’ll also have to establish or outsource expertise on Hadoop deployment, management, and data processing. For fast-query needs, EMC, IBM Netezza, and Teradata all have fast, standard appliances and faster, high-performance appliances (and companies including Kognitio and Oracle have similar configuration choices). Column-oriented database and appliance vendors including HP Vertica, InfoBright, ParAccel, and Sybase have speed advantages inherent in their database architectures.

I’m wondering why Hadoop is mentioned only in passing, considering how many large datasets it is already handling.

Original title and link: 2 Ways to Tackle Really Big Data (NoSQL database©myNoSQL)

via: http://www.informationweek.com/news/software/info_management/231000314?cid=RSSfeed_IWK_Business_Intelligence


IBM Launches First Netezza Appliance

The pitch:

The IBM® Netezza High Capacity Appliance extends IBM Netezza’s family of data warehouse appliances to new extremes of data capacity, scaling to multiple petabytes of user data. This will enable organizations to meet a variety of analytical and historical data storage requirements with a single cost-effective appliance.

The reason for posting about it is this price information from the ZDNet announcement:

The big pitch for Netezza is the price per user per terabyte[1]. Mills said the Netezza appliance will run about $2,500 per user per terabyte compared to an average of $10,000.


  1. My emphasis.  
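To put the quoted metric in perspective, here is a quick back-of-the-envelope comparison. Only the two per-user-per-terabyte rates come from the article; the deployment size is made up:

```python
# Back-of-the-envelope comparison of the quoted pricing metric.
# Only the $2,500 and $10,000 rates come from the quote; the user count
# and capacity are made-up illustration values.
NETEZZA_RATE = 2_500   # $ per user per terabyte (quoted)
AVERAGE_RATE = 10_000  # $ per user per terabyte (quoted industry average)

users, terabytes = 50, 100  # hypothetical deployment

netezza_cost = NETEZZA_RATE * users * terabytes
average_cost = AVERAGE_RATE * users * terabytes

print(f"Netezza:  ${netezza_cost:,}")
print(f"Average:  ${average_cost:,}")
print(f"Delta:    ${average_cost - netezza_cost:,}")
```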

Original title and link: IBM Launches First Netezza Appliance (NoSQL database©myNoSQL)

via: http://www.zdnet.com/blog/btl/ibm-launches-new-netezza-appliance-eyes-big-data/51135


The Data Processing Platform for Tomorrow

In the blue corner we have IBM with Netezza as the analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC with Greenplum and the partnership with SAS[1]. And in the open source corner we have Hadoop and R.

Update: there’s also another corner, which I don’t know how to color, where Teradata and its recently acquired Aster Data partner with SAS.

Who is ready to bet on which of these platforms will be processing more data in the coming years?


  1. GigaOm has a good article on this subject here  

Original title and link: The Data Processing Platform for Tomorrow (NoSQL databases © myNoSQL)


Types of Big Data Work

Mike Minelli: Working with big data can be classified into three basic categories […] One is information management, a second is business intelligence, and the third is advanced analytics.

Information management captures and stores the information, BI analyzes data to see what has happened in the past, and advanced analytics is predictive, looking at what the data indicates for the future.

There’s also a list of tools for big data: AsterData (acquired by Teradata), Datameer, ParAccel, IBM Netezza, Oracle Exadata, EMC Greenplum.

Original title and link: Types of Big Data Work (NoSQL databases © myNoSQL)

via: http://www.linuxinsider.com/story/71945.html


Cloudera: A Business Intelligence Leader

The Informatica accord is Cloudera’s second partnership this year with a leading DI player. Back in August, Cloudera cemented a deal with open source software (OSS) data integration (DI) specialist Talend. It also has partnerships with Teradata Corp., the former Netezza Inc., the former Greenplum Software Corp., Aster Data Systems Inc., Vertica Inc., and Pentaho.

One thing’s for sure: Cloudera is certainly attracting attention.

The strategy is surprisingly simple: make it easy to put data in and get it out.

Original title and link: Cloudera: A Business Intelligence Leader (NoSQL databases © myNoSQL)

via: http://tdwi.org/articles/2011/02/16/cloudera-leader-bi-hadoop.aspx


Hadoop Spreading through Cloudera Partnerships

Cloudera, in its attempt to Hadoopize the world, goes on a partnership spree:

Many of you may have read about some of the recent announcements of partnerships between Cloudera and some of the leading data management software companies like Teradata, Netezza, Greenplum (EMC), Quest and Aster Data. We established these partnerships because Hadoop is increasingly serving as an open platform that many different applications and complementary technologies work with. Our goal is to make this as easy and as standardized as possible.

Checking the ☞ press release section turns up the following partnerships:

  • Membase
  • Talend
  • Quest
  • Pentaho
  • NTT Data
  • Aster Data
  • EMC Greenplum
  • Teradata
  • Netezza

Quite a few companies from the non-relational market.

Original title and link: Hadoop Spreading through Cloudera Partnerships (NoSQL databases © myNoSQL)

via: http://www.cloudera.com/blog/2010/10/cdh3-beta-3-now-available/