
ibm: All content tagged as ibm in NoSQL databases and polyglot persistence

IBM Accelerates Its Big Data Portfolio

Jeff Kelly takes a look at IBM’s data solutions portfolio:

IBM has the broadest and deepest Big Data product and services portfolio in the industry, as well as the market leading revenue to show for it. But IBM’s greatest asset also lies at the heart of its biggest challenge. With such a diverse set of Big Data capabilities, IBM has struggled to unify them into distinct, compelling offerings. How IBM responds to the challenge of bringing together such a broad and deep set of technologies and services - many the result of $16 billion worth of analytics-related acquisitions since 2005 - into consumable and effective product offerings will largely determine the company’s success (or failure) in the Big Data space and will have major implications for enterprise CIOs.

There are two things that I’m not sure I understand:

  1. Is a confusing product portfolio a known strategy for driving more sales?

    Basically, you offer so many products that customers get confused enough to hire your consultants to make the buying decision for them.

  2. When ranking companies by sales, wouldn’t it make more sense to compare revenue per employee rather than raw numbers?

    Which company is better? A company with 2 sales people generating $1mil in revenue or a company with 100 sales people and 100 consultants generating $20mil?
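To make the second question concrete, here is a small sketch of the arithmetic, using only the figures from the example above. The company names and the choice to count consultants as part of the headcount are my assumptions for illustration.

```python
# Figures come from the question above; "small" and "large" are made-up labels.
# Assumption: headcount includes both sales people and consultants.
companies = {
    "small": {"revenue": 1_000_000, "headcount": 2},
    "large": {"revenue": 20_000_000, "headcount": 100 + 100},
}


def revenue_per_head(company):
    """Return revenue divided by the number of people needed to generate it."""
    return company["revenue"] / company["headcount"]


ratios = {name: revenue_per_head(c) for name, c in companies.items()}
for name, ratio in ratios.items():
    print(f"{name}: ${ratio:,.0f} per person")
```

By raw revenue the larger company wins 20 to 1; per person, the smaller one comes out five times ahead.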

Original title and link: IBM Accelerates Its Big Data Portfolio (NoSQL database©myNoSQL)

via: http://wikibon.org/wiki/v/IBM_Accelerates_Its_Big_Data_Portfolio


Paper: M3R - Increased Performance for In-Memory Hadoop Jobs

For the weekend reads, a paper authored by a research team from IBM:

Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged (including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets) while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
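For readers who have not seen the benchmark workload mentioned in the abstract, here is a rough sketch of sparse matrix vector multiply expressed as a map step and a reduce step. It is plain in-memory Python, not the Hadoop API and not M3R's implementation; the function names, the triple-based matrix layout, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(entries, vector):
    """Emit a (row, partial_product) pair for each non-zero matrix entry."""
    for row, col, value in entries:
        yield row, value * vector[col]


def reduce_phase(pairs):
    """Sum the partial products that share the same row key."""
    sums = defaultdict(float)
    for row, partial in pairs:
        sums[row] += partial
    return dict(sums)


# Hypothetical 3x3 sparse matrix stored as (row, col, value) triples.
matrix = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
x = [1.0, 2.0, 3.0]

result = reduce_phase(map_phase(matrix, x))
print(result)
```

Everything here stays in memory between the two phases, which is the kind of workload the abstract says M3R targets.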


How Many Hadoops?

The short answer is there is only one Apache Hadoop distribution.

The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.

The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).

The 100% open source: Hortonworks Data Platform.

The proprietary: MapR.

The blue one: IBM InfoSphere BigInsights.

The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.

There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.

But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:

  1. Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
  2. NetApp’s Hadooplers
  3. EMC Greenplum DCA
  4. Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
  5. Data Direct Networks (DDN)

I hope I didn’t miss any important ones[1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?


  1. I have left Hadoop-as-a-Service aside for now.

Original title and link: How Many Hadoops? (NoSQL database©myNoSQL)


The Three Pillars of Data-Based Computing: SQL, Hadoop And

IBM’s Arvind Krishna in an interview for The Register:

Krishna said he sees the potential for three pillars of data-based computing: SQL – to give a language and syntax for programming; Hadoop – to provide a MapReduce semantic; and a third pillar which is yet to be decided upon. That could be a MongoDB or HBase, but the market will pick a winner. “There’s a whole set: one will survive,” Krishna said.

I’m pretty sure that last part (i.e. “that could be a MongoDB or HBase”) is a misquote, as the rest of what Krishna is saying makes a lot of sense:

“Wherever open source is mature I will leverage it; I won’t compete with it. To believe one can be monolithic, proprietary and closed and … succeed is a foolish proposition. One has to embrace open source and work with an ecosystem. Clients are looking to you to add value.”

Original title and link: The Three Pillars of Data-Based Computing: SQL, Hadoop And (NoSQL database©myNoSQL)

via: http://www.theregister.co.uk/2012/04/05/ibm_arvind_krishna/


IBM: Behind the Buzz About NoSQL

Mature database management systems like DB2 also offer advantages like high availability and data compression that the newer NoSQL systems have not had time to develop.

Misinform your customers to save them the trouble of discovering alternative solutions.

Original title and link: IBM: Behind the Buzz About NoSQL (NoSQL database©myNoSQL)

via: http://ibmdatamag.com/2012/03/behind-the-buzz-about-nosql/


Netezza Query History Table

Using Netezza’s in-database analytics package FPGROWTH, database administrators can identify the most commonly used combination of tables and the performance of the queries that reference those sets of tables.

Nice feature. Sort of the rich man’s, all-included version of MySQL’s slow query log. Do you know if other databases support a similar feature?
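To give an idea of what this kind of analysis produces, here is a minimal sketch that counts which table combinations show up together across a query history. It uses brute-force counting rather than the FP-growth algorithm, and it does not use any Netezza API; the function name, the thresholds, and the sample history are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_table_sets(query_tables, min_support=2, max_size=3):
    """Count table combinations that appear together in at least min_support queries."""
    counts = Counter()
    for tables in query_tables:
        unique = sorted(set(tables))  # normalise order, ignore repeated tables
        for size in range(2, min(max_size, len(unique)) + 1):
            counts.update(combinations(unique, size))
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical query history: one list of referenced tables per query.
history = [
    ["orders", "customers"],
    ["orders", "customers", "products"],
    ["orders", "products"],
    ["customers", "orders"],
]

frequent = frequent_table_sets(history)
for combo, n in sorted(frequent.items(), key=lambda item: -item[1]):
    print(", ".join(combo), "->", n, "queries")
```

Brute-force counting blows up quickly as the number of tables per query grows, which is the problem FP-growth was designed to avoid.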

Original title and link: Netezza Query History Table (NoSQL database©myNoSQL)

via: http://netezzaadmin.wordpress.com/2012/03/16/ibm-netezza-analytics-to-analyze-query-history-table-usage/


IBM Debuts Netezza Customer Intelligence Appliance

A new motto could be “An appliance for every vertical”. IBM Netezza’s first is for retailers.

Original title and link: IBM Debuts Netezza Customer Intelligence Appliance (NoSQL database©myNoSQL)


12 Hadoop Vendors to Watch in 2012

My list of the 8 most interesting companies for the future of Hadoop didn’t try to include everyone with a product that has the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle

Original title and link: 12 Hadoop Vendors to Watch in 2012 (NoSQL database©myNoSQL)


Partnerships in the Hadoop Market

Just a quick recap:

Amazon doesn’t partner with anyone for its Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.

Original title and link: Partnerships in the Hadoop Market (NoSQL database©myNoSQL)


Data Is the New Currency. But Who’s Leading the Way?

In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today, IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:

And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.

The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.

Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, using an IBM language technology built on the Apache UIMA platform[1] and running Linux on IBM boxes.

To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?


  1. Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.  

  2. Or they are very secretive about their internal initiatives and research.  

Original title and link: Data Is the New Currency. But Who’s Leading the Way? (NoSQL database©myNoSQL)


GPU-Accelerated Databases

Wolfgang Gruener reporting on a new patent filed by IBM:

Instead of traditional disk-based queries and an approach that slows performance via memory latencies and processors waiting for data to be fetched from the memory, IBM envisions in-GPU-memory tables as technology that could, in addition to disk tables, significantly accelerate database processing. According to a patent filed by the company, “GPU enabled programs are well suited to problems that involve data-parallel computations where the same program is executed on different data with high arithmetic intensity.”


Amazon has made a move in the GPU-world by offering Cluster GPU instances which can be used for quite a few interesting scenarios.

Original title and link: GPU-Accelerated Databases (NoSQL database©myNoSQL)

via: http://www.tomshardware.com/news/ibm-patent-gpu-accelerated-database-cuda,13866.html