


IBM: All content tagged as IBM in NoSQL databases and polyglot persistence

Data Is the New Currency. But Who’s Leading the Way?

In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today, IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All is good until you read the continuation:

And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.

The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.

Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, uses an IBM language technology built on the Apache UIMA platform[1], and runs Linux on IBM boxes.

To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?

  1. Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.  

  2. Or they are very secretive about their internal initiatives and research.  

Original title and link: Data Is the New Currency. But Who’s Leading the Way? (NoSQL database©myNoSQL)

GPU-Accelerated Databases

Wolfgang Gruener reporting on a new patent filed by IBM:

Instead of traditional disk-based queries and an approach that slows performance via memory latencies and processors waiting for data to be fetched from the memory, IBM envisions in-GPU-memory tables as technology that could, in addition to disk tables, significantly accelerate database processing. According to a patent filed by the company, “GPU enabled programs are well suited to problems that involve data-parallel computations where the same program is executed on different data with high arithmetic intensity.”


Amazon has also made a move in the GPU world by offering Cluster GPU instances, which can be used for quite a few interesting scenarios.



IBM DB2 to Include NoSQL Features

It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.

Mark Brunelli quoting Curt Cotner, IBM VP and CTO for database servers:

So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.

Reports are not very clear yet, but it seems that the DB2 NoSQLish features are based on IBM’s Rational Jazz triplestore solution, an approach similar to Oracle NoSQL Database 11g, which is based on Oracle’s Berkeley DB Java Edition.

When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend the support for NoSQLish interfaces to its core database products. It looks like IBM is taking exactly this route:

Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”

Looking back at these events (Oracle’s NoSQL database, the Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder whether and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.


Hadoop: It's Still a Niche Technology

In an otherwise generic but interesting post about Hadoop and its integration with data analytics and data warehouse solutions, Jessica Twentyman writes:

It’s still a niche technology, but Hadoop’s profile received a serious boost over the past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, Infosphere, in May 2011.

Unfortunately, she got this all wrong. It is the open source community, developers, data scientists, and Cloudera that have popularized Hadoop.

These data analytics and data warehouse vendors are just capitalizing on Hadoop delivering results. They haven’t been knocking on doors asking: “Have you heard of Hadoop? Do you want to try it?”. They ran into Hadoop in most of the places they went, and that made them realize it is a business opportunity.

So, I’ll say it again: Hadoop is popular thanks to the open source community, developers, data scientists and Cloudera.



Hadoop and Netezza: Differences & Similarities

Most of the time, vendor videos emphasize the superiority of their own commercial platform. But this short video gives a fair overview of the similarities and differences between Hadoop and Netezza.

The video is 5 minutes long and well worth watching.

BigData Market: IBM Acquires Two Analytics Companies

IBM jumps in the “big data” rush as it announced two major acquisitions in two days. On Wednesday, Big Blue announced that it will acquire security intelligence analytics company i2 […] The second major buy was revealed earlier today. IBM announced the deal to acquire Algorithmics, a risk analytics software and advisory service

The higher up the data stack your business is, the more challenges it faces, but the higher the reward. The good news is that the well-established data companies have opened the acquisition hunting season.



BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases


  • The ability to orchestrate execution of Hadoop related tasks (i.e., executing a Hive Query, Pig Script, or M/R job) as part of a broader IT workflow.
  • The ability to setup dependencies, so if a step fails the job can branch down a recovery path or send a notification, or if it’s a success it goes on to subsequent dependent tasks. Likewise it supports initiating several tasks in parallel.
  • New integration for Pig — so that developers have the ability to execute a Pig job from a PDI Job flow, integrate the execution of Pig jobs in broader IT workflows through PDI Jobs, take advantage of our out of the box scheduler, and so on.
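The dependency handling described in the bullets above can be pictured with a small sketch. This is not Pentaho's API: the `Step` class, the `run_job` function, and the step names are all hypothetical, and the actions are stand-ins for real Hive, Pig, or M/R invocations. It only shows the idea of branching to a recovery path when a step fails and moving on to the dependent step when it succeeds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One task in a job flow (hypothetical structure, not a PDI class)."""
    name: str
    action: Callable[[], None]
    on_success: Optional[str] = None  # next step if the action returns normally
    on_failure: Optional[str] = None  # recovery step if the action raises


def run_job(steps: Dict[str, Step], start: str) -> List[str]:
    """Run steps from `start`, following success/failure branches.

    Returns the names of the steps that were executed, in order.
    Assumes the branches form no cycle.
    """
    executed: List[str] = []
    current: Optional[str] = start
    while current is not None:
        step = steps[current]
        executed.append(step.name)
        try:
            step.action()
            current = step.on_success
        except Exception:
            current = step.on_failure
    return executed


def failing_pig_script() -> None:
    # Stand-in for a Pig job that fails.
    raise RuntimeError("simulated Pig failure")


steps = {
    "hive_query": Step("hive_query", lambda: None, on_success="pig_script"),
    "pig_script": Step("pig_script", failing_pig_script,
                       on_success="load_warehouse", on_failure="notify"),
    "load_warehouse": Step("load_warehouse", lambda: None),
    "notify": Step("notify", lambda: None),
}

executed = run_job(steps, "hive_query")
print(executed)
```

Here the simulated Pig failure sends the flow to the `notify` step instead of `load_warehouse`. Running independent tasks in parallel, which the bullets also mention, is left out of the sketch.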

The list of tools Pentaho 4 integrates with is quite long:

  • a long list of traditional RDBMS
  • analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
  • NoSQL databases (MongoDB, HBase, etc.)
  • Hadoop variants
  • LexisNexis HPCC

This is the world of polyglot persistence and hybrid data storage.


Hadoop and IBM Netezza: Compete or Co-Exist?

I assume people on both sides of data warehouses (users and providers) are asking the same question. IBM Netezza and Cloudera seem to agree on the answer:

IBM Netezza had worked with Cloudera to put together a compelling demo to highlight the value of our combined solution of CDH/Hadoop and Netezza.  Through an interesting use case, the demo showed how businesses could have their “hot” data (most recent data) residing in Netezza, “warm” data (longer time range data) residing in HDFS, while leveraging the Cloudera Connector for Netezza and Oozie (workflow engine part of CDH) to provide deeper insights to business executives.

I would have liked to know more details about the use case, though. Just categorizing data into “hot” and “warm” is not enough to understand the advantages of each piece.
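Lacking those details, here is a minimal sketch of what an age-based split might look like. Everything in it is an assumption: the demo does not say how “most recent” is defined, so the 90-day window, the `tier_for` and `tiers_for_range` helpers, and the tier labels are hypothetical. Per the quote, “hot” would map to Netezza and “warm” to HDFS.

```python
from datetime import date, timedelta
from typing import List

HOT_WINDOW_DAYS = 90  # assumed threshold; the demo does not state one


def tier_for(event_date: date, today: date,
             hot_window_days: int = HOT_WINDOW_DAYS) -> str:
    """Classify a record as "hot" (recent) or "warm" (older) by its age."""
    age_days = (today - event_date).days
    return "hot" if age_days <= hot_window_days else "warm"


def tiers_for_range(start: date, end: date, today: date,
                    hot_window_days: int = HOT_WINDOW_DAYS) -> List[str]:
    """List the tiers a query over the inclusive range [start, end] must touch."""
    cutoff = today - timedelta(days=hot_window_days)
    tiers: List[str] = []
    if end >= cutoff:
        tiers.append("hot")   # part of the range falls inside the hot window
    if start < cutoff:
        tiers.append("warm")  # part of the range is older than the hot window
    return tiers


today = date(2011, 9, 1)
print(tier_for(date(2011, 8, 1), today))
print(tiers_for_range(date(2011, 1, 1), date(2011, 8, 31), today))
```

A query confined to the last few weeks would touch only the hot tier, while a year-long trend analysis would have to read from both, which is where a connector and a workflow engine would come in.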



What's Next for IBM Watson?

If you are waiting for a financial services version of the powerful artificial intelligence system that won a game of Jeopardy against two of the highest winning champions of all time — Brad Rutter and Ken Jennings — don’t hold your breath … yet. Unfortunately for the Wall Street techno-geeks and quants looking for another tool to add to their algorithmic arsenal, IBM isn’t working on a financial services version of Watson at this time, according to Dr. David Ferrucci […]

Hopefully money will not change this decision too soon.



IBM Launches First Netezza Appliance

The pitch:

The IBM® Netezza High Capacity Appliance extends IBM Netezza’s family of data warehouse appliances to new extremes of data capacity, scaling to multiple petabytes of user data. This will enable organizations to meet a variety of analytical and historical data storage requirements with a single cost-effective appliance.

The reason for posting about it is this price information from the ZDNet announcement:

The big pitch for Netezza is the price per user per terabyte[1]. Mills said the Netezza appliance will run about $2,500 per user per terabyte compared to an average of $10,000.

  1. My emphasis.  



Oracle and IBM May Not Know Big Data, but Neither Does Ballmer

The echo chamber is reacting:

Specifically, for a data processing and analytics project to qualify as Big Data, it must encompass not just internal corporate data, but also third-party data that resides outside the firewall, according to Ballmer. He said IBM and Oracle limit their Big Data approaches to internal data, thus they are not in fact Big Data by his definition.


IBM, Oracle and now Microsoft are jockeying to position each of their approaches to Big Data as the industry standard, and Ballmer is clearly trying to steer the Big Data conversation towards Microsoft’s strengths and away from its weaknesses. That means talking up Microsoft’s ability to integrate third-party data with relatively large volumes of corporate data inside Microsoft’s SQL Server R2 Parallel Data Warehouse and away from its lack of petabyte-scale data processing power.

I guess there will be no end to the Oracle-IBM-Microsoft triangle love, so I’ll stop here until real facts are added to the story.
