It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.
So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.
Reports are not very clear yet, but it seems that DB2’s NoSQL-ish features are based on IBM’s Rational Jazz triplestore solution—an approach similar to Oracle’s NoSQL Database 11G, which is based on Oracle’s Berkeley DB Java Edition.
When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend support for NoSQL-ish interfaces to its core database products. It looks like IBM is taking exactly this route:
Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”
Looking back at these events (Oracle’s NoSQL Database, the Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder if and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.
Original title and link: IBM DB2 to Include NoSQL Features (©myNoSQL)
Most of the time, vendor videos emphasize the superiority of their own commercial platforms. But this short video gives a fair overview of the similarities and differences between Hadoop and Netezza.
The video is 5 minutes long and well worth watching.
- The ability to orchestrate the execution of Hadoop-related tasks (e.g., executing a Hive query, Pig script, or M/R job) as part of a broader IT workflow.
- The ability to set up dependencies: if a step fails, the job can branch down a recovery path or send a notification; if it succeeds, it moves on to subsequent dependent tasks. It likewise supports initiating several tasks in parallel.
- New integration for Pig—so that developers have the ability to execute a Pig job from a PDI job flow, integrate the execution of Pig jobs into broader IT workflows through PDI jobs, take advantage of our out-of-the-box scheduler, and so on.
The list of tools Pentaho 4 integrates with is quite long:
- a long list of traditional RDBMS
- analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
- NoSQL databases (MongoDB, HBase, etc.)
- Hadoop variants
- LexisNexis HPCC
This is the world of polyglot persistence and hybrid data storage.
Original title and link: BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases (©myNoSQL)
Jaql was created by IBM and is used in IBM InfoSphere BigInsights—IBM’s Apache Hadoop distribution:
Jaql’s query language was inspired by many programming and query languages that include: Lisp, SQL, XQuery, and PigLatin. Jaql is a functional, declarative query language that is designed to process large data sets. For parallelism, Jaql rewrites high-level queries when appropriate into a “low-level” query consisting of Map-Reduce jobs that are evaluated using the Apache Hadoop project. Interestingly, the query rewriter produces valid Jaql queries which illustrates a departure from the rigid, declarative-only approach (but with hints!) of most relational databases. Instead, developers can interact with the “low-level” queries if needed and can add in their own low-level functionality such as indexed access or hash-based joins that are missing from Map-Reduce platforms.
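To make the functional, pipe-oriented style concrete, here is a small sketch in Jaql-like syntax. The file and field names are hypothetical, and the exact syntax may vary between Jaql releases; this is an illustration, not a verified query:

```
// Read JSON records from HDFS, keep only the cheap books,
// and project two fields; "books.json" and the field names
// are made-up examples.
read(hdfs("books.json"))
  -> filter $.price < 10
  -> transform { title: $.title, price: $.price }
  -> write(hdfs("cheap_books.json"));
```

A pipeline like this is the kind of high-level query the rewriter can translate into Map-Reduce jobs (the filter and transform naturally map onto the map phase). And because, as the quoted passage notes, the rewriter produces valid Jaql, a developer could in principle inspect or hand-tune the generated low-level query.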
Original title and link: Jaql: Query Language for JSON in IBM InfoSphere BigInsights (©myNoSQL)
In the blue corner we have IBM, with Netezza as the analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC, with Greenplum and the partnership with SAS. And in the open source corner we have Hadoop and R.
Update: there’s also another corner I don’t know how to color where Teradata and its recently acquired Aster Data partner with SAS.
Who is ready to bet on which of these platforms will be processing more data in the years ahead?