IBM: All content tagged as IBM in NoSQL databases and polyglot persistence
Monday, 28 November 2011
GPU-Accelerated Databases
Wolfgang Gruener reporting on a new patent filed by IBM:
Instead of traditional disk-based queries and an approach that slows performance via memory latencies and processors waiting for data to be fetched from the memory, IBM envisions in-GPU-memory tables as technology that could, in addition to disk tables, significantly accelerate database processing. According to a patent filed by the company, “GPU enabled programs are well suited to problems that involve data-parallel computations where the same program is executed on different data with high arithmetic intensity.”

Amazon has made a move in the GPU world by offering Cluster GPU instances, which can be used for quite a few interesting scenarios.
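To make the "same program is executed on different data" idea from the patent quote concrete, here is a minimal sketch in plain Python. It is not IBM's design and it does not run on a GPU; the column names, the threshold, and the `kernel` function are all made up for illustration. The point is only that each row's result depends on that row alone, which is what lets a GPU hand one row to each thread.

```python
# Hypothetical in-memory table stored as two columns (illustrative data only).
prices = [12.5, 99.0, 7.25, 150.0]
quantities = [4, 1, 10, 2]


def kernel(i):
    """Work for a single row: it reads only row i, so rows are independent."""
    revenue = prices[i] * quantities[i]
    # Filter step: keep only rows whose revenue exceeds an assumed threshold.
    return revenue if revenue > 60.0 else 0.0


# A GPU would launch one thread per row; a plain loop stands in for that here.
per_row = [kernel(i) for i in range(len(prices))]

# The per-row results are then combined in a final reduction step.
total = sum(per_row)
print(per_row, total)
```

Because no row needs another row's result, the loop can be split across any number of workers without changing the answer.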
Original title and link: GPU-Accelerated Databases (©myNoSQL)
via: http://www.tomshardware.com/news/ibm-patent-gpu-accelerated-database-cuda,13866.html
Monday, 31 October 2011
IBM DB2 to Include NoSQL Features
It didn’t take long for IBM to follow Oracle’s foray into the NoSQL space by announcing that IBM DB2 and Informix will include NoSQL features.
Mark Brunelli quoting Curt Cotner, IBM VP and CTO for database servers:
So, we actually took one of these NoSQL triplestores from the open source [community and] we modified it to sit on top of DB2 so that it can use DB2’s indexing, DB2’s logging, DB2’s solution for high availability [and] all the things you would expect.
Reports are not very clear yet, but it seems that DB2’s NoSQLish features are based on IBM’s Rational Jazz triplestore solution, an approach similar to Oracle NoSQL Database 11g, which is based on Oracle’s Berkeley DB Java Edition.
When speculating about Oracle’s future in the NoSQL market, I wrote that I expected Oracle to extend support for NoSQLish interfaces to its core database products. It looks like IBM is taking exactly this route:
Curt Cotner: “All of the DB2 and IBM Informix customers will have access to that and it will be part of your existing stack and you won’t have to pay extra for it. We’ll put that into our database products because we think that this is [something] that people want from their application programming experience, and it makes sense to put it natively inside of DB2.”
Looking back at these events (Oracle’s NoSQL database, Oracle Big Data appliance, IBM DB2 and Informix supporting NoSQL features) makes me wonder whether and how they are related to the new Enterprise NoSQL trend I’ve mentioned earlier.
Wednesday, 5 October 2011
Hadoop: It's Still a Niche Technology
In an otherwise generic but interesting post about Hadoop and its integration with data analytics and data warehouse solutions, Jessica Twentyman writes:
It’s still a niche technology, but Hadoop’s profile received a serious boost over the past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, Infosphere, in May 2011.
Unfortunately she got this all wrong. It is the open source community, developers, data scientists, and Cloudera that help popularize Hadoop.
These data analytics and data warehouse vendors are just capitalizing on Hadoop delivering results. They haven’t been knocking at doors asking: “Have you heard of Hadoop? Do you want to try it?”. They’ve run into Hadoop in most of the places they went and that made them realize it is a business opportunity.
So, I’ll say it again: Hadoop is popular thanks to the open source community, developers, data scientists and Cloudera.
Saturday, 1 October 2011
Hadoop and Netezza: Differences & Similarities
Most of the time, vendor videos emphasize the superiority of their own commercial platform. But this short video gives a fair overview of the similarities and differences between Hadoop and Netezza.
The video is 5 minutes long and well worth watching.
Wednesday, 21 September 2011
BigData Market: IBM Acquires Two Analytics Companies
IBM jumps in the “big data” rush as it announced two major acquisitions in two days. On Wednesday, Big Blue announced that it will acquire security intelligence analytics company i2 […] The second major buy was revealed earlier today. IBM announced the deal to acquire Algorithmics, a risk analytics software and advisory service
The higher up the data stack your business sits, the more challenges it faces, but the higher the reward. The good news is that the well-established data companies have opened the acquisition hunting season.
via: http://siliconangle.com/blog/2011/09/01/ibm-is-on-an-acquisition-spree-for-big-data-software/
Wednesday, 10 August 2011
BI Pentaho Integrates Hadoop, NoSQL Databases, and Analytic Databases
- The ability to orchestrate execution of Hadoop related tasks (i.e., executing a Hive Query, Pig Script, or M/R job) as part of a broader IT workflow.
- The ability to setup dependencies, so if a step fails the job can branch down a recovery path or send a notification, or if it’s a success it goes on to subsequent dependent tasks. Likewise it supports initiating several tasks in parallel.
- New integration for Pig — so that developers have the ability to execute a Pig job from a PDI Job flow, integrate the execution of Pig jobs in broader IT workflows through PDI Jobs, take advantage of our out of the box scheduler, and so on.
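The dependency behaviour described in the list above (continue on success, branch to a recovery path or a notification on failure) can be sketched in a few lines of Python. This is not the PDI API: the `Step` class, the `run` function, and the three task functions are hypothetical names invented for this illustration, and parallel task execution is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One node in a hypothetical job flow (not a Pentaho class)."""
    name: str
    action: Callable[[], None]
    on_success: Optional["Step"] = None
    on_failure: Optional["Step"] = None


def run(first: Step) -> List[Tuple[str, str]]:
    """Walk the flow, following the success or failure branch after each step."""
    log = []
    step = first
    while step is not None:
        try:
            step.action()
        except Exception:
            log.append((step.name, "failed"))
            step = step.on_failure
        else:
            log.append((step.name, "ok"))
            step = step.on_success
    return log


# Stand-in tasks; a real job would launch a Hive query, Pig script, or M/R job.
def load_raw_data():
    raise RuntimeError("simulated failure")


def run_hive_query():
    pass


def send_notification():
    pass


notify = Step("notify", send_notification)
query = Step("hive_query", run_hive_query)
load = Step("load", load_raw_data, on_success=query, on_failure=notify)

log = run(load)
print(log)
```

Here the simulated load failure skips the Hive query and takes the notification branch instead.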
The list of tools Pentaho 4 integrates with is quite long:
- a long list of traditional RDBMS
- analytics databases (Greenplum, Vertica, Netezza, Teradata, etc.)
- NoSQL databases (MongoDB, HBase, etc.)
- Hadoop variants
- LexisNexis HPCC
This is the world of polyglot persistence and hybrid data storage.
Tuesday, 12 July 2011
Hadoop and IBM Netezza: Compete or Co-Exist?
I assume people on both sides of data warehouses (users and providers) are asking the same question. IBM Netezza and Cloudera seem to agree on the answer:
IBM Netezza had worked with Cloudera to put together a compelling demo to highlight the value of our combined solution of CDH/Hadoop and Netezza. Through an interesting use case, the demo showed how businesses could have their “hot” data (most recent data) residing in Netezza, “warm” data (longer time range data) residing in HDFS, while leveraging the Cloudera Connector for Netezza and Oozie (workflow engine part of CDH) to provide deeper insights to business executives.
I would have liked to know more details about the use case though. Just categorizing data into “hot” and “warm” is not enough to understand the advantages of each piece.
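Lacking those details, one way to picture the split is a router that decides, from a query's date range, which store has to be consulted. The sketch below is an assumption from start to finish: the 90-day cutoff, the store labels, and the `stores_for_range` function are invented for illustration and do not describe the Cloudera Connector for Netezza or the demo itself.

```python
from datetime import date, timedelta

# Assumed boundary between "hot" and "warm" data; the demo's real cutoff is not given.
HOT_WINDOW_DAYS = 90


def stores_for_range(start, end, today):
    """Return which stores a query over [start, end] would need to touch."""
    cutoff = today - timedelta(days=HOT_WINDOW_DAYS)
    stores = []
    if end >= cutoff:
        stores.append("netezza")  # recent ("hot") rows
    if start < cutoff:
        stores.append("hdfs")     # older ("warm") rows
    return stores


today = date(2011, 7, 12)
recent = stores_for_range(date(2011, 6, 1), date(2011, 7, 1), today)
historic = stores_for_range(date(2010, 1, 1), date(2010, 12, 31), today)
spanning = stores_for_range(date(2011, 1, 1), date(2011, 7, 1), today)
print(recent, historic, spanning)
```

A query that spans the cutoff has to combine results from both stores, which is presumably where a workflow engine such as Oozie would come in.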
via: http://www.cloudera.com/blog/2011/06/reflections-from-enzee-universe-2011/
Friday, 24 June 2011
What's Next for IBM Watson?
If you are waiting for a financial services version of the powerful artificial intelligence system that won a game of Jeopardy against two of the highest winning champions of all time — Brad Rutter and Ken Jennings — don’t hold your breath … yet. Unfortunately for the Wall Street techno-geeks and quants looking for another tool to add to their algorithmic arsenal, IBM isn’t working on a financial services version of Watson at this time, according to Dr. David Ferrucci […]
Hopefully money will not change this decision too soon.
via: http://www.wallstreetandtech.com/trading-technology/230700015
Thursday, 23 June 2011
IBM Launches First Netezza Appliance
The IBM® Netezza High Capacity Appliance extends IBM Netezza’s family of data warehouse appliances to new extremes of data capacity, scaling to multiple petabytes of user data. This will enable organizations to meet a variety of analytical and historical data storage requirements with a single cost-effective appliance.
The reason for posting about it is this price information from the ZDNet announcement:
The big pitch for Netezza is the price per user per terabyte[1]. Mills said the Netezza appliance will run about $2,500 per user per terabyte compared to an average of $10,000.
[1] My emphasis.
via: http://www.zdnet.com/blog/btl/ibm-launches-new-netezza-appliance-eyes-big-data/51135
Thursday, 16 June 2011
Oracle and IBM May Not Know Big Data, but Neither Does Ballmer
Specifically, for a data processing and analytics project to qualify as Big Data, it must encompass not just internal corporate data, but also third-party data that resides outside the firewall, according to Ballmer. He said IBM and Oracle limit their Big Data approaches to internal data, thus they are not in fact Big Data by his definition.
[…]
IBM, Oracle and now Microsoft are jockeying to position each of their approaches to Big Data as the industry standard, and Ballmer is clearly trying to steer the Big Data conversation towards Microsoft’s strengths and away from its weaknesses. That means talking up Microsoft’s ability to integrate third-party data with relatively large volumes of corporate data inside Microsoft’s SQL Server R2 Parallel Data Warehouse and away from its lack of petabyte-scale data processing power.
I guess there will be no end to the Oracle-IBM-Microsoft triangle love, so I’ll stop here until real facts are added to the story.
via: http://wikibon.org/blog/oracle-and-ibm-may-not-know-big-data-but-neither-does-ballmer/
Wednesday, 15 June 2011
Jaql: Query Language for JSON in IBM InfoSphere BigInsights
Jaql was created by IBM and is used by IBM InfoSphere BigInsights, IBM’s Apache Hadoop distribution:
Jaql’s query language was inspired by many programming and query languages that include: Lisp, SQL, XQuery, and PigLatin. Jaql is a functional, declarative query language that is designed to process large data sets. For parallelism, Jaql rewrites high-level queries when appropriate into a “low-level” query consisting of Map-Reduce jobs that are evaluated using the Apache Hadoop project. Interestingly, the query rewriter produces valid Jaql queries which illustrates a departure from the rigid, declarative-only approach (but with hints!) of most relational databases. Instead, developers can interact with the “low-level” queries if needed and can add in their own low-level functionality such as indexed access or hash-based joins that are missing from Map-Reduce platforms.
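To illustrate what rewriting a high-level query into Map-Reduce jobs means, here is a small Python sketch. It is not Jaql syntax and not how the Jaql rewriter works internally; the sample records and the `mapper`, `reducer`, and `run_map_reduce` names are assumptions made for this example. It takes a query of the form "filter, group by a key, count" and expresses it as a map function plus a reduce function.

```python
from collections import defaultdict

# Illustrative JSON-like records (made-up data).
records = [
    {"user": "ann", "status": 200},
    {"user": "bob", "status": 500},
    {"user": "ann", "status": 200},
    {"user": "bob", "status": 200},
]


def mapper(record):
    """Map side: apply the filter and emit (group key, 1) for each kept record."""
    if record["status"] == 200:
        yield record["user"], 1


def reducer(key, values):
    """Reduce side: aggregate all values emitted for one key."""
    return key, sum(values)


def run_map_reduce(data, map_fn, reduce_fn):
    """Tiny single-process stand-in for the shuffle a Hadoop job would perform."""
    groups = defaultdict(list)
    for item in data:
        for key, value in map_fn(item):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_map_reduce(records, mapper, reducer)
print(counts)
```

The high-level query says what to compute; the map and reduce pair is the lower-level form that a developer could inspect or adjust.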
Sunday, 22 May 2011
IBM Hadoop Commitment
The company also cemented its commitment to the Hadoop open source data analytics tool, identifying it as “the cornerstone of [IBM’s] big data strategy” in a statement.
IBM is the latest in a line of enterprises to stress their commitment to Hadoop. Enterprise storage vendor EMC put a tweaked Hadoop distribution at the heart of a recently updated range of data analytics Greenplum appliances, while business intelligence company Jaspersoft announced plans to better integrate its products with Hadoop in February.
Sometimes I don’t get the meaning of the words commitment and investment. But this makes me believe others have the same problem understanding them.