IBM: All content tagged as IBM in NoSQL databases and polyglot persistence
Wednesday, 10 April 2013
IBM Accelerates Its Big Data Portfolio
Jeff Kelly takes a look at IBM’s data solutions portfolio:
IBM has the broadest and deepest Big Data product and services portfolio in the industry, as well as the market leading revenue to show for it. But IBM’s greatest asset also lies at the heart of its biggest challenge. With such a diverse set of Big Data capabilities, IBM has struggled to unify them into distinct, compelling offerings. How IBM responds to the challenge of bringing together such a broad and deep set of technologies and services - many the result of $16 billion worth of analytics-related acquisitions since 2005 - into consumable and effective product offerings will largely determine the company’s success (or failure) in the Big Data space and will have major implications for enterprise CIOs.
There are two things that I’m not sure I understand:
- Is it a known strategy leading to more sales to have a confusing portfolio of products? Basically you offer so many products that a customer will be so confused that he’ll have to hire your consultant to make the buying decision.
- When ranking companies by sales, wouldn’t it make more sense to compare revenue/employee than raw numbers? Which company is better? A company with 2 sales people generating $1mil in revenue or a company with 100 sales people and 100 consultants generating $20mil?
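To make the revenue/employee comparison concrete, here is a minimal sketch in Python using only the two hypothetical companies from the question above. The function name and the choice to count consultants as headcount are my own assumptions for illustration, not an established metric definition.

```python
def revenue_per_employee(revenue_usd: float, headcount: int) -> float:
    """Return revenue divided by the number of people needed to generate it."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    return revenue_usd / headcount


# Company A: 2 sales people, $1mil in revenue.
small = revenue_per_employee(1_000_000, 2)

# Company B: 100 sales people plus 100 consultants, $20mil in revenue.
# Assumption: consultants count toward headcount because they are part of the sale.
large = revenue_per_employee(20_000_000, 100 + 100)

print(f"small company: ${small:,.0f} per person")
print(f"large company: ${large:,.0f} per person")
```

Under these assumptions the small company brings in $500,000 per person and the large one $100,000 per person, so the raw revenue ranking and the per-employee ranking point in opposite directions.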
Original title and link: IBM Accelerates Its Big Data Portfolio (©myNoSQL)
via: http://wikibon.org/wiki/v/IBM_Accelerates_Its_Big_Data_Portfolio
Friday, 15 March 2013
Paper: M3R - Increased Performance for In-Memory Hadoop Jobs
For the weekend reads, a paper authored by a research team from IBM:
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged — including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets — while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
Monday, 4 March 2013
How Many Hadoops?
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones[1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
[1] I left aside for now Hadoop-as-a-Service.
Original title and link: How Many Hadoops? (©myNoSQL)
Thursday, 5 April 2012
The Three Pillars of Data-Based Computing: SQL, Hadoop And
IBM’s Arvind Krishna in an interview for The Register:
Krishna said he sees the potential for three pillars of data-based computing: SQL – to give a language and syntax for programming; Hadoop – to provide a MapReduce semantic; and a third pillar which is yet to be decided upon. That could be a MongoDB or HBase, but the market will pick a winner. “There’s a whole set: one will survive,” Krishna said.
I’m pretty sure that last part (i.e. “that could be a MongoDB or HBase”) is a misquote, as the rest of what Krishna is saying makes a lot of sense:
“Wherever open source is mature I will leverage it; I won’t compete with it. To believe one can be monolithic, proprietary and closed and … succeed is a foolish proposition. One has to embrace open source and work with an ecosystem. Clients are looking to you to add value.”
Original title and link: The Three Pillars of Data-Based Computing: SQL, Hadoop And (©myNoSQL)
via: http://www.theregister.co.uk/2012/04/05/ibm_arvind_krishna/
Wednesday, 21 March 2012
IBM: Behind the Buzz About NoSQL
Mature database management systems like DB2 also offer advantages like high availability and data compression that the newer NoSQL systems have not had time to develop.
Misinform your customers to save them the trouble of discovering alternative solutions.
Original title and link: IBM: Behind the Buzz About NoSQL (©myNoSQL)
via: http://ibmdatamag.com/2012/03/behind-the-buzz-about-nosql/
Friday, 16 March 2012
Netezza Query History Table
Using Netezza’s in-database analytics package FPGROWTH, database administrators can identify the most commonly used combination of tables and the performance of the queries that reference those sets of tables.
Nice feature. Sort of a rich man’s all-inclusive version of the slow query log in MySQL. Do you know if other databases support a similar feature?
Original title and link: Netezza Query History Table (©myNoSQL)
Wednesday, 1 February 2012
IBM Debuts Netezza Customer Intelligence Appliance
A new motto could be “An appliance for every vertical”. IBM Netezza’s first is for retailers.
Original title and link: IBM Debuts Netezza Customer Intelligence Appliance (©myNoSQL)
Wednesday, 25 January 2012
12 Hadoop Vendors to Watch in 2012
My list of 8 most interesting companies for the future of Hadoop didn’t try to include everyone having a product with the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:
- Amazon Elastic MapReduce
- Cloudera
- Datameer
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- Hadapt
- Hortonworks
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
- Karmasphere
- MapR
- Microsoft
- Oracle
Original title and link: 12 Hadoop Vendors to Watch in 2012 (©myNoSQL)
Tuesday, 10 January 2012
Partnerships in the Hadoop Market
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for their Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market (©myNoSQL)
Wednesday, 30 November 2011
Data Is the New Currency. But Who’s Leading the Way?
In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:
And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.
The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.
Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, using an IBM language technology built on the Apache UIMA platform[1] and running Linux on IBM boxes.
To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there)[2]. When did this happen before?
[1] Dmitriy Ryaboy taught me that UIMA came out of IBM in the first place and they’ve been critical in its development.
[2] Or they are very secretive about their internal initiatives and research.
Original title and link: Data Is the New Currency. But Who’s Leading the Way? (©myNoSQL)
Monday, 28 November 2011
GPU-Accelerated Databases
Wolfgang Gruener reporting on a new patent filed by IBM:
Instead of traditional disk-based queries and an approach that slows performance via memory latencies and processors waiting for data to be fetched from the memory, IBM envisions in-GPU-memory tables as technology that could, in addition to disk tables, significantly accelerate database processing. According to a patent filed by the company, “GPU enabled programs are well suited to problems that involve data-parallel computations where the same program is executed on different data with high arithmetic intensity.”

Amazon has made a move in the GPU-world by offering Cluster GPU instances which can be used for quite a few interesting scenarios.
Original title and link: GPU-Accelerated Databases (©myNoSQL)
via: http://www.tomshardware.com/news/ibm-patent-gpu-accelerated-database-cuda,13866.html