ibm: All content tagged as ibm in NoSQL databases and polyglot persistence
For the weekend reads, a paper by a research team from IBM:
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged — including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets — while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
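As a rough illustration of the kind of job the abstract mentions, here is a single-process Python sketch of sparse matrix-vector multiply written as map, shuffle, and reduce phases, with every intermediate pair held in memory. It is not M3R's code and does not use the Hadoop API; the function names are hypothetical, and a real engine would partition the matrix, vector, and shuffle across cluster nodes.

```python
from collections import defaultdict


def map_phase(entries, vector):
    """Map: for each nonzero A[i][j], emit the pair (i, A[i][j] * x[j])."""
    for i, j, value in entries:
        yield i, value * vector[j]


def shuffle(pairs):
    """Shuffle: group intermediate values by key, entirely in memory."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the partial products for each row."""
    return {row: sum(values) for row, values in groups.items()}


def sparse_matvec(entries, vector):
    """Compute A * x for A given as (row, col, value) triples.

    Rows with no nonzero entries do not appear in the result.
    """
    return reduce_phase(shuffle(map_phase(entries, vector)))


if __name__ == "__main__":
    # A = [[2, 0, 1],
    #      [0, 3, 0],
    #      [4, 0, 5]] stored as (row, col, value) triples
    A = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
    x = [1.0, 2.0, 3.0]
    print(sparse_matvec(A, x))  # {0: 5.0, 1: 6.0, 2: 19.0}
```

Keeping the shuffled pairs in a dictionary rather than spilling them to disk is the trade-off the abstract describes: it only works when the data fits in memory, and a failure loses the intermediate state.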
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
There’s also the version Facebook runs on its cluster, which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- Netapp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones.¹ To conclude this list, my question is: who actually benefits from all these distributions?
¹ For now I left Hadoop-as-a-Service aside.
Original title and link: How Many Hadoops? ( ©myNoSQL)
A new motto could be “An appliance for every vertical”. IBM Netezza’s first is for retailers.
Original title and link: IBM Debuts Netezza Customer Intelligence Appliance ( ©myNoSQL)
My list of the 8 most interesting companies for the future of Hadoop didn’t try to include every company with a Hadoop-branded product. The list from InformationWeek does. To save you 15 clicks, here’s their list:
- Amazon Elastic MapReduce
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
Original title and link: 12 Hadoop Vendors to Watch in 2012 ( ©myNoSQL)
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for Amazon Elastic MapReduce, and IBM is going it alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market ( ©myNoSQL)
In 2005, Tim O’Reilly said: “data is the next Intel Inside“. Today IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:
And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.
The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.
Going back to the question “who’s leading the Big Data way“, let’s take a quick look at the technology behind Watson. According to “Jeopardy Goes to Hadoop” and “About Watson”, Watson is built on Apache Hadoop, uses an IBM language technology based on the Apache UIMA platform, and runs Linux on IBM hardware.
To me it looks like open source is leading the advances in Big Data, and these large organizations are just connecting the dots (that is, packaging these technologies for enterprise environments and contributing missing pieces here and there). When did this happen before?
Original title and link: Data Is the New Currency. But Who’s Leading the Way? ( ©myNoSQL)