ibm: All content tagged as ibm in NoSQL databases and polyglot persistence
From IBM to… IBM: The short, but complicated history of CouchDB, Cloudant, and a lot of other companies and projects
Damien Katz created CouchDB after working at IBM on Lotus Notes: CouchDB and Me. CouchDB went the Apache way. Then things got complicated…
On the West Coast, Damien Katz and a team of committers created Couchio, later renamed to CouchOne, later merged with Membase to become Couchbase, which finally dropped CouchDB. Damien Katz left Couchbase.
On the East Coast, Cloudant took CouchDB and made it BigCouch. I thought that Cloudant would be the CouchDB company, and in a way it was. Cloudant put BigCouch on the cloud as a service and on GitHub as open source. BigCouch is supposed to be merged back into Apache CouchDB, but many months later this hasn’t materialized yet.
To complete the circle, today IBM announced signing an agreement to acquire Cloudant — news coverage on GigaOm, BostInno, TechCrunch. Which probably makes sense considering Cloudant’s relationship with SoftLayer and IBM’s $1 billion Platform-as-a-Service Investment, but less so if you consider the IBM and
Anyways, the future of Apache CouchDB is bright. Yep.
For the weekend reads, a paper authored by a research team from IBM:
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged, including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets, while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH)
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- NetApp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring the Hortonworks Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
[1] I left aside for now Hadoop-as-a-Service.
Original title and link: How Many Hadoops? ( ©myNoSQL)
A new motto could be “An appliance for every vertical”. IBM Netezza’s first is for retailers.
Original title and link: IBM Debuts Netezza Customer Intelligence Appliance ( ©myNoSQL)
My list of 8 most interesting companies for the future of Hadoop didn’t try to include anyone having a product with the Hadoop word in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:
- Amazon Elastic MapReduce
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
Original title and link: 12 Hadoop Vendors to Watch in 2012 ( ©myNoSQL)
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for their Amazon Elastic Map Reduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market ( ©myNoSQL)
In 2005, Tim O’Reilly said: “data is the next Intel Inside”. Today IDC’s Mario Morales (VP of semiconductor research) says data is the new currency. All’s good until you read the continuation:
And the companies that understand this are the ones already developing the analytics and infrastructure to extract that value—companies like IBM, HP, Intel, Microsoft, TI, Freescale and Oracle.
The article (nb: may require registration) continues by looking at what each of these companies is doing in the Big Data space, but devotes a large part to IBM Watson.
Going back to the question “who’s leading the Big Data way”, let’s take a quick look at the technology behind Watson. According to Jeopardy Goes to Hadoop and About Watson, Watson technology is based on Apache Hadoop, using an IBM language technology built on the Apache UIMA platform and running Linux on IBM boxes.
To me it looks like open source is leading the advances in Big Data and these large organizations are just connecting the dots (as in packaging these technologies for enterprise environments and contributing missing pieces here and there). When did this happen before?
Original title and link: Data Is the New Currency. But Who’s Leading the Way? ( ©myNoSQL)