Mapr: All content tagged as Mapr in NoSQL databases and polyglot persistence
Friday, 24 May 2013
NoSQL and Full Text Indexing: Two Trends
On one side:
- DataStax with Solr
- MapR with LucidWorks Search (nb: Solr)
and on the other side:
- Riak Searching: Solr-like but custom proprietary implementation
- MongoDB text search: custom proprietary implementation
I’m not going to argue about the pros and cons of each of these approaches, but I’m sure you already know which of these approaches I’m in favor of.
Original title and link: NoSQL and Full Text Indexing: Two Trends (©myNoSQL)
Tuesday, 2 April 2013
Hadoop and Canonical Bring MapR to Ubuntu
Some announcements from MapR about “MapR and Canonical bringing Hadoop Support to Ubuntu”:
First, MapR is partnering with Canonical, the organization behind the Ubuntu operating system, to package and make available for download an integrated offering of MapR Distribution with Ubuntu. The free MapR M3 Edition includes HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume and other Hadoop support tools. MapR is the only distribution that enables Linux applications and commands to access data directly in the cluster via the NFS interface that is available with all MapR Editions.
As far as I know, Apache Hadoop works just fine on Ubuntu. And there was already a partnership between Cloudera and Canonical to bring Hadoop to Ubuntu. So, I guess my title might be more accurate.
Monday, 18 March 2013
MapR Raises $30mil in Series C
Where is MapR today?
- MapR raised a total of $59mil.
- According to John Schroeder (CEO) “92% of MapR customers pay primarily for licenses and not for ancillary services and support”.
- According to Wikibon, MapR had $23mil in revenue in 2012, 49% of which came from services (nb: this seems to contradict the above point)
- Support for MapR installations is offered by Accenture and Booz Allen Hamilton
How will MapR use the new capital?
With the new funding, the company plans to invest in research & development, and expand into Asia.
How does MapR see its competitors?
John Schroeder (CEO):
“Our competitors’ model is very cash intensive and you have to wonder whether or not they’ll ever be cash-flow positive”.
Cloudera has raised $141mil so far, as follows:
- Series A: $5mil
- Series B: $6mil
- Series C: $25mil
- Series D: $40mil
- Series E: $65mil
According to this, Cloudera raised $36mil in its first 3 rounds. I couldn’t find any official data about the capital raised by Hortonworks, but the number I’ve seen in a couple of places is $50mil. So far MapR has raised $59mil.
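The comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: the round sizes are the CrunchBase figures listed above, the MapR total is the one quoted in this post, and the variable names are made up for illustration.

```python
# Funding rounds in millions of USD, copied from the list above.
cloudera_rounds = {"A": 5, "B": 6, "C": 25, "D": 40, "E": 65}

# Total across all five rounds.
cloudera_total = sum(cloudera_rounds.values())

# Capital raised up to and including Series C, the stage MapR is at now.
cloudera_first_three = sum(cloudera_rounds[r] for r in ("A", "B", "C"))

# MapR's total after its Series C, as stated in the post.
mapr_total = 59

print(f"Cloudera total: {cloudera_total}mil")
print(f"Cloudera after Series C: {cloudera_first_three}mil")
print(f"MapR after Series C: {mapr_total}mil")
```

At the same funding stage, MapR has raised more than Cloudera had, even though Cloudera's overall total is larger.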
Sources for these bits:
- VentureBeat: MapR gets $30M to push Hadoop deeper into the enterprise
- AllThingsD: MapR Lands $30 Million Series C Led by Mayfield Fund (Arik Hesseldahl)
- CrunchBase: Cloudera profile
- Wikibon: Big Data Vendor Revenue And Market Forecast 2012-2017
How Does MapR Compare to Cloudera?
Staying in MapR land, the question of comparing MapR to Cloudera is answered by people from all sides (MapR, Cloudera and Hortonworks). My summary: “cool proprietary technology addressing some of the current limitations of Hadoop, but also missing some of the features the Hadoop community has come up with”.
via: http://www.quora.com/How-does-MapR-plan-to-compete-with-Cloudera
Monday, 11 March 2013
Hadoop: What Matters Are Open and Standardized Interfaces
Michael Hausenblas (MapR) about the topic of the day: “Hadoop distributions”, about which I’ve already linked to Steve Loughran’s If There Is a Problem in the Hadoop JARs, How Are You Going to Fix It?, Merv Adrian’s Open Source “Purity”, Hadoop, and Market Realities and Matthew Aslett’s What It Means to Be “all In” on Hadoop:
One aspect I’d like to highlight is the importance of ‘standard’ interfaces, defined through community consensus, and enforced by the Apaches and the likes. I think it makes perfect sense to offer a commercial implementation that is superior to the implementation you get ‘for free’ — as long as you’re 100% compatible with the community-defined standard.
Here’s something I don’t understand about the above. The “Defining Hadoop wiki page” dedicates a complete paragraph to compatibility. The most important and relevant part of it is:
Other entities may claim that other products (including derivative works) are compatible with Apache Hadoop. The Apache Hadoop development team is not a standards body, and cannot confirm or deny such assertions. All that we can say is “there is no official certification that a product is compatible with Hadoop, other than when a release of the Apache source tree is declared a new release of Apache Hadoop itself”.
Going back to MapR’s post, my question is: if the Apache Hadoop project doesn’t offer a certification toolkit and the project team doesn’t validate compatibility, what exactly does it mean to be “100% compatible” with something that can change at any time and is completely out of your control?
via: http://www.mapr.com/blog/hadoop-what-matters-are-open-and-standardized-interfaces
Monday, 4 March 2013
How Many Hadoops?
The short answer is there is only one Apache Hadoop distribution.
The long answer is that there are many distributions that include Apache Hadoop or are claiming compatibility with Apache Hadoop.
The oldest and probably most popular: Cloudera’s Distribution of Hadoop (CDH).
The 100% open source: Hortonworks Data Platform.
The proprietary: MapR.
The blue one: IBM InfoSphere BigInsights.
The latest: WANdisco Hadoop WDD, Intel Distribution of Hadoop and Pivotal HD from EMC Greenplum.
There’s also the version Facebook’s running on their cluster which includes Facebook Corona: a different approach to job scheduling and resource management.
But this list is not complete as it doesn’t include appliances featuring Hadoop. In this category we have:
- Oracle’s Big Data appliance featuring Cloudera’s Distribution of Hadoop
- Netapp’s Hadooplers
- EMC Greenplum DCA
- Teradata Aster Discovery Platform featuring Hortonworks’s Hadoop Data Platform
- Data Direct Networks (DDN)
I hope I didn’t miss any important ones [1]. As a conclusion for this list, my question is: who is actually benefiting from all these distributions?
[1] I have left Hadoop-as-a-Service aside for now.
Monday, 21 January 2013
Hadoop Business Ecosystem as of January 2013
As I was hoping and expecting, Datameer updated the chart visualizing Hadoop’s business side ecosystem:
It shouldn’t be a surprise to anyone that the most connected companies in the Hadoop space are Cloudera and Hortonworks. They outrank the IT industry mammoths: IBM, HP, Microsoft, Oracle, SAP, etc.
via: http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html
Monday, 23 July 2012
MapR’s New Partnership With Drawn to Scale
MapR is definitely up to some interesting partnerships. Last year it announced a partnership with EMC for Greenplum HD Enterprise Edition, then this year MapR became available on Amazon Elastic MapReduce and Google Compute Engine. And today MapR and Drawn to Scale, creator of Spire, the real-time database for Hadoop, are announcing a new partnership.
Bradford Stephens (CEO, Drawn to Scale):
MapR provides the fastest, most reliable Hadoop for our customers. We are thrilled to work with MapR to deliver M3 as part of Spire as the first real-time database for Hadoop.
Jack Norris (VP of marketing, MapR Technologies):
Real-time SQL on Hadoop is a big gap in the market that is addressed by Spire. Spire is a complementary solution to our products and it made sense to work with Drawn to Scale to make it easier for customers to deploy M3, pre-integrated with Spire, for real-time SQL-based workloads.
It might sound strange coming from me, but MapR is taking some big steps towards becoming the de facto standard for Hadoop. I’m looking forward to seeing the reactions from Cloudera and Hortonworks.
Friday, 13 July 2012
MapR Claims Title as De Facto Standard for Hadoop
Maureen O’Gara:
The champagne has been flowing over at MapR since Google announced the integration of its Distribution for Hadoop with Google Compute Engine, the start-up’s second big win in a row.
Indeed, MapR on Amazon Elastic MapReduce and Google Compute Engine are two very important events in the life of MapR and for the Hadoop ecosystem in general. But there’s still a long way from these to being a de facto standard.
Cloudera or MapR for Hadoop Distribution?
A couple of links covering various aspects of this question:
- Quora thread covering this subject
- Joe Stein’s Hadoop distribution bake-off and my experience with Cloudera and MapR
- How I’d choose a Hadoop distribution
- MapR claims title as de facto standard for Hadoop
If you have other good references answering the question of which Hadoop distribution to choose, please leave a comment.
Wednesday, 11 July 2012
The Hadoop Ecosystem Relationships
Excellent infographic about the relationships in the Hadoop market created with Datameer:
A while ago I created a Google Spreadsheet in which I tried to track all these relationships, but going through PR announcements wasn’t really my thing. Now there’s a CSV file with all this data.
via: http://www.cloudera.com/blog/2012/07/the-hadoop-ecosystem-visualized-in-datameer/
Friday, 15 June 2012
MapR Hadoop Distribution on Amazon Elastic MapReduce
Another very interesting piece of news for the Hadoop space, this time coming from Amazon and MapR announcing support for the MapR Hadoop distribution on Amazon Elastic MapReduce:
MapR introduces enterprise-focused features for Hadoop such as high availability, data snapshotting, cluster mirroring across AZs, and NFS mounts. Combined with Amazon Elastic MapReduce’s managed Hadoop environment, seamless integration with other AWS services, and hourly pricing with no upfront fees or long-term commitments, Amazon EMR with the MapR Distribution for Hadoop offers customers a powerful tool for generating insights from their data.
Following the logic of the Amazon Relational Database Service, which started with MySQL, the most popular open source database, and then added support for the commercial but also very popular Oracle and SQL Server, what does this announcement tell us? It’s either that Amazon has got a lot of requests for MapR or that some very big AWS customers have mentioned MapR in their talks with Amazon. I go with the second option.

