How to speed up MongoDB Map Reduce by 20x

Antoine Girbal:

Looking back, we’ve started at 1200s and ended at 60s for the same MR job, which represents a 20x improvement! This improvement should be available to most use cases, even if some of the tricks are not ideal (e.g. using multiple output dbs / collections). Nevertheless this can give people ideas on how to speed up their MR jobs and hopefully some of those features will be made easier to use in the future. The following ticket will make ‘splitVector’ command more available, and this ticket will improve multiple MR jobs on the same database.

Looking back at the article, it reads like a series of tricks to work around the limitations of MongoDB’s MapReduce implementation:

  1. a single thread used for MapReduce jobs
  2. lock contention
  3. BSON-to-JSON-and-back serializations
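The core trick behind the speedup, splitting the input into key ranges (as the `splitVector` command would) and running one job per range into its own output collection, can be sketched in pure Python. This is an illustrative simulation only: no live MongoDB instance is assumed, and the data set, split logic, and function names (`word_count_chunk`, `parallel_word_count`) are hypothetical stand-ins for the real MR jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

# Stand-in for a MongoDB collection: 1000 documents with an integer _id.
DOCS = [{"_id": i, "text": "mongodb map reduce"} for i in range(1000)]

def word_count_chunk(bounds):
    """Map-reduce one chunk of the _id range into its own output dict,
    mimicking a single MR job writing to a dedicated output collection."""
    lo, hi = bounds
    out = Counter()
    for doc in DOCS:
        if lo <= doc["_id"] < hi:      # range filter, like {_id: {$gte: lo, $lt: hi}}
            for word in doc["text"].split():
                out[word] += 1         # emit(word, 1) followed by a summing reduce
    return out

def parallel_word_count(n_jobs=4):
    # The split points play the role of splitVector's output: they carve
    # the collection into n_jobs non-overlapping _id ranges.
    step = len(DOCS) // n_jobs
    bounds = [(i * step, (i + 1) * step if i < n_jobs - 1 else len(DOCS))
              for i in range(n_jobs)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        partials = pool.map(word_count_chunk, bounds)
    # Final merge of the per-job output collections into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

print(parallel_word_count()["mongodb"])  # each of the 1000 docs contributes once
```

The parallelism here dodges exactly the first two limitations listed above: each chunk gets its own worker instead of one global thread, and separate output targets avoid contending on a single write lock.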

Original title and link: How to speed up MongoDB Map Reduce by 20x (NoSQL database©myNoSQL)

via: http://edgystuff.tumblr.com/post/54709368492/how-to-speed-up-mongodb-map-reduce-by-20x