


MongoDB Tutorial: MapReduce

I don’t consider myself the right person to write detailed tutorials, as I tend to omit a lot of details. But I’d like to try a different approach: I’ll share with you the best materials I have found and used myself to learn about a specific feature. Please let me know if you find this approach useful.

Today we will take a look at MongoDB MapReduce. As usual (if only to head off any future RTFM advice), we will start with the ☞ official documentation. For MongoDB MapReduce, the official documentation gives us details about:

  • the complete command syntax
  • specs for map and reduce functions
  • as a bonus a couple of basic examples

There are also a couple of important aspects that you’ll have to keep in mind while implementing your own MongoDB MapReduce functions:

  1. The MapReduce engine may invoke reduce functions iteratively, feeding the output of one reduce call back in as input to another; thus, these functions must be idempotent. That is, the following must hold for your reduce function:

    for all k,vals : reduce( k, [reduce(k,vals)] ) == reduce(k,vals)

  2. Currently, the return value from a reduce function cannot be an array (it’s typically an object or a number).
  3. If you need to perform an operation only once per key, after all reduce passes have finished, use a finalize function.
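To see these three rules in action, here is a minimal sketch in plain JavaScript that runs outside MongoDB. The `orders` data, its `cust`/`amount` fields, and the `runMapReduce` helper are hypothetical; the helper only mimics the engine by supplying `emit`, grouping emitted values by key, and skipping reduce for keys with a single value (MongoDB does not call reduce in that case). Reduce returns an object with the same shape as the emitted values, because its output may be fed back into it, and the average is computed in finalize since averaging inside reduce would break under re-reduction.

```javascript
// Hypothetical sample collection: orders with a customer id and an amount.
const orders = [
  { cust: "a", amount: 10 },
  { cust: "a", amount: 20 },
  { cust: "a", amount: 30 },
  { cust: "b", amount: 5 },
];

// map: `this` is the current document; emit(key, value) is provided by the engine.
// The emitted value has the same shape reduce returns: { count, total }.
function mapOrders() {
  emit(this.cust, { count: 1, total: this.amount });
}

// reduce: values may be emitted values OR earlier reduce outputs,
// so it only adds counts and totals (safe to re-reduce). Returns an object, not an array.
function reduceOrders(key, values) {
  const out = { count: 0, total: 0 };
  values.forEach(v => {
    out.count += v.count;
    out.total += v.total;
  });
  return out;
}

// finalize: runs once per key after all reduce passes, so the average goes here.
function finalizeOrders(key, value) {
  return { count: value.count, total: value.total, avg: value.total / value.count };
}

// Hypothetical in-memory stand-in for the engine (not MongoDB itself).
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  const groups = new Map();
  globalThis.emit = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  docs.forEach(doc => mapFn.call(doc));
  const results = {};
  for (const [k, vals] of groups) {
    // Like MongoDB, skip reduce when a key has only one emitted value.
    const reduced = vals.length === 1 ? vals[0] : reduceFn(k, vals);
    results[k] = finalizeFn ? finalizeFn(k, reduced) : reduced;
  }
  return results;
}

// Check the rule: reduce(k, [reduce(k, vals)]) == reduce(k, vals).
const vals = [{ count: 1, total: 10 }, { count: 1, total: 20 }, { count: 1, total: 30 }];
const once = reduceOrders("a", vals);
console.log(JSON.stringify(reduceOrders("a", [once])) === JSON.stringify(once)); // true

// Reducing in partial batches gives the same result as reducing everything at once.
const batched = reduceOrders("a", [reduceOrders("a", vals.slice(0, 2)), vals[2]]);
console.log(JSON.stringify(batched) === JSON.stringify(once)); // true

console.log(JSON.stringify(runMapReduce(orders, mapOrders, reduceOrders, finalizeOrders)));
// {"a":{"count":3,"total":60,"avg":20},"b":{"count":1,"total":5,"avg":5}}
```

Against a real server, the same three functions would instead be passed to the shell's `db.collection.mapReduce(map, reduce, { out: { inline: 1 }, finalize: ... })`.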

Once you know the basics, what has worked well for me is studying a simple but close-to-real-life example. For this I chose the ☞ following piece of code, which implements a basic text search.
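The linked text-search code is not reproduced here. As a stand-in, this hypothetical sketch builds a tiny inverted index the MapReduce way: map emits each distinct word of a document together with that document's id, and reduce merges the id lists. The `posts` data, the tokenizer (lowercase, split on non-word characters), and the `buildIndex`/`search` helpers are all assumptions, and grouping happens in memory. Because a reduce function may not return an array, the ids are wrapped in an object.

```javascript
// Hypothetical collection: each document has an _id and a text field.
const posts = [
  { _id: 1, text: "MongoDB supports MapReduce" },
  { _id: 2, text: "MapReduce in Hadoop" },
  { _id: 3, text: "MongoDB indexes" },
];

// Assumed tokenizer: lowercase, split on runs of non-word characters.
function tokenize(s) {
  return s.toLowerCase().split(/\W+/).filter(w => w.length > 0);
}

// map: emit every distinct word with the id of the document it came from.
function mapWords() {
  for (const w of new Set(tokenize(this.text))) emit(w, { ids: [this._id] });
}

// reduce: merge (and de-duplicate) id lists; wrapped in an object, not returned as an array.
function reduceWords(word, values) {
  const ids = new Set();
  values.forEach(v => v.ids.forEach(id => ids.add(id)));
  return { ids: [...ids].sort((a, b) => a - b) };
}

// Hypothetical in-memory stand-in for running map/reduce over the collection.
function buildIndex(docs) {
  const groups = new Map();
  globalThis.emit = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  docs.forEach(d => mapWords.call(d));
  const index = new Map();
  for (const [w, vals] of groups) index.set(w, vals.length === 1 ? vals[0] : reduceWords(w, vals));
  return index;
}

// Query: ids of documents that contain every word of the query.
function search(index, query) {
  let result = null;
  for (const w of tokenize(query)) {
    const ids = index.has(w) ? index.get(w).ids : [];
    result = result === null ? ids : result.filter(id => ids.includes(id));
  }
  return result || [];
}

const index = buildIndex(posts);
console.log(search(index, "mongodb"));           // [ 1, 3 ]
console.log(search(index, "MapReduce mongodb")); // [ 1 ]
```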

I also found it very useful to look at how SQL translates to MapReduce in MongoDB.
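As a rough illustration of that mapping, the hypothetical sketch below translates a `GROUP BY` query. The grouping column becomes the emit key, the aggregates become the emitted value and are added up in reduce, and the `WHERE` clause becomes a filter applied before map (the role the `query` option plays in MongoDB's mapReduce). The `blogPosts` data and the `groupBy` helper are assumptions; the rows use the `{ _id, value }` shape of MongoDB's mapReduce output documents.

```javascript
// SQL being translated (hypothetical schema):
//   SELECT author, COUNT(*) AS posts, SUM(comments) AS comments
//   FROM posts WHERE published = true GROUP BY author
const blogPosts = [
  { author: "kyle", comments: 4, published: true },
  { author: "kyle", comments: 1, published: true },
  { author: "alex", comments: 7, published: true },
  { author: "alex", comments: 2, published: false },
];

// GROUP BY author -> emit key; COUNT(*) and SUM(comments) -> emitted value.
function mapPost() {
  emit(this.author, { posts: 1, comments: this.comments });
}

// The aggregates become additions, which are safe to re-reduce.
function reducePosts(author, values) {
  return values.reduce(
    (acc, v) => ({ posts: acc.posts + v.posts, comments: acc.comments + v.comments }),
    { posts: 0, comments: 0 }
  );
}

// Hypothetical in-memory stand-in: `where` filters documents before map,
// like the `query` option of MongoDB's mapReduce.
function groupBy(docs, where, mapFn, reduceFn) {
  const groups = new Map();
  globalThis.emit = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  docs.filter(where).forEach(d => mapFn.call(d));
  return [...groups].map(([k, vals]) => ({
    _id: k,
    value: vals.length === 1 ? vals[0] : reduceFn(k, vals),
  }));
}

const rows = groupBy(blogPosts, d => d.published === true, mapPost, reducePosts);
console.log(JSON.stringify(rows));
// [{"_id":"kyle","value":{"posts":2,"comments":5}},{"_id":"alex","value":{"posts":1,"comments":7}}]
```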

Just to make sure I had gotten things straight so far, I went through the third part of Kyle Banker’s MongoDB aggregation tutorial: MapReduce basics.

The last step in learning about MapReduce in MongoDB was to look at some real use cases. Depending on your programming language preference, I’d recommend one of these two MongoDB MapReduce use cases:

  • Ruby: Visualizing log files with MongoDB, MapReduce, Ruby & Google Charts: ☞ part 1 and ☞ part 2
  • Perl: Using MongoDB and MapReduce on Apache Access Logs
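Both use cases follow the same pattern: turn log lines into documents, then aggregate them with MapReduce. Here is a hypothetical JavaScript sketch of that pattern (not the code from either post): it parses Apache Common Log Format lines with a regular expression, then counts hits and bytes per hour. The sample lines, field names, and the `runHits` helper are assumptions, and grouping happens in memory rather than inside MongoDB.

```javascript
// Hypothetical access-log lines in Apache Common Log Format.
const lines = [
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
  '127.0.0.1 - - [10/Oct/2000:13:58:01 -0700] "GET /index.html HTTP/1.0" 200 512',
  '10.0.0.2 - - [10/Oct/2000:14:02:10 -0700] "GET /missing HTTP/1.0" 404 -',
];

// Step 1: parse each line into a document (what you would insert into a collection).
const CLF = /^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/;
function parseLine(line) {
  const m = CLF.exec(line);
  if (!m) return null; // skip malformed lines
  return {
    ip: m[1],
    time: m[2],
    request: m[3],
    status: Number(m[4]),
    bytes: m[5] === "-" ? 0 : Number(m[5]), // "-" means no body was sent
  };
}

// Step 2: map emits one hit per hour bucket, e.g. "10/Oct/2000:13".
function mapHits() {
  emit(this.time.slice(0, 14), { hits: 1, bytes: this.bytes });
}

// reduce: sum hits and bytes; same shape as the emitted value.
function reduceHits(hour, values) {
  let hits = 0, bytes = 0;
  values.forEach(v => { hits += v.hits; bytes += v.bytes; });
  return { hits, bytes };
}

// Hypothetical in-memory stand-in for the engine.
function runHits(docs) {
  const groups = new Map();
  globalThis.emit = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  docs.forEach(d => mapHits.call(d));
  const out = {};
  for (const [k, vals] of groups) out[k] = vals.length === 1 ? vals[0] : reduceHits(k, vals);
  return out;
}

const docs = lines.map(parseLine).filter(d => d !== null);
console.log(JSON.stringify(runHits(docs)));
// {"10/Oct/2000:13":{"hits":2,"bytes":2838},"10/Oct/2000:14":{"hits":1,"bytes":0}}
```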

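Summarizing our short tutorial on MongoDB MapReduce:

  • read the official documentation for the command syntax and the map, reduce, and finalize specs
  • study a simple but realistic example, such as the basic text search code
  • look at how SQL translates to MapReduce in MongoDB
  • check your understanding with Kyle Banker’s MapReduce basics
  • learn from real use cases, such as the Ruby and Perl log-analysis posts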

If you have other materials on MongoDB MapReduce that you consider essential, please share them with us!