MapReduce With Hadoop: What Happens During Mapping

An interesting look at what happens during the map phase in Hadoop, and at the cost of emitting more key-value pairs:

  • a direct negative impact on map time and CPU usage, due to more serialization
  • an indirect negative impact on CPU, due to more spilling and additional deserialization in the combine step
  • a direct negative impact on the map task, due to more intermediate files, which make the final merge more expensive
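To make the first bullet concrete, here is a minimal sketch in Python (not Hadoop's Java API) that counts how many records a WordCount-style mapper emits and how many bytes it serializes. It assumes `pickle` as a rough stand-in for Hadoop's Writable serialization, and all function names are hypothetical. The second mapper uses in-mapper combining (aggregating counts across the whole input split before emitting), a common technique for emitting fewer pairs; the original post does not necessarily prescribe it.

```python
import pickle
from collections import Counter

# Hypothetical input split: the same line repeated 1000 times.
SPLIT = ["the quick brown fox jumps over the lazy dog the end"] * 1000


def naive_mapper(split):
    """Emit one (word, 1) pair per token, with no local aggregation."""
    for line in split:
        for word in line.split():
            yield (word, 1)


def in_mapper_combining(split):
    """Aggregate counts over the whole split, then emit one pair per distinct word.

    In Hadoop this final emission would happen in the mapper's cleanup() hook.
    """
    counts = Counter()
    for line in split:
        counts.update(line.split())
    yield from counts.items()


def emission_cost(mapper, split):
    """Return (records emitted, bytes serialized) for a mapper over a split.

    pickle is only a stand-in for Writable serialization: the absolute byte
    counts mean nothing for Hadoop, only the relative difference does.
    """
    records = 0
    size = 0
    for pair in mapper(split):
        records += 1
        size += len(pickle.dumps(pair))
    return records, size


if __name__ == "__main__":
    # 11000 records for the naive mapper vs 9 (one per distinct word).
    print("naive:    ", emission_cost(naive_mapper, SPLIT))
    print("combining:", emission_cost(in_mapper_combining, SPLIT))
```

Both mappers produce the same totals per word; the difference is only in how many records have to be serialized and handed to the framework.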

[Figure: Map, Reduce, Combine]
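The second and third bullets (spilling, the combine step, and the final merge) can be illustrated with a deliberately simplified model of the map-side sort buffer. This is a sketch under stated assumptions, not Hadoop's actual implementation: the hypothetical `buffer_records` parameter counts records, whereas Hadoop sizes the buffer in megabytes (`mapreduce.task.io.sort.mb`), and no real serialization or disk I/O takes place. It shows how emitting more pairs produces more spill runs that the final merge then has to read.

```python
import heapq
from collections import Counter
from itertools import groupby
from operator import itemgetter


def combine(sorted_pairs):
    """Sum values per key; input must already be sorted by key."""
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(sorted_pairs, key=itemgetter(0))]


def map_side_spills(pairs, buffer_records, use_combiner=True):
    """Buffer map output; each time the buffer fills, sort it, optionally
    combine it, and record it as one spill run. Returns the list of runs."""
    spills = []
    buffer = []

    def spill():
        run = sorted(buffer, key=itemgetter(0))
        spills.append(combine(run) if use_combiner else run)
        buffer.clear()

    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= buffer_records:
            spill()
    if buffer:
        spill()
    return spills


def final_merge(spills):
    """k-way merge of the sorted spill runs into a single combined output."""
    return combine(heapq.merge(*spills, key=itemgetter(0)))


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog the end".split() * 1000
    naive = [(w, 1) for w in words]            # one pair per token
    pre_aggregated = list(Counter(words).items())  # one pair per distinct word

    naive_spills = map_side_spills(naive, buffer_records=1000)
    lean_spills = map_side_spills(pre_aggregated, buffer_records=1000)
    print("spill runs (naive):", len(naive_spills))
    print("spill runs (pre-aggregated):", len(lean_spills))
    print("same final output:", final_merge(naive_spills) == final_merge(lean_spills))
```

With 11000 emitted pairs and a 1000-record buffer the naive mapper produces 11 spill runs, while the pre-aggregated output fits in a single run; the final result is identical, but the merge has far less to do.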

The main point of the dynaTrace blog post is that, even though Hadoop makes it easy to throw more hardware at a problem, wasting resources with inefficient code in MapReduce tasks has a noticeable and measurable cost.

Original title and link: MapReduce With Hadoop: What Happens During Mapping (NoSQL database©myNoSQL)