Tuple MapReduce: Beyond the Classic MapReduce

If MapReduce were formulated differently, many problems would be easier to code. The high-level tools that could arise from it would also be easier to code.

This is the motivation that has led us to pose and formulate Tuple MapReduce.

Just today I wrote that I don’t buy the arguments presenting MapReduce as a complex algorithm. The MapReduce model is simple. What is complicated is correctly and efficiently decomposing problems into sequences of MapReduce phases.

But if Tuple MapReduce could make it simpler, why not?

Now we will show an extended MapReduce model, Tuple MapReduce, which we can formalize as:

[Figure: Tuple MapReduce model]

In this case, the map function processes a tuple as input and emits a certain number of tuples as output. These tuples are made up of “n” fields, out of which “s” fields are used to sort and “g” fields are used to group by. This diagram shows how sorting and grouping are done in greater detail:

In the reduce function, for each group, we receive a group tuple with “g” fields and a list of tuples for that group. Finally we’ll emit a certain number of tuples as output.
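To make the quoted description concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation. The names `tuple_map_reduce`, `map_fn`, `reduce_fn`, `sort_fields` and `group_fields` are hypothetical, and the sketch assumes the “g” group-by fields are a prefix of the “s” sort fields, which the quoted text does not state explicitly.

```python
from itertools import groupby


def tuple_map_reduce(records, map_fn, reduce_fn, sort_fields, group_fields):
    """Toy, in-memory illustration of the Tuple MapReduce flow.

    sort_fields  -- indexes of the "s" fields used to sort emitted tuples
    group_fields -- indexes of the "g" fields used to group them
    Assumption (not from the quoted text): group_fields is a prefix of
    sort_fields, so each group is contiguous once the tuples are sorted.
    """
    if list(sort_fields[:len(group_fields)]) != list(group_fields):
        raise ValueError("group_fields must be a prefix of sort_fields")

    # Map phase: each input tuple yields zero or more output tuples.
    emitted = [t for record in records for t in map_fn(record)]

    # Shuffle phase: sort by the "s" fields.
    emitted.sort(key=lambda t: tuple(t[i] for i in sort_fields))

    # Reduce phase: one call per group, receiving the group tuple
    # ("g" fields) and the sorted list of tuples in that group.
    output = []
    group_key = lambda t: tuple(t[i] for i in group_fields)
    for group, tuples in groupby(emitted, key=group_key):
        output.extend(reduce_fn(group, list(tuples)))
    return output


# Hypothetical example data: (user, timestamp, url) page visits.
visits = [
    ("bob", 3, "/c"),
    ("alice", 2, "/b"),
    ("bob", 1, "/a"),
    ("alice", 1, "/a"),
]


def identity_map(record):
    # Emit the input tuple unchanged (n = 3 fields).
    yield record


def first_and_last_url(group, tuples):
    # Tuples arrive sorted by (user, timestamp), so the first and last
    # entries are the earliest and latest visits for this user.
    (user,) = group
    yield (user, tuples[0][2], tuples[-1][2])


result = tuple_map_reduce(
    visits,
    identity_map,
    first_and_last_url,
    sort_fields=[0, 1],   # s = 2: sort by user, then timestamp
    group_fields=[0],     # g = 1: group by user
)
print(result)
```

In classic MapReduce, getting each user's visits delivered to the reducer in timestamp order typically means packing both fields into a composite key and writing a custom partitioner and grouping comparator. In this sketch that becomes a matter of choosing `sort_fields` and `group_fields`.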

Generalized? Maybe. Simpler? Neah.

Original title and link: Tuple MapReduce: Beyond the Classic MapReduce (NoSQL database©myNoSQL)