Transactional Support in MapReduce for Speculative Parallelism

It’s been just a couple of days since I quoted a GigaOm article on Hadoop’s limitations (nb: I still think that article was meant to advertise some products). Anyway, it looks like researchers at Purdue University are working on adding transactional support to MapReduce:

MapReduce has emerged as a popular programming model for large-scale distributed computing. Its framework enforces strict synchronization between successive map and reduce phases and limited data-sharing within a phase. Use of key-value based persistent storage with MapReduce presents intriguing opportunities and challenges. These challenges relate primarily to semantic inconsistencies arising from the different fault-tolerant mechanisms employed by the execution environment and the underlying storage medium. We define formal transactional semantics for MapReduce over reliable key-value stores. With minimal performance overhead and no increase in program complexity, our solutions support broad classes of distributed applications hitherto infeasible in MapReduce.

Specifically, this paper (i) motivates the use of key-value stores as the underlying storage for MapReduce, (ii) defines transactional semantics for MapReduce to address any inconsistencies, (iii) demonstrates broader application scope enabled by data sharing within and across jobs, and (iv) presents a detailed evaluation demonstrating the low overhead of our proposed semantics.
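
To make the kind of inconsistency the abstract talks about a bit more concrete, here is a toy sketch of my own, not the paper's design. It assumes a made-up in-memory `KVStore`, a `map_task` that can be told to crash partway through, and two hypothetical runners: one writes straight to the store, the other buffers each attempt's writes and applies them only if the attempt finishes. The real semantics defined in the paper may look quite different.

```python
class KVStore:
    """Toy in-memory stand-in for a reliable key-value store (hypothetical)."""

    def __init__(self):
        self.data = {}

    def increment(self, key, amount=1):
        self.data[key] = self.data.get(key, 0) + amount


class TaskFailure(Exception):
    """Raised to simulate a task attempt crashing."""


def map_task(words, emit, fail_after=None):
    """Call emit(word) for each word; optionally crash at index fail_after."""
    for i, word in enumerate(words):
        if fail_after is not None and i == fail_after:
            raise TaskFailure("simulated crash")
        emit(word)


def run_direct(store, words):
    """Write straight to the store; the retry replays already-applied writes."""
    try:
        map_task(words, store.increment, fail_after=2)
    except TaskFailure:
        map_task(words, store.increment)  # the framework re-executes the task


def run_transactional(store, words):
    """Buffer writes per attempt; apply them only if the attempt succeeds."""
    for fail_after in (2, None):  # first attempt crashes, second succeeds
        buffer = []
        try:
            map_task(words, buffer.append, fail_after=fail_after)
        except TaskFailure:
            continue  # drop the failed attempt's buffered writes
        for word in buffer:
            store.increment(word)  # commit step (not truly atomic in this toy)
        return


words = ["a", "b", "a", "c"]
direct, txn = KVStore(), KVStore()
run_direct(direct, words)
run_transactional(txn, words)
print(direct.data)  # {'a': 3, 'b': 2, 'c': 1}: partial writes counted twice
print(txn.data)     # {'a': 2, 'b': 1, 'c': 1}: only the successful attempt counts
```

The direct version over-counts because MapReduce recovers by re-running the task, while the store has already made the crashed attempt's writes durable. Buffering and committing once is just the simplest way to show the gap that transactional semantics would have to close.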

You can find the paper ☞ here.

Original title and link: Transactional Support in MapReduce for Speculative Parallelism (NoSQL databases © myNoSQL)