It’s been just a couple of days since I quoted a GigaOm article mentioning Hadoop’s limitations (nb: I still think that article was meant to advertise some products). Anyway, it looks like researchers at Purdue University are working on adding transactional support to MapReduce:
MapReduce has emerged as a popular programming model for large-scale distributed computing. Its framework enforces strict synchronization between successive map and reduce phases and limited data-sharing within a phase. Use of key-value based persistent storage with MapReduce presents intriguing opportunities and challenges. These challenges relate primarily to semantic inconsistencies arising from the different fault-tolerant mechanisms employed by the execution environment and the underlying storage medium. We define formal transactional semantics for MapReduce over reliable key-value stores. With minimal performance overhead and no increase in program complexity, our solutions support broad classes of distributed applications hitherto infeasible in MapReduce.
Specifically, this paper (i) motivates the use of key-value stores as the underlying storage for MapReduce, (ii) defines transactional semantics for MapReduce to address any inconsistencies, (iii) demonstrates broader application scope enabled by data sharing within and across jobs, and (iv) presents a detailed evaluation demonstrating the low overhead of our proposed semantics.
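To make the "different fault-tolerant mechanisms" problem concrete: MapReduce recovers from failures and stragglers by re-running tasks (Hadoop, for instance, can launch speculative backup attempts of slow tasks), which is only safe if task outputs are discarded or deduplicated. A task that writes directly into a persistent key-value store breaks that assumption. The toy sketch below is *not* the paper's protocol. It assumes a hypothetical in-memory `KVStore` and a commit-at-most-once-per-task rule, and simply contrasts direct writes with buffered, committed writes when the same map task runs twice.

```python
from collections import defaultdict


class KVStore:
    """Toy in-memory stand-in for a persistent key-value store (hypothetical)."""

    def __init__(self):
        self.data = defaultdict(int)
        self.committed_tasks = set()

    def commit(self, task_id, write_set):
        """Apply one attempt's buffered writes, at most once per task id.

        A real store would need this check-and-apply step to be atomic.
        """
        if task_id in self.committed_tasks:
            return False  # another attempt of this task already committed; discard
        for key, delta in write_set.items():
            self.data[key] += delta
        self.committed_tasks.add(task_id)
        return True


def map_attempt(records):
    """One attempt of a word-count-style map task; buffers increments locally."""
    write_set = defaultdict(int)
    for word in records:
        write_set[word] += 1
    return write_set


def run_direct(store, records):
    """Non-transactional attempt: every increment goes straight to the store."""
    for word in records:
        store.data[word] += 1


split = ["a", "b", "a"]

# Without transactions: the original attempt and a backup attempt both write.
naive = KVStore()
run_direct(naive, split)
run_direct(naive, split)

# With commit-once semantics: both attempts run, only one write set is applied.
txn = KVStore()
txn.commit("map-0", map_attempt(split))
txn.commit("map-0", map_attempt(split))

print(dict(naive.data))  # {'a': 4, 'b': 2}
print(dict(txn.data))    # {'a': 2, 'b': 1}
```

The direct-write version double counts because the store has no idea the second attempt is a retry; buffering writes per attempt and committing them once per task restores the "re-execution is harmless" property MapReduce relies on.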
You can find the paper ☞ here.
Original title and link: Transactional Support in MapReduce for Speculative Parallelism (NoSQL databases © myNoSQL)