How to Run a MapReduce Job Against Common Crawl Data Using Amazon Elastic MapReduce
Steve Salevan’s seven-step guide to setting up, compiling, deploying, and running a basic MapReduce job:
When Google unveiled its MapReduce algorithm to the world in an academic paper in 2004, it shook the very foundations of data analysis. By establishing a basic pattern for writing data analysis code that can run in parallel against huge datasets, speedy analysis of data at massive scale finally became a reality, turning many orthodox notions of data analysis on their head.
Google published the paper. Yahoo open sourced an implementation. And Amazon is offering (unlimited) resources to run it on.
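For readers who have not seen the pattern itself, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is not taken from Steve Salevan's guide and does not touch Hadoop, Elastic MapReduce, or Common Crawl data; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. On a real cluster the framework would run the map and reduce calls in parallel across machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts)
```

Because each `map_phase` call looks at only one document and each `reduce_phase` call looks at only one key, both steps can be spread over many workers, which is what makes the pattern suited to huge datasets.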
Update: In the Hacker News thread, the main question is which other corporations (besides the Internet companies) are using MapReduce. The answer is unfortunately extremely short: too many to enumerate.
Original title and link: How to Run a MapReduce Job Against Common Crawl Data Using Amazon Elastic MapReduce (©myNoSQL)