Hadoop and Elastic MapReduce at Yelp

The story of how Yelp ran its own Hadoop cluster and then migrated to Amazon Elastic MapReduce:

We used to do what a lot of companies do, which is run a Hadoop cluster. We had a dozen or so machines that we otherwise would have gotten rid of, and whenever we pushed our code to our webservers, we’d push it to the Hadoop machines.

It was also not so cool. You couldn’t really tell if a job was going to work at all until you pushed it to production. But the worst part was, most of the time our cluster would sit idle, and then every once in a while, a really beefy job would come along and tie up all of our nodes, and all the other jobs would have to wait.

Yelp has released mrjob, its Python library for running MapReduce jobs on Hadoop or Amazon Elastic MapReduce, on GitHub.
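Jobs in Yelp's library (mrjob) are written as a mapper and a reducer that each yield key/value pairs. The snippet below is not mrjob itself but a rough, dependency-free sketch of that model. The word-count job and the `run_locally` helper are hypothetical. It simulates the map, shuffle/sort, and reduce phases in a single process, which is the kind of quick local check the quote says was missing before a job reached production.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Optional, Tuple


# Hypothetical word-count job, shaped like mrjob's mapper/reducer methods:
# each takes a key and a value and yields (key, value) pairs.
def mapper(_key: Optional[str], line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    yield word, sum(counts)


def run_locally(lines: Iterable[str], mapper, reducer) -> List[Tuple[str, int]]:
    """Hypothetical helper: simulate map -> shuffle/sort -> reduce in one process."""
    grouped = defaultdict(list)
    # Map phase: input lines have no key, so pass None (mrjob's "_" convention).
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)  # shuffle: collect values by key
    results = []
    # Hadoop sorts keys before handing them to a reducer; mimic that here.
    for key in sorted(grouped):
        results.extend(reducer(key, grouped[key]))
    return results


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    for word, count in run_locally(sample, mapper, reducer):
        print(word, count)
```

With mrjob itself, the same `mapper` and `reducer` would become methods on an `MRJob` subclass, and the execution target is chosen on the command line (for example `-r local`, `-r hadoop`, or `-r emr`). That is how one job definition can be tested on a laptop and then run on a shared cluster or on Elastic MapReduce.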

