I didn’t want to learn everything about setting up Hadoop, I still wanted to leverage the power of Hadoop’s distributed data processing framework, and I didn’t want to learn how to write MapReduce jobs and … (this could go on for a while, so I’ll just stop here). For all these reasons, I chose to use Amazon’s Elastic MapReduce infrastructure and Pig.
I will walk you through how I was able to do all this [take my log data stored on S3 (in compressed JSON format) and run queries against it] with a little help from the Pig community and a lot of late nights. I will also provide an example Pig script showing how I deal with my logs (which are admittedly slightly abnormal).
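As a rough sketch of what such a script might look like, here is a minimal Pig example. The bucket paths, field names, and the use of elephant-bird’s `JsonLoader` are assumptions for illustration, not the author’s actual setup; Pig does, however, read gzip-compressed input transparently, which is what makes querying compressed JSON logs on S3 straightforward:

```pig
-- Register a jar providing a JSON loader for Pig
-- (elephant-bird is assumed here; the author's actual loader may differ).
REGISTER 's3://my-bucket/jars/elephant-bird-pig-4.15.jar';

-- Pig decompresses .gz input transparently, so the compressed JSON
-- logs on S3 can be loaded directly (hypothetical path and fields).
logs = LOAD 's3://my-bucket/logs/*.gz'
       USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
       AS (json: map[]);

-- Pull a couple of illustrative fields out of each JSON record.
events = FOREACH logs GENERATE (chararray) json#'event' AS event;

-- A simple aggregation: count log entries by event type.
grouped = GROUP events BY event;
counts  = FOREACH grouped GENERATE group AS event, COUNT(events) AS n;

STORE counts INTO 's3://my-bucket/output/event-counts';
```

On Elastic MapReduce this kind of script can be submitted as a Pig step against data already sitting in S3, which is the appeal: no cluster setup and no hand-written MapReduce jobs.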
Sadly, such a useful tool in the Hadoop ecosystem doesn’t make the headlines.
Original title and link: Pig Latin and JSON on Amazon Elastic Map Reduce (NoSQL databases © myNoSQL)