Running Ruby Map/Reduce with Apache Hadoop
Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data into HDFS (the Hadoop Distributed File System), write map/reduce scripts in Ruby, and use them to run a map/reduce job on the cluster. You will not need to ssh into the cluster: all tasks are run from your local machine.
In short, the steps are:
- use Cloudera’s distribution for Apache Hadoop
- build, configure and use Whirr scripts to set up the Hadoop cluster on Amazon EC2
- connect from your laptop to the cluster using a SOCKS proxy
- check Hadoop and HDFS health status
- set up the local Hadoop client
- upload data to HDFS
- write the map/reduce tasks in Ruby
- run the Ruby map/reduce tasks, check their statistics, and retrieve the results
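The "write the map/reduce tasks in Ruby" step can be sketched as a Hadoop Streaming word count. This is a minimal illustrative sketch, not the code from the Cloudera post. The file name `wordcount.rb`, the `WordCount` module, its method names and the `map`/`reduce` command-line switch are all hypothetical. The sketch assumes Hadoop Streaming's convention: records arrive on standard input, tab-separated "key\tvalue" lines are read back from standard output, and the reducer's input is sorted by key.

```ruby
#!/usr/bin/env ruby
# Hypothetical word-count mapper and reducer for Hadoop Streaming (sketch).
# Streaming feeds input lines on STDIN and expects "key\tvalue" lines on STDOUT.

module WordCount
  # Mapper: emit "word\t1" for every word in every input line.
  # Returns an Enumerator when called without a block.
  def self.map_lines(lines)
    return enum_for(:map_lines, lines) unless block_given?
    lines.each do |line|
      line.downcase.scan(/[a-z0-9']+/) { |word| yield "#{word}\t1" }
    end
  end

  # Reducer: input is sorted by key, so all counts for one word are adjacent.
  # Sum them and emit "word\ttotal" whenever the key changes.
  def self.reduce_lines(lines)
    return enum_for(:reduce_lines, lines) unless block_given?
    current = nil
    total = 0
    lines.each do |line|
      word, count = line.chomp.split("\t", 2)
      next if word.nil? || count.nil? # skip malformed lines
      if word != current
        yield "#{current}\t#{total}" unless current.nil?
        current = word
        total = 0
      end
      total += count.to_i
    end
    yield "#{current}\t#{total}" unless current.nil?
  end
end

# Run as: ruby wordcount.rb map   or   ruby wordcount.rb reduce
# Any other argument (or none) does nothing, so the file can also be loaded.
if __FILE__ == $PROGRAM_NAME
  case ARGV[0]
  when 'map'    then WordCount.map_lines($stdin) { |kv| puts kv }
  when 'reduce' then WordCount.reduce_lines($stdin) { |kv| puts kv }
  end
end
```

You can check the pair locally with a pipeline such as `cat input.txt | ruby wordcount.rb map | sort | ruby wordcount.rb reduce`. The `sort` stands in for the shuffle/sort that Hadoop performs between the two phases. On the cluster, the same script would be handed to the Hadoop Streaming jar through its `-mapper`, `-reducer` and `-file` options. The jar's location depends on the Hadoop version and distribution, and Ruby must be installed on every task node.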
Original title and link: Running Ruby Map/Reduce with Apache Hadoop (NoSQL databases © myNoSQL)
via: http://www.cloudera.com/blog/2011/01/map-reduce-with-ruby-using-apache-hadoop/