ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

Hadoop on EC2: A Detailed Guide

Lars George, with his very (very) detailed style, explains step by step how to setup Hadoop on EC2.

Let’s jump into it head first and solve the problem of actually launching a cluster. You have heard that Hadoop is shipped with EC2 support, but how do you actually start up a Hadoop cluster on EC2? You do have a couple of choices and as Tom’s article above explains you could start all instances in the cluster by hand. But why would you want to do that if there are scripts available that do all the work for you? And to complicate matters, how do you select the AMI (the Amazon Machine Image) that has the Hadoop version you need or want? Does it have Hive installed for your subsequent analysis of the collected data? Just running a check to count the available public Hadoop images returns 41! That gets daunting very quickly. Sure you can roll your own - but that implies even more manual labor that you probably better spend on productive work. But there is help available..

Lars is using the CDH tools for this tutorial and points out the the ☞ incubating Whirr project.

Original title and link: Hadoop on EC2: A Detailed Guide (NoSQL databases © myNoSQL)

via: http://www.larsgeorge.com/2010/10/hadoop-on-ec2-primer.html