Hadoop in the Cloud: Pros and Cons

Steve Loughran covers the arguments for and against running Hadoop in a cloud environment:

  1. If your data is stored in a cloud provider’s storage infrastructure, running the analysis in that same cloud is the only rational choice. It’s the “work near the data” philosophy.
  2. If you only run computations periodically (say, nightly), you can rent cluster time as needed. Even if compute performance is worse, you can rent more machines to compensate.
  3. You may be able to achieve better security through isolation of clusters (depends on your IaaS vendor’s abilities).
  4. No upfront capex; fund from ongoing revenue.
  5. Easier to expand your cluster; no need to buy more racks, find more rack space.
  6. You don’t need to care about the problems of networking.
  7. Heterogeneous clusters are less of a problem if you expand later.
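
To make point 2 concrete, here is a rough back-of-the-envelope sketch of the "rent more machines to compensate" arithmetic. It is not from Steve's post: the function names, the assumption that throughput scales linearly with node count, and every number in it are hypothetical placeholders, so treat it as an illustration rather than a sizing guide.

```python
import math


def nodes_needed(baseline_nodes: int, relative_speed: float) -> int:
    """Cloud nodes required to match a baseline cluster.

    relative_speed is the (assumed) per-node throughput of a cloud node
    relative to a baseline node, e.g. 0.5 means half as fast.
    Assumes throughput scales linearly with node count, which real
    Hadoop jobs only approximate.
    """
    if relative_speed <= 0:
        raise ValueError("relative_speed must be positive")
    return math.ceil(baseline_nodes / relative_speed)


def nightly_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Rental cost of one nightly run (hypothetical flat hourly pricing)."""
    return nodes * hours * price_per_node_hour


if __name__ == "__main__":
    # Hypothetical inputs: a 20-node job, cloud nodes half as fast,
    # a 3-hour nightly window, and a made-up price per node-hour.
    nodes = nodes_needed(baseline_nodes=20, relative_speed=0.5)
    print(nodes, "nodes,", nightly_cost(nodes, hours=3, price_per_node_hour=0.5), "per night")
```

The point of the sketch is only that slower nodes turn into a larger rental bill, not into a longer job, as long as the workload parallelizes well.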

Interestingly, the list of counter-arguments is much shorter. The most important one, detailed further in the post, is: “Hadoop contains lots of assumptions about running in a static infrastructure; its scheduling and recovery algorithms assume this.”

Original title and link: Hadoop in the Cloud: Pros and Cons (NoSQL database©myNoSQL)

via: http://steveloughran.blogspot.com/2012/03/hadoop-in-cloud-infrastructures.html