Steve Loughran covers the arguments for and against running Hadoop in a cloud environment:
- If your data is stored in a cloud provider’s storage infrastructure, doing the analysis in that same cloud, next to the data, is the only rational action. It’s that “work near the data” philosophy.
- If you only do some computation periodically (say, nightly), you can rent cluster time. Even if compute performance is worse, you can rent more machines to compensate.
- You may be able to achieve better security through isolation of clusters (depends on your IaaS vendor’s abilities).
- No upfront capex; fund from ongoing revenue.
- Easier to expand your cluster; no need to buy more racks, find more rack space.
- You don’t need to care about the problems of networking.
- Heterogeneous clusters are less of a problem if you expand later.
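The "rent more machines to compensate" point comes down to simple arithmetic. Here is a minimal sketch of it, assuming the job scales roughly linearly with node count (real Hadoop jobs only approximate this). The function names, the relative speed, and the hourly price are all hypothetical and do not come from the original post or from any vendor's price list.

```python
import math


def machines_needed(baseline_nodes: int, relative_speed: float) -> int:
    """Nodes to rent so the job takes about the same wall-clock time.

    relative_speed is the throughput of one rented node relative to one
    baseline node (0.5 means half as fast). Assumes linear scaling.
    """
    if baseline_nodes <= 0 or relative_speed <= 0:
        raise ValueError("baseline_nodes and relative_speed must be positive")
    return math.ceil(baseline_nodes / relative_speed)


def rental_cost(nodes: int, hours_per_run: float,
                price_per_node_hour: float, runs: int) -> float:
    """Total rental cost for a number of identical runs (e.g. nightly jobs)."""
    return nodes * hours_per_run * price_per_node_hour * runs


# Hypothetical figures: a 20-node job on cloud nodes that are half as fast,
# run for 2 hours a night over 30 nights at 0.25 per node-hour.
nodes = machines_needed(20, 0.5)
monthly = rental_cost(nodes, hours_per_run=2, price_per_node_hour=0.25, runs=30)
print(nodes, monthly)
```

The cost is only incurred while the cluster is running, which is what makes the nightly-job case attractive compared with hardware that sits idle the rest of the day.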
Interestingly, the list of counter-arguments is much shorter, and the important bit, further detailed in the post, is: “Hadoop contains lots of assumptions about running in a static infrastructure; its scheduling and recovery algorithms assume this.”
Original title and link: Hadoop in the Cloud: Pros and Cons (©myNoSQL)