ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

6 Ways to Run Hadoop

Tony Baer lists 6 models for running Hadoop:

  1. Go to a cloud service provider that has already created the infrastructure, such as what Microsoft is offering with its Hadoop-on-Azure services;
  2. Look for a happy, simpler medium such as Amazon’s Elastic MapReduce on its DynamoDB service;
  3. Subscribe to SaaS providers that offer Hadoop applications (e.g., social network analysis, smart grid as a service) as a service; Pioneers or purists would scoff at the notion of an appliance approach because it was always simply scaling out inexpensive, commodity hardware, rather than paying premiums for big vendor boxes.
  4. Get a platform and have a systems integrator put it together for you (key to IBM’s BigInsights offering, and applicable to any SI that has a Hadoop practice)
  5. Go to an appliance or engineered systems approach that puts Hadoop and/or its subsystems in a box, such as with Oracle Big Data Appliance or EMC’s Greenplum DCA. The systems engineering is mostly done for you, but the increments for growing the system can be much larger than simply adding a few x86 servers here or there (Greenplum HD DCA can scale in groups of 4 server modules). Entry or expansion costs are not necessarily cheap, but then again, you have to balance capital cost against labor.
  6. Surrounding Hadoop infrastructure with solutions. This is not a mutually exclusive strategy; unless you’re Cloudera or Hortonworks, which make their business bundling and supporting the core Apache Hadoop platform, most of the household names will bundle frameworks, algorithms, and eventually solutions that in effect place Hadoop under the hood. For EMC, the strategy is their recent announcement of a Unified Analytics Platform (UAP) that provides collaborative development capabilities for Big Data applications. EMC is (or will be) hardly alone here.

For me, I’d collapse 1 and 2 as I don’t see any major differences between these platforms. And I’d probably move the 3rd option at the end as it’s covering just specific domains.

Original title and link: 6 Ways to Run Hadoop (NoSQL database©myNoSQL)

via: http://www.nearshorejournal.com/2012/07/emcs-hadoop-strategy-cuts-to-the-chase/