Automating Hadoop/HBase deployments with Puppet

The Adobe SaaS team, the same people who shared with us their experience and reasons for using HBase, have ☞ open sourced their Puppet[1] recipes for automating Hadoop/HBase deployments.

Right now we are open-sourcing on GitHub, Puppet recipes for:

  • creating the user under which the entire hstack runs.
  • changing system settings, like the ssh keys, authorizing machines to talk to each other, aliases for hadoop and hbase executables, /tmp rules.
  • standalone puppet module to deploy Hadoop
  • standalone puppet module to configure the Hadoop NameNode in High-Availability mode via DRBD, heartbeat and mon. For more details on this recipe, check out the Cloudera blog post on this topic.
  • standalone puppet module to deploy HBase
  • standalone puppet module to deploy ZooKeeper.
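
To give a flavor of what the first two items in that list might look like in Puppet, here is a minimal sketch. It is not taken from the Adobe recipes: the user name `hadoop`, the home directory, the resource title `hstack-cluster-key` and the truncated public key are all placeholders made up for illustration. It only uses the built-in `user` and `ssh_authorized_key` resource types (on recent Puppet versions the latter may have to be installed as a separate module).

```puppet
# Hypothetical sketch, NOT the Adobe recipes: every name and path is a placeholder.
$hstack_user = 'hadoop'

# Create the account under which the whole stack runs.
user { $hstack_user:
  ensure     => present,
  home       => "/home/${hstack_user}",
  managehome => true,
  shell      => '/bin/bash',
}

# Authorize a shared cluster key so the machines can talk to each other over SSH.
ssh_authorized_key { 'hstack-cluster-key':
  ensure  => present,
  user    => $hstack_user,
  type    => 'ssh-rsa',
  key     => 'AAAA...placeholder...',  # replace with a real public key
  require => User[$hstack_user],       # the account must exist first
}
```

Saved to a file (say `hstack_user.pp`, again a made-up name), a manifest like this can be tried on a single machine with `puppet apply hstack_user.pp`. The real recipes go much further, covering Hadoop, HBase, ZooKeeper and the NameNode high-availability setup.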

Their ☞ announcement explains in detail why they created these recipes and how to use them. (NB: it would be excellent if the ☞ GitHub project pointed back to the article as part of its documentation.)

To get an idea of how complex this process can be, check the HBase/Hadoop MacOS Installation Guide. I'd say these recipes will definitely make things a lot easier!

References

  • [1] ☞ Puppet: the leading open source tool for data center automation. Puppet helps you save time, gain visibility into your server environment, and ensure consistency across your IT infrastructure.