


Hadoop/HBase Capacity Planning

After publishing Hadoop hardware recommendations and a post on using Amdahl's law for Hadoop provisioning, Cloudera shares its know-how on Hadoop/HBase capacity planning, covering network, memory, disk, and CPU:

Since we are talking about data, the first crucial parameter is how much disk space we need on all of the Hadoop nodes to store all of your data, and what compression algorithm you are going to use to store it. For the MapReduce components, an important consideration is how much computational power you need to process the data and whether the jobs you are going to run on the cluster are CPU- or I/O-intensive. […] Finally, HBase is mainly memory driven, and we need to consider the data access pattern in your application and how much memory you need so that the HBase nodes do not swap data to disk too often. Most written data ends up in memstores before it finally lands on disk, so you should plan for more memory in write-intensive workloads like web crawling.
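The disk-sizing part of the reasoning above can be turned into simple arithmetic: logical data size, adjusted by the compression ratio, multiplied by the HDFS replication factor, plus headroom for intermediate MapReduce output. The following is a minimal sketch of that calculation; the default values (3x replication, a 2:1 compression ratio, 25% temporary-space overhead) are illustrative assumptions, not figures from the Cloudera post.

```python
def hdfs_raw_capacity_tb(logical_data_tb,
                         replication=3,        # HDFS default replication factor
                         compression_ratio=0.5,  # assumed 2:1 (e.g. Snappy-like)
                         temp_overhead=0.25):    # assumed headroom for shuffle/temp data
    """Estimate raw disk capacity (TB) needed across the cluster.

    All defaults are illustrative assumptions for a back-of-the-envelope
    estimate; tune them to your actual workload and codec.
    """
    compressed = logical_data_tb * compression_ratio
    replicated = compressed * replication
    return replicated * (1 + temp_overhead)

# Example: 100 TB of logical data
print(hdfs_raw_capacity_tb(100))  # 187.5
```

Dividing the result by the usable disk per node then gives a first guess at the node count; memory and CPU sizing (memstore pressure for write-heavy HBase workloads, CPU for compute-bound MapReduce jobs) would adjust that number up.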

Hadoop/HBase capacity planning

Original title and link for this post: Hadoop/HBase Capacity Planning (published on the NoSQL blog: myNoSQL)