


Applying Amdahl's Law to Hadoop Provisioning

Applying Amdahl’s law (or, in this case, a self-deduced variant of it) to Hadoop provisioning might give you good answers to questions like:

[Amdahl’s law speedup graph; credits to Wikipedia]

  • Why doesn’t the speed of my workflow double when I double the amount of processing power?
  • Why does a 10% failure rate cause my runtime to go up by 300%?
  • How can optimizing away 30% of my workflow cause the runtime to decrease by 80%?
  • How many machines should I have in my cluster to be adequately performant and fault-tolerant?
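The first question above follows directly from Amdahl’s law: overall speedup is capped by the serial fraction of the workflow. A minimal sketch of the standard formula, with an illustrative parallel fraction that is an assumption of mine and not a figure from the post:

```python
# Amdahl's law: speedup(p, n) = 1 / ((1 - p) + p / n), where p is the
# fraction of the workflow that parallelizes and n is the number of
# workers. The values of p and n below are illustrative assumptions.

def speedup(p, n):
    """Overall speedup with parallel fraction p on n workers."""
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    p = 0.9  # assume 90% of the workflow parallelizes
    for n in (10, 20, 40):
        print(f"{n:3d} workers -> {speedup(p, n):.2f}x speedup")
```

Note how doubling the cluster from 10 to 20 workers yields nowhere near a 2x gain: the 10% serial portion dominates, which is exactly why adding machines shows diminishing returns.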

The last Hadoop scenario we’ve read about is Digg’s data migration from MySQL to Cassandra. At the time I wondered why they weren’t using a less complex solution for the migration (e.g. Scribe). Arin has been kind enough to provide an explanation:

We use Hadoop for legacy data and/or when the data needs to go through “big” transformations or denormalization.