Pros and Cons of Using MapReduce With Distributed Key-Value Stores: HBase, Cassandra, Riak

An old Quora question with very good answers. One of them sums up the trade-offs:

  • (pro) can (potentially) query live data
  • (pro) can (conceptually) be highly efficient at joining data sets that are identically sharded on the join key (the joins can be pushed down into the key-value store itself)
  • (con) full scans (the most common pattern for map-reduce) are most likely to be much faster with raw file system access
  • (con) because the GFS+Map-Reduce model decouples computation from storage better, tolerating hot spots (resulting from MR jobs) is much easier there
  • (con) key-value stores are rarely arranged to have schemas optimized for analytics
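
The second point above, joining data sets that are identically sharded on the join key, can be illustrated with a small in-memory sketch. It is not tied to HBase, Cassandra, or Riak; the shard count, the `crc32`-based partitioner, and all function names are assumptions made up for this example. The idea it shows is that when both data sets use the same partitioner on the join key, matching rows always land in the same shard, so each shard can join locally without moving data.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_of(key: str) -> int:
    """Deterministic hash partitioner (an assumption; real stores use their own)."""
    return crc32(key.encode("utf-8")) % NUM_SHARDS


def partition(rows):
    """Place (key, value) rows into shards according to the join key."""
    shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
    for key, value in rows:
        shards[shard_of(key)][key].append(value)
    return shards


def local_join(left_shard, right_shard):
    """Join two co-located shards; no row has to leave its shard."""
    return [
        (key, left_value, right_value)
        for key in left_shard.keys() & right_shard.keys()
        for left_value in left_shard[key]
        for right_value in right_shard[key]
    ]


def co_partitioned_join(left_rows, right_rows):
    """Join two data sets shard by shard, relying on identical sharding."""
    left, right = partition(left_rows), partition(right_rows)
    joined = []
    for shard_id in range(NUM_SHARDS):
        joined.extend(local_join(left[shard_id], right[shard_id]))
    return sorted(joined)


users = [("u1", "alice"), ("u2", "bob"), ("u3", "carol")]
orders = [("u1", "book"), ("u1", "pen"), ("u3", "lamp")]
joined = co_partitioned_join(users, orders)
```

In a real store the per-shard join would run on the node that owns the shard, which is what "pushed down into the key-value store" refers to; this sketch only simulates that locality in a single process.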

Naoki Yanai

Original title and link: Pros and Cons of Using MapReduce With Distributed Key-Value Stores: HBase, Cassandra, Riak (NoSQL database©myNoSQL)