


Riak Map/Reduce Improvements

Kevin Smith[1] recently gave a presentation about a set of upcoming improvements to the Riak Map/Reduce implementation.

The current implementation works as follows:

  • Map phase executes in parallel with data locality
  • Reduce phase executes on the node where the job was submitted
  • Results are not cached or stored
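To make the current behavior concrete, here is a minimal Python sketch of the flow described above. It is a toy simulation, not Riak code: the `CLUSTER` layout, the word-count functions, and `run_job` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical cluster layout: node name -> {key: value} stored on that node.
CLUSTER = {
    "node_a": {"k1": "riak map reduce", "k2": "map phase"},
    "node_b": {"k3": "reduce phase"},
}


def map_words(value):
    """Map function: emit (word, 1) for each word in a stored value."""
    return [(word, 1) for word in value.split()]


def reduce_counts(pairs):
    """Reduce function: sum the counts per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_job(cluster, coordinator):
    """Simulate one job submitted to `coordinator`."""
    mapped = []
    # Map phase: each node maps only over the data it holds (data locality).
    # In a real cluster these per-node loops would run in parallel.
    for node, data in cluster.items():
        for value in data.values():
            mapped.extend(map_words(value))
    # Reduce phase: all map output is shipped to the one node where the job
    # was submitted, and the reduce runs there.
    result = reduce_counts(mapped)
    # Nothing is cached or stored: the result exists only in the return value.
    return {"reduced_on": coordinator, "result": result}


job = run_job(CLUSTER, coordinator="node_a")
```

The single reduce step on the coordinator is the part a distributed reduce phase would change.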

The Basho team is working on the following two issues:

  • Mapping beats up nodes and is inefficient for large buckets => write a real query scheduler that can
    • group keys into batches
    • reduce contention for JavaScript VMs
    • use replicas for better cluster utilization
  • Querying data is expensive when all you have are map/reduce functions => integrate key filtering operations into the MapReduce pipeline
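As a rough illustration of these two ideas, the Python sketch below filters keys before any map function runs and then groups the surviving keys into per-node batches. This is an assumption-laden toy, not the scheduler Basho is building: `filter_keys`, `batch_by_node`, the `PLACEMENT` table, and the key naming scheme are all hypothetical.

```python
from collections import defaultdict

# Hypothetical placement table: key -> node that holds it.
PLACEMENT = {
    "2010-01-a": "node_a",
    "2010-02-b": "node_a",
    "2011-01-c": "node_b",
    "2010-03-d": "node_a",
    "2010-04-e": "node_b",
}


def filter_keys(keys, predicate):
    """Key filtering: drop keys before the map phase, without loading values."""
    return [key for key in keys if predicate(key)]


def batch_by_node(keys, placement, batch_size):
    """Group keys by owning node, then split each group into fixed-size batches."""
    per_node = defaultdict(list)
    for key in keys:
        per_node[placement[key]].append(key)
    batches = []
    for node in sorted(per_node):
        node_keys = per_node[node]
        for start in range(0, len(node_keys), batch_size):
            batches.append((node, node_keys[start:start + batch_size]))
    return batches


# Keep only keys from 2010, then schedule them in batches of two per node.
selected = filter_keys(list(PLACEMENT), lambda key: key.startswith("2010-"))
batches = batch_by_node(selected, PLACEMENT, batch_size=2)
```

Each batch can then be handed to a node as one unit of work, so a node sees a few larger requests rather than one request per key, and keys rejected by the filter never reach a map function at all.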

More improvements are planned for the future:

  • upgrading the JavaScript VM
  • distributing the reduce phase
  • allowing external MapReduce processes

While I’ve often mentioned a distributed reduce phase as a possible improvement, I like the ones they are currently working on even more.

The complete slide deck was embedded in the original post.

  1. Kevin Smith: Basho Technologies engineer

Original title and link: Riak Map/Reduce Improvements (NoSQL databases © myNoSQL)