CouchDB Built-In Reduce Functions

Via ☞ Mikeal Rogers:

Currently (CouchDB 0.11.0) there are three built-in reduce functions. Built-in reduce functions run right inside CouchDB and are implemented in Erlang. In most cases they are much faster than an equivalent JavaScript reduce function.
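As a rough illustration, a view selects a built-in reduce by giving its name as the reduce string instead of JavaScript source. The sketch below assumes the three built-ins in 0.11.0 are _sum, _count, and _stats; the design document name, view names, and document fields (customer, amount) are hypothetical.

```javascript
// Hypothetical design document: the _id, view names, and fields are illustrative only.
const designDoc = {
  _id: "_design/orders",
  views: {
    // Built-in reduce: the string "_sum" asks CouchDB to use its Erlang implementation.
    total_by_customer_builtin: {
      map: "function (doc) { emit(doc.customer, doc.amount); }",
      reduce: "_sum"
    },
    // Hand-written equivalent, evaluated by the JavaScript view server.
    // sum() is a helper the CouchDB JavaScript view server provides.
    total_by_customer_js: {
      map: "function (doc) { emit(doc.customer, doc.amount); }",
      reduce: "function (keys, values, rereduce) { return sum(values); }"
    }
  }
};

// Local stand-in for the JavaScript reduce so the sketch runs outside CouchDB.
// For a sum, the reduce and rereduce steps are the same operation.
function jsSumReduce(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0);
}

console.log(jsSumReduce(null, [1, 2, 3], false));
```

Both views should return the same totals; the difference is only where the reduce runs.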

J. Chris Anderson explains why built-in reduce functions are faster:

The deal is that map function (and intermediate reduce function) output is cached in the view index (as it is generated). So in those cases the function overhead is not an issue.

But in the case of reduce, for any query that has parameters (even a startkey and endkey), the JavaScript function will be executed once (and fed the intermediate cached results) to give the accurate answer to the query. Once is fine and fast enough, but in the case of group=true or group_level=N queries, the JavaScript function is executed once per row of output, so that starts to slow things down (linearly in the number of group rows, not the number of map rows, so it’s still “scalable”).
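For readers unfamiliar with the query parameters mentioned here, the following sketch builds the corresponding view URLs. The host, database name (shop), design document (orders), and view name are assumptions for illustration; the helper viewUrl is hypothetical and not part of any CouchDB client library.

```javascript
// Hypothetical helper: builds a CouchDB view query URL.
// Parameter values are JSON-encoded, since CouchDB expects keys as JSON.
function viewUrl(base, db, ddoc, view, params = {}) {
  const url = new URL(`${base}/${db}/_design/${ddoc}/_view/${view}`);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, JSON.stringify(value));
  }
  return url.toString();
}

const base = "http://localhost:5984"; // assumed local CouchDB instance

// Key range: per the text above, the reduce function runs once for the whole query.
const rangeUrl = viewUrl(base, "shop", "orders", "total_by_customer",
  { startkey: "a", endkey: "m" });

// Grouped queries: the reduce function runs once per output row.
const groupedUrl = viewUrl(base, "shop", "orders", "total_by_customer",
  { group: true });
const groupLevelUrl = viewUrl(base, "shop", "orders", "total_by_customer",
  { group_level: 1 });

console.log(rangeUrl);
console.log(groupedUrl);
console.log(groupLevelUrl);
```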

Using the builtin reduces avoids the interprocess communication overhead of calling the JavaScript function once per row, in a situation where the output cannot be cached.

In normal operation, the JavaScript calls are batched and their results cached, so the effects of the slowness are mitigated.
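The cost model described above can be sketched with a toy simulation that counts how often a reduce function is called at query time. This is not CouchDB's actual B-tree algorithm: the cached intermediate results are modeled as a flat list of pre-reduced values per key, and all names (makeCountingReduce, queryReduce, the sample data) are hypothetical.

```javascript
// Wraps a sum reduce so that every invocation is counted.
function makeCountingReduce() {
  let calls = 0;
  const reduce = (keys, values, rereduce) => {
    calls += 1;
    return values.reduce((acc, v) => acc + v, 0);
  };
  return { reduce, getCalls: () => calls };
}

// cachedChunks: [{ key, value }], where each value is an already-reduced partial result.
function queryReduce(cachedChunks, reduce, { group = false } = {}) {
  if (!group) {
    // Ungrouped query: a single rereduce call over all cached partial values.
    const value = reduce(null, cachedChunks.map((c) => c.value), true);
    return [{ key: null, value }];
  }
  // group=true: collect the partial values per key, then rereduce once per output row.
  const byKey = new Map();
  for (const { key, value } of cachedChunks) {
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(value);
  }
  return [...byKey].map(([key, values]) => ({ key, value: reduce(null, values, true) }));
}

const chunks = [
  { key: "alice", value: 30 }, { key: "alice", value: 12 },
  { key: "bob", value: 5 },
  { key: "carol", value: 7 }, { key: "carol", value: 1 },
];

const single = makeCountingReduce();
queryReduce(chunks, single.reduce);

const grouped = makeCountingReduce();
const rows = queryReduce(chunks, grouped.reduce, { group: true });

console.log(single.getCalls(), grouped.getCalls(), rows);
```

In this model the ungrouped query makes one call and the grouped query makes one call per distinct key, regardless of how many map rows fed the cached values. With a JavaScript reduce, each of those calls crosses a process boundary; a built-in reduce avoids that.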