CouchDB and Varnish Caching - Why it Does Not Work

It has been said many times that, because CouchDB is so HTTP-friendly, you can use standard web tools to help it scale. One such tool is the web cache ☞ Varnish. Mathias Meyer pointed out in the CouchDB post-1.0 roadmap that the lack of caching in CouchDB (basically all reads go to disk) should be addressed sooner rather than later.

Recently there was a conversation on the ☞ CouchDB mailing list which concluded that Varnish cannot really help with caching CouchDB results:

As I understand it now, the only way how to cache Couch’s response would be with time-based caching, and either using the cached response until it auto-expires, or expire the cached response via PURGE commands. Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the “update-notification” mechanism. As I see it, the tricky part would be to know which objects to purge, based on individual document changes. Because not only single documents, but also aggregated view results or fulltext queries would get cached.
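To make the "which objects to purge" problem concrete, here is a minimal Python sketch. It assumes a hypothetical, application-maintained map from document IDs to the cached view URLs that include them (CouchDB does not hand you such a map), a database named `blog`, and a cache listening on `http://localhost:6081`; all of these names are illustrative. The PURGE requests are only built, not sent, and Varnish would need VCL configured to accept them.

```python
from urllib.request import Request

# Hypothetical dependency map: cached view paths keyed by the documents they include.
# An application would have to build and maintain this itself.
VIEW_DEPENDENCIES = {
    "post-1": ["/blog/_design/posts/_view/by_date"],
}


def urls_to_purge(change, db="blog", dependencies=VIEW_DEPENDENCIES):
    """Return the cache paths affected by one row of the _changes feed."""
    doc_id = change["id"]
    paths = [f"/{db}/{doc_id}"]                   # the document itself
    paths.extend(dependencies.get(doc_id, []))    # views known to include it
    return paths


def purge_requests(change, cache_base="http://localhost:6081"):
    """Build (but do not send) one PURGE request per affected path."""
    return [Request(cache_base + path, method="PURGE")
            for path in urls_to_purge(change)]
```

Purging the document URL is the easy part. Without something like `VIEW_DEPENDENCIES`, a change to `post-1` leaves every cached view or fulltext result that contains it stale, which is exactly the difficulty described in the quote.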

In summary, CouchDB and Varnish don’t work well together because:

  1. Varnish does not pass ETags to the backend
  2. Even if Varnish passed ETags in a GET to validate, CouchDB would still go to disk to read the document
  3. Even if Varnish passed ETags in a HEAD to validate, CouchDB would still have to hit the disk for the document
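Points 2 and 3 can be illustrated with a toy model. The sketch below is not CouchDB code: `ToyBackend` is a hypothetical stand-in for a store with no in-memory cache, where the ETag is derived from the document revision, so the backend has to read the document just to learn whether the client's ETag is still current.

```python
class ToyBackend:
    """Toy stand-in for a store without an in-memory cache (not CouchDB code)."""

    def __init__(self, docs):
        self._disk = docs       # doc_id -> (rev, body); pretend this lives on disk
        self.disk_reads = 0

    def _read_from_disk(self, doc_id):
        self.disk_reads += 1
        return self._disk[doc_id]

    def get(self, doc_id, if_none_match=None):
        # The read happens before validation: the current rev is only known on disk.
        rev, body = self._read_from_disk(doc_id)
        etag = f'"{rev}"'
        if if_none_match == etag:
            return 304, etag, None   # saves bandwidth, not the disk read
        return 200, etag, body
```

In this model a successful validation returns 304 with no body, yet `disk_reads` still goes up by one per request, so ETag validation alone does not take load off the disk.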

So I guess the conclusion is that CouchDB must add caching.

Update: As pointed out by @karmiq, it is ETag-based caching that doesn’t work, while expiration-based caching is fine (as long as serving possibly stale data is acceptable).
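Expiration-based caching is simple enough to sketch in a few lines. The Python below is a minimal illustration of the idea, not Varnish's implementation; `TTLCache`, `fetch`, and the injectable `clock` are hypothetical names chosen for the example.

```python
import time


class TTLCache:
    """Serve a cached value until its time-to-live runs out, then refetch."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # function that hits the backend
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]      # possibly stale, but no backend hit
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade-off is visible in `get`: within the TTL the backend is never consulted, so a document updated in the meantime is served in its old version until the entry expires.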

Update 2: Jan Lehnardt (@janl) pointed me to a JIRA issue, ☞ Caching of BTree nodes and documents, that was closed as “won’t fix” because the performance improvements were not significant. Personally, I think that is a problem only with that particular solution, as I’m pretty sure that not having to reach the disk for each read would yield better results (nb: see update 4).

Update 3: The Cost of I/O

Update 4: The conversation with @janl continued, and we agreed that the best formulation is probably this: given that kernel and filesystem caches already exist, purpose-built caches with knowledge of the data layout may yield better results, but there are no guarantees.

Original title and link: CouchDB and Varnish Caching - Why it Does Not Work (NoSQL databases © myNoSQL)