varnish: All content tagged as varnish in NoSQL databases and polyglot persistence

Factual API Powered by Node.js and Redis

Continuing my search for non-trivial Node.js + NoSQL database applications, here’s the Factual stack for serving their API:

Factual API Stack

Factual architectural components:

  • Varnish
  • HAProxy
  • Node.js
  • Redis
  • Solr

Why Node.js?

We chose Node because of three F’s: it’s fast, flexible, and familiar. In particular, the flexibility is what allowed us to use our Node layer to handle things like caching logic and load balancing, in addition to the aforementioned authentication and authorization. To make our Node layer scalable, we use multiple instances of Node tied together with Redis to keep things in sync.

Also worth mentioning: data served through the Factual API is always JSON, so having a server-side JavaScript engine also reduces the need to convert data between formats.

Original title and link: Factual API Powered by Node.js and Redis (NoSQL databases © myNoSQL)


Varnish Cache + Riak

When combined with the caching proxy, the high performance of the Riak cluster itself, and proper caching settings on the client side, we have a powerful infrastructure on which to build highly scalable applications. Moreover, we haven’t even touched the incredibly cool things one can do with search and map/reduce on the Riak cluster, or building complex mappings using the links and key filters.

More or less all web-accessible NoSQL databases can benefit from a caching proxy. On the other hand, I’d say it’s worth having an internal cache, even a basic one, for scenarios where the working data set is smaller than the total amount of data stored.
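To make the internal-cache idea concrete, here is a minimal sketch of a read-through LRU cache sitting in front of a slow store. It is an illustration only, not how Riak or any other database implements caching; the class name `ReadThroughLRU`, the `load` callback, and the capacity are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRU:
    """Small in-process cache that keeps the most recently used values."""

    def __init__(self, load: Callable[[str], bytes], capacity: int = 1024):
        self._load = load            # hypothetical function that reads from the database
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._load(key)              # slow path: go to the backing store
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    # Assumed scenario: 1000 stored documents, but only 5 are requested repeatedly.
    backing_store = {f"doc{i}": f"body {i}".encode() for i in range(1000)}
    cache = ReadThroughLRU(backing_store.__getitem__, capacity=10)
    for _ in range(100):
        for i in range(5):
            cache.get(f"doc{i}")
    print("hits:", cache.hits, "misses:", cache.misses)
```

When the working set fits in the cache, as in this assumed scenario, only the first request for each key reaches the backing store.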

Original title and link: Varnish Cache + Riak (NoSQL databases © myNoSQL)


CouchDB and Varnish Caching - Why it Does Not Work

It has been said many times that, because CouchDB is so HTTP-friendly, you could use standard web tools to get it to scale. One such tool that was mentioned is the web cache ☞ Varnish. Mathias Meyer pointed out in the CouchDB post 1.0 roadmap that the lack of caching in CouchDB (basically all reads go to disk) should be addressed sooner rather than later.

Recently there was a conversation on the ☞ CouchDB mailing list which concluded that Varnish cannot really help with caching CouchDB results:

As I understand it now, the only way to cache Couch’s response would be with time-based caching, and either using the cached response until it auto-expires, or expiring the cached response via PURGE commands. Of course, it would be possible and technically trivial to send purge requests via the _changes feed or via the “update-notification” mechanism. As I see it, the tricky part would be to know which objects to purge, based on individual document changes. Because not only single documents, but also aggregated view results or fulltext queries would get cached.
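The approach described in the quote can be sketched as a small expiring cache with a purge hook. This is an in-memory model of the idea, not Varnish configuration or CouchDB client code; `ExpiringCache`, `purge_changed_documents`, and the example paths are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class ExpiringCache:
    """Caches response bodies by URL path until they expire or are purged."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, path: str, body: bytes) -> None:
        self._entries[path] = (self._clock() + self._ttl, body)

    def get(self, path: str) -> Optional[bytes]:
        entry = self._entries.get(path)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._entries[path]      # expired: behave like a miss
            return None
        return body

    def purge(self, path: str) -> bool:
        """Drop one cached path; returns True if something was removed."""
        return self._entries.pop(path, None) is not None


def purge_changed_documents(cache: ExpiringCache, db: str,
                            changed_ids: Iterable[str]) -> int:
    """Purge the document URL of every changed id; returns how many were purged.

    View and search results are NOT touched: a change event names the
    document, not the aggregated results that happened to include it.
    """
    return sum(cache.purge(f"/{db}/{doc_id}") for doc_id in changed_ids)


if __name__ == "__main__":
    cache = ExpiringCache(ttl_seconds=60)
    cache.put("/db/doc1", b'{"n": 1}')
    cache.put("/db/_design/app/_view/by_n", b'{"rows": []}')
    purge_changed_documents(cache, "db", ["doc1"])
    print(cache.get("/db/doc1"))                     # the document entry is gone
    print(cache.get("/db/_design/app/_view/by_n"))   # the view entry is still cached
```

Purging the document path is the easy part. The cached view result stays in place until its time-to-live runs out, which is exactly the "which objects to purge" problem raised in the quote.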

To summarize, CouchDB and Varnish don’t work perfectly together because:

  1. Varnish does not pass ETags to the backend
  2. Even if Varnish passed ETags in a GET to validate, CouchDB would go to the disk to read the document
  3. Even if Varnish passed ETags in a HEAD to validate, CouchDB would still have to hit the disk for the document
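Points 2 and 3 can be illustrated with a toy model of ETag validation. The sketch assumes the ETag is derived from the document's current revision, so the server has to look the document up before it can answer; `CountingStore` and `conditional_get` are hypothetical stand-ins, not CouchDB internals.

```python
from typing import Dict, Optional, Tuple


class CountingStore:
    """Stand-in for on-disk storage that counts how often it is read."""

    def __init__(self, docs: Dict[str, Tuple[str, bytes]]):
        self._docs = docs            # doc id -> (revision, body)
        self.reads = 0

    def read(self, doc_id: str) -> Tuple[str, bytes]:
        self.reads += 1              # every call models one trip to disk
        return self._docs[doc_id]


def conditional_get(store: CountingStore, doc_id: str,
                    if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Answer a GET, honoring If-None-Match; returns (status, body)."""
    revision, body = store.read(doc_id)   # needed just to learn the current ETag
    etag = f'"{revision}"'
    if if_none_match == etag:
        return 304, b""                   # Not Modified: no body is sent
    return 200, body


if __name__ == "__main__":
    store = CountingStore({"doc1": ("1-abc", b'{"n": 1}')})
    first_status, _ = conditional_get(store, "doc1", None)
    second_status, _ = conditional_get(store, "doc1", '"1-abc"')
    print(first_status, second_status, store.reads)
```

In this model the validated request saves the response body, but the store is still read once per request, so validation alone does not relieve the disk.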

So I guess the conclusion is that CouchDB must add caching.

Update: As pointed out by @karmiq, it is ETag-based caching that doesn’t work, while expiration-based caching is OK (as long as dealing with possibly stale data is acceptable).

Update 2: Jan Lehnardt @janl pointed me to a JIRA issue, ☞ Caching of BTree nodes and documents, that was closed as “won’t fix” because the performance improvements were not significant. Personally, I think that’s only a problem with that particular solution, as I’m pretty sure that not having to reach the disk for each read would yield better results (nb: see update 4).

Update 3: The Cost of I/O

Update 4: The conversation with @janl continued, and we agreed that probably the best formulation is this: keeping in mind kernel and filesystem caches, hand-crafted caches with knowledge of the data layout may yield better results, but there are no guarantees.

Original title and link: CouchDB and Varnish Caching - Why it Does Not Work (NoSQL databases © myNoSQL)