GridFS: All content tagged as GridFS in NoSQL databases and polyglot persistence
Aristarkh Zagordnikov wrote me an email describing the reasons that led his company to create and open source mod_gridfs.
Some time ago we were looking for a way to serve files to the web straight from a GridFS database. We considered several options. An IIS handler (we use .NET on Windows as a backend) would require a Windows machine to serve files, and we planned to use Windows for the backend only. nginx-gridfs was too slow: it is synchronous while nginx is not, it uses an outdated MongoDB C driver that doesn’t do connection pooling, and it does not support slaveOk (reading from secondaries to scale reads horizontally).
In the end I decided to roll our own: a module for Apache 2.2 or higher that uses MongoDB’s own C++ driver. It supports replica sets and slaveOk reads, emits proper caching headers (Last-Modified, ETag, Cache-Control, Expires), responds correctly to conditional requests (If-Modified-Since/If-None-Match), and uses the Apache bucket brigade API to serve large files with less in-memory copying.
While Apache isn’t the most resource-friendly server for a high-load environment (it consumes too much memory per connection and does not yet support production-quality event-based I/O), it really shines as a backend for something like nginx+proxy_cache with optional SSD as proxy_cache storage that does the heavy lifting.
Benchmark: serving a 4 KiB file over a gigabit network on modern hardware, with 100 concurrent requests and a MongoDB replica set of 3 machines as the backend (kr/s = thousand requests per second):
- NGINX + nginx-gridfs: 1.2kr/s
- Apache + mod_gridfs: 6.6kr/s
- Apache + mod_gridfs with slaveOk: 12.1kr/s
I didn’t test with larger files, because then I would be benchmarking OS I/O performance instead of user-mode code.
The public Mercurial repo is here. It uses the Simplified (2-clause) BSD license and contains installation instructions and docs in the README file. Building might seem hard, but once it is built, mass deployment only requires installing the dependent libraries (such as boost) and copying the mod_gridfs.so file around.
Original title and link: MongoDB GridFS Over HTTP With Mod_gridfs (©myNoSQL)
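The conditional-request behaviour mentioned in the email (If-None-Match / If-Modified-Since answered with 304 Not Modified) can be illustrated with a small sketch. This is not mod_gridfs code: the function names `make_validators` and `is_not_modified` are hypothetical, and deriving the ETag from the file's `md5` field and Last-Modified from its `uploadDate` field is an assumption about how such a module might work.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_validators(md5_hex, upload_date):
    """Build ETag and Last-Modified values from GridFS-style file metadata.

    Assumption: the ETag is the quoted md5 and upload_date is an aware UTC datetime.
    """
    modified_at = upload_date.replace(microsecond=0)  # HTTP dates have 1 s resolution
    etag = '"%s"' % md5_hex
    last_modified = format_datetime(modified_at, usegmt=True)
    return etag, last_modified, modified_at


def is_not_modified(request_headers, etag, modified_at):
    """Return True when the request can be answered with 304 Not Modified."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]  # weak comparison
        return "*" in tags or etag in tags

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparsable date: serve the full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        return modified_at <= since

    return False
```

A handler would call `is_not_modified` before touching any chunk data, so a cache hit on the client side costs one metadata lookup and no file reads.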
Did you know that files read from GridFS are streamed rather than loaded entirely into memory?
GridFS splits a file into small chunks and stores them in a special chunks collection. Each file also has metadata (filename, content type, and custom metadata) stored in a separate collection.
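To make the chunk layout and the streaming behaviour concrete, here is a minimal in-memory sketch. It does not talk to MongoDB: plain Python lists and dicts stand in for the collections, the helper names `split_into_chunks` and `stream_file` are hypothetical, and the tiny chunk size is chosen only for illustration. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`) follow the GridFS document layout.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Return (file_doc, chunk_docs) shaped like GridFS documents."""
    file_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return file_doc, chunk_docs


def stream_file(chunk_docs, file_id):
    """Yield the file's bytes one chunk at a time, in chunk order.

    With a real database this would iterate a cursor sorted by n, so only
    one chunk needs to be held in memory at any moment.
    """
    own = [c for c in chunk_docs if c["files_id"] == file_id]
    for chunk in sorted(own, key=lambda c: c["n"]):
        yield chunk["data"]


if __name__ == "__main__":
    payload = bytes(range(10))
    file_doc, chunks = split_into_chunks("demo", payload, chunk_size=4)
    for piece in stream_file(chunks, "demo"):
        print(len(piece), piece.hex())
```

The consumer of `stream_file` can write each piece to a socket as it arrives, which is the property the question above refers to.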
GridFS permits range operations, so one can retrieve only specific byte ranges of a file. (NB: I couldn’t find the API for this operation, so it may not be exposed by the drivers.)
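Because chunks have a fixed size and a sequence number, a byte range maps directly onto a span of chunk numbers. The sketch below shows that arithmetic only; it is not a driver API, and `chunk_span`, `read_range`, and the `chunks_by_n` dict (chunk number to bytes) are hypothetical stand-ins for a query against the chunks collection.

```python
def chunk_span(start, end, chunk_size):
    """Return (first, last) chunk numbers covering the byte range [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start // chunk_size, (end - 1) // chunk_size


def read_range(chunks_by_n, chunk_size, length, start, end):
    """Return bytes [start, end) of a file, touching only the chunks needed."""
    end = min(end, length)  # clamp to the file length
    if start < 0 or start >= end:
        return b""
    first, last = chunk_span(start, end, chunk_size)
    parts = []
    for n in range(first, last + 1):
        data = chunks_by_n[n]
        chunk_start = n * chunk_size
        lo = max(start - chunk_start, 0)       # offset of the range inside this chunk
        hi = min(end - chunk_start, len(data))
        parts.append(data[lo:hi])
    return b"".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    size = 100
    store = {n: data[o:o + size] for n, o in enumerate(range(0, len(data), size))}
    print(chunk_span(250, 430, size))
    print(read_range(store, size, len(data), 250, 430) == data[250:430])
```

Only the chunks from `first` to `last` are read, so serving a small range of a very large file costs a handful of chunk lookups rather than a full scan.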
Official GridFS documentation: