Apache: All content tagged as Apache in NoSQL databases and polyglot persistence
Aristarkh Zagordnikov wrote me an email describing the reasons that led his company to create and open source mod_gridfs:
Some time ago we were looking for a way to serve files to the web right from the GridFS database. We considered several options. An IIS handler (we use .NET on Windows as a backend) would require a Windows machine to serve files, and we planned to use Windows for the backend only. nginx-gridfs was too slow (it is synchronous while nginx is not, and it uses an outdated MongoDB C driver that lacks connection pooling, among other things) and does not support slaveOk (reads from secondaries, for horizontal read scaling).
In the end I decided to roll our own method: a module for Apache 2.2 or higher that uses MongoDB’s own C++ driver. It supports replica sets and slaveOk reads, emits proper output caching headers (Last-Modified, Etag, Cache-Control, Expires), responds properly to conditional requests (If-Modified-Since/If-None-Match), and uses the Apache brigade API to serve large files with less in-memory copying.
While Apache isn’t the most resource-friendly server for a high-load environment (it consumes too much memory per connection and does not yet support production-quality event-based I/O), it really shines as a backend for something like nginx+proxy_cache with optional SSD as proxy_cache storage that does the heavy lifting.
Benchmark results for serving a 4 KiB file over a gigabit network on modern hardware, with 100 concurrent requests and a MongoDB replica set of 3 machines as a backend:
- NGINX + nginx-gridfs: 1.2k requests/s
- Apache + mod_gridfs: 6.6k requests/s
- Apache + mod_gridfs with slaveOk: 12.1k requests/s
I didn’t test with larger files, because then I would be benchmarking OS I/O performance instead of user-mode code.
The public Mercurial repo is here. It uses the Simplified (2-clause) BSD license and contains installation instructions and docs in the README file. Building might seem hard, but once it is built, a mass deployment only requires installing the dependent libraries (such as boost) and copying the mod_gridfs.so file around.
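The conditional-request behaviour mentioned in the email (If-Modified-Since/If-None-Match) can be illustrated with a small Python sketch. This is not mod_gridfs code: the function name `conditional_status` and the plain-dict representation of request headers are assumptions made for illustration, and only strong ETag comparison is shown.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    request_headers: dict of header name -> value (hypothetical shape).
    etag: the stored file's entity tag, including its quotes.
    last_modified: timezone-aware UTC datetime of the stored file.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it takes precedence over
        # If-Modified-Since, so the date header is not consulted at all.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # an unparseable date is ignored
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

A server would send the file body only when the function returns 200. On 304 it sends headers alone, which is what makes a caching proxy in front of the file server cheap to revalidate.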
Original title and link: MongoDB GridFS Over HTTP With Mod_gridfs ( ©myNoSQL)
This Apache module uses a rule-based engine (based on a regular expression parser) to map URLs to Redis commands on the fly. It supports an unlimited number of rules and can match on the full URL and the request method (GET, POST, PUT or DELETE), which makes it a very flexible way to define a RESTful interface to Redis.
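The idea of mapping URLs to Redis commands through regular-expression rules can be sketched in a few lines of Python. This is an illustration of the technique only, not the module's configuration syntax or code: the rule table, the `BODY` placeholder, and the function `to_redis_command` are all hypothetical.

```python
import re

# Sentinel marking where the HTTP request body goes (hypothetical convention).
BODY = object()

# Hypothetical rules: (HTTP method, URL pattern, Redis command template).
# Template strings may use regex backreferences such as \1.
RULES = [
    ("GET", re.compile(r"^/keys/([^/]+)$"), ["GET", r"\1"]),
    ("PUT", re.compile(r"^/keys/([^/]+)$"), ["SET", r"\1", BODY]),
    ("DELETE", re.compile(r"^/keys/([^/]+)$"), ["DEL", r"\1"]),
    ("POST", re.compile(r"^/lists/([^/]+)$"), ["RPUSH", r"\1", BODY]),
]


def to_redis_command(method, url, body=""):
    """Return the Redis command (as a list of arguments) for a request.

    The first rule whose method and pattern both match wins.
    Returns None when no rule matches.
    """
    for rule_method, pattern, template in RULES:
        if rule_method != method:
            continue
        match = pattern.match(url)
        if match:
            return [
                body if token is BODY else match.expand(token)
                for token in template
            ]
    return None
```

With these rules, a `PUT` to `/keys/user:42` with body `alice` becomes `["SET", "user:42", "alice"]`, while a request that matches no rule yields `None` and would presumably be answered with an HTTP error.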
Original title and link: Apache Mod_redis ( ©myNoSQL)
The National Security Agency has submitted a proposal to the Apache Incubator to open source Accumulo, a BigTable-inspired key-value store that it has been building since 2008. The project proposal page provides more details about Accumulo’s history, its building blocks, and how it compares to HBase, the other open source BigTable implementation:
Access Labels: Accumulo has an additional portion of its key that sorts after the column qualifier and before the timestamp. It is called column visibility and enables expressive cell-level access control. Authorizations are passed with each query to control what data is returned to the user.
Iterators: Accumulo has a novel server-side programming mechanism that can modify the data written to disk or returned to the user. This mechanism can be configured for any of the scopes where data is read from or written to disk. It can be used to perform joins on data within a single tablet.
Flexibility: Accumulo places no restrictions on the column families. Also, each column family in HBase is stored separately on disk. Accumulo allows column families to be grouped together on disk, as does BigTable.
Logging: HBase uses a write-ahead log on the Hadoop Distributed File System. Accumulo has its own logging service that does not depend on communication with the HDFS NameNode.
Storage: Accumulo has a relative key file format that improves compression.
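To make the "Access Labels" point concrete, here is a minimal Python sketch of cell-level filtering: each cell carries a visibility expression, and a scan returns only the cells whose expression is satisfied by the authorizations passed with the query. This is not Accumulo's implementation or API. The function names, the tuple layout of a cell, and the simplified expression grammar (labels combined with `&`, `|` and parentheses, evaluated left to right) are assumptions for illustration.

```python
import re

# A token is a label (letters, digits, underscore) or one of & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def is_visible(expression, authorizations):
    """Evaluate a visibility expression such as 'admin|(audit&finance)'.

    Assumptions of this sketch: an empty expression is visible to everyone,
    operators are applied left to right, and the input is well formed.
    """
    if not expression:
        return True
    tokens = TOKEN.findall(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return token in authorizations

    def parse_expr():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    return parse_expr()


def scan(cells, authorizations):
    """Yield (key, value) for every cell the caller is allowed to see.

    cells: iterable of (key, visibility_expression, value) tuples.
    """
    for key, visibility, value in cells:
        if is_visible(visibility, authorizations):
            yield key, value
```

For example, scanning with authorizations `{"audit", "finance"}` returns a cell labelled `admin|(audit&finance)` but skips one labelled `admin`, so two users running the same query can see different subsets of the same row.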
Michael Stack has commented on the HBase mailing list:
The cell based ‘access labels’ seem like a matter of adding an extra field to KV and their Iterators seem like a specialization on Coprocessors. The ability to add column families on the fly seems too minor a difference to call out especially if online schema edits are now (soon) supported. They talk of locality group like functionality too — that could be a significant difference. We would have to see the code but at first blush, differences look small.
Original title and link: Accumulo: A New BigTable Inspired Distributed Key/Value by NSA ( ©myNoSQL)
Robert Newson just announced a new version of Apache CouchDB, 1.1.0, featuring native SSL, HTTP range requests, and the other features and improvements listed below:
- Native SSL support.
- Added support for HTTP range requests for attachments.
- Added built-in filters for
- Added configuration option for TCP_NODELAY aka “Nagle”.
- Allow wildcards in vhosts definitions.
- More granular ETag support for views.
- More flexible URL rewriter.
- Added OS Process module to manage daemons outside of CouchDB.
- Added HTTP Proxy handler for more scalable externals.
- Added _replicator database to manage replications.
- Multiple micro-optimizations when reading data.
- Added CommonJS support to map functions.
- Added stale=update_after query option that triggers a view update after returning a
- More explicit error messages when it’s not possible to access a file due to lack of permissions.
- Added a “change password”-feature to Futon.
While all of these sound interesting, many of the items listed in this user-suggested post-1.0 CouchDB roadmap have not made it in yet.
Original title and link: Apache CouchDB 1.1.0 Released: Native SSL, HTTP Range Requests (NoSQL databases © myNoSQL)