It is pretty clear by now that many NoSQL stores have decided to rely on a third-party tool for full-text indexing, the favorites so far being Lucene and Solr.
While some of these integration libraries are going with the default Lucene storage, others have picked a different route: persisting the Lucene index in a NoSQL store.
The first time we heard about this approach was with the CouchDB indexer prototype, which stored its indexes directly in CouchDB. Now it is Cassandra’s turn to persist Lucene indexes, and that’s exactly what the Lucandra library is proposing:
Lucandra is a Cassandra backend for Lucene. Since Cassandra’s original use within Facebook was for search, integrating Lucene with Cassandra seemed like a “no brainer”.
In case you are wondering what the advantages are, let me remind you of some of Cassandra’s characteristics: scalable, highly available, elastic, fault tolerant, eventually consistent. Using Lucandra will bring you all of these, at a small price (note: please check the comment thread for other drawbacks of this approach):
There is an impact on Lucandra searches when compared to native Lucene searches. In our testing we see Lucandra’s IndexReader is ~10% slower than the default IndexReader. However, this is still quite acceptable to us given what you get in return.
For writes Lucandra is comparatively slow to regular Lucene, since every term is effectively written under its own key. Luckily, this will be fixed in the next version of Cassandra, which will allow batched writes for keys.
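To make the write cost a bit more concrete, here is a minimal sketch of the idea, not Lucandra's actual schema or the Cassandra client API. It assumes a hypothetical in-memory `CountingStore` that stands in for a key-value store, a made-up `index/term` key layout, and naive whitespace tokenization. It only compares how many write round trips are needed when every term goes out under its own key versus when all term writes for a document are sent as one batch.

```python
from collections import defaultdict


class CountingStore:
    """Hypothetical in-memory stand-in for a key-value store.

    It counts write round trips so the two indexing strategies can be compared.
    """

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {column: value}
        self.round_trips = 0

    def insert(self, key, column, value):
        # One round trip per single-column write.
        self.round_trips += 1
        self.rows[key][column] = value

    def batch_insert(self, mutations):
        # One round trip for the whole list of (key, column, value) writes.
        self.round_trips += 1
        for key, column, value in mutations:
            self.rows[key][column] = value


def term_positions(text):
    """Map each term to the positions where it occurs (whitespace tokenizer)."""
    positions = defaultdict(list)
    for pos, term in enumerate(text.lower().split()):
        positions[term].append(pos)
    return positions


def index_per_term(store, index_name, doc_id, text):
    """Write each term under its own key, one round trip per distinct term."""
    for term, pos in term_positions(text).items():
        store.insert(f"{index_name}/{term}", doc_id, pos)


def index_batched(store, index_name, doc_id, text):
    """Collect the same writes and send them in a single batch."""
    mutations = [
        (f"{index_name}/{term}", doc_id, pos)
        for term, pos in term_positions(text).items()
    ]
    store.batch_insert(mutations)


def search(store, index_name, term):
    """Return the ids of documents that contain the term."""
    return sorted(store.rows.get(f"{index_name}/{term.lower()}", {}))


slow, fast = CountingStore(), CountingStore()
doc = "the quick brown fox jumps over the lazy dog"
index_per_term(slow, "idx", "doc1", doc)
index_batched(fast, "idx", "doc1", doc)
print(slow.round_trips, fast.round_trips)  # 8 1
print(search(fast, "idx", "fox"))          # ['doc1']
```

Both strategies leave the store with the same rows; only the number of round trips differs, which is presumably where batched writes would help.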
I’d be very interested to hear whether anyone besides Sparse.ly is using Lucandra in production.