lucandra: All content tagged as lucandra in NoSQL databases and polyglot persistence

Lucandra: A Different Solution for Storing Lucene Indexes

It is pretty clear by now that many NoSQL stores have decided to rely on a third-party tool for full-text indexing, the favorites so far being Lucene and Solr.

While some of these integration libraries are going with the default Lucene storage, others have picked a different route: persisting the Lucene index in a NoSQL store.

The first time we heard about this approach was with the CouchDB indexer prototype, which stored its indexes directly in CouchDB. Now it is Cassandra’s turn to persist Lucene indexes, and that’s exactly what the Lucandra library is proposing:

Lucandra is a Cassandra backend for Lucene. Since Cassandra’s original use within Facebook was for search, integrating Lucene with Cassandra seemed like a “no brainer”.

In case you are wondering what the advantages are, let me remind you of some of Cassandra’s characteristics: scalable, highly available, elastic, fault tolerant, eventually consistent. Using Lucandra brings you all of these, at a small price (note: please check the comment thread for other drawbacks of this approach):

There is an impact on Lucandra searches when compared to native Lucene searches. In our testing we see Lucandra’s IndexReader is ~10% slower than the default IndexReader. However, this is still quite acceptable to us given what you get in return. For writes Lucandra is comparatively slow compared to regular Lucene, since every term is effectively written under its own key. Luckily, this will be fixed in the next version of Cassandra, which will allow batched writes for keys.
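To make the write cost concrete, here is a minimal sketch of the "every term under its own key" idea. It is not Lucandra's actual schema or API: `ToyColumnStore`, the `index/field/term` key format, and all function names are hypothetical, and the store is a plain in-memory dictionary that only counts write calls. It simply shows why one document turns into many writes without batching, and a single call with it.

```python
from collections import defaultdict


class ToyColumnStore:
    """Hypothetical in-memory stand-in for a row-key -> columns store.

    It only counts write calls; it is not a Cassandra client.
    """

    def __init__(self):
        self.rows = defaultdict(dict)
        self.write_calls = 0

    def insert(self, key, column, value):
        # One call per (key, column) pair: the unbatched case.
        self.rows[key][column] = value
        self.write_calls += 1

    def batch_insert(self, mutations):
        # Many (key, column, value) mutations in one call: the batched case.
        for key, column, value in mutations:
            self.rows[key][column] = value
        self.write_calls += 1


def term_mutations(index_name, doc_id, field, text):
    """Build one mutation per distinct term: row key = term, column = doc id."""
    positions = defaultdict(list)
    for pos, term in enumerate(text.lower().split()):
        positions[term].append(pos)
    return [
        (f"{index_name}/{field}/{term}", doc_id, pos_list)
        for term, pos_list in positions.items()
    ]


def index_unbatched(store, index_name, doc_id, field, text):
    # Each term lives under its own key, so each term costs one write call.
    for key, column, value in term_mutations(index_name, doc_id, field, text):
        store.insert(key, column, value)


def index_batched(store, index_name, doc_id, field, text):
    # Same mutations, submitted together in a single call.
    store.batch_insert(term_mutations(index_name, doc_id, field, text))


def search(store, index_name, field, term):
    """Return the ids of documents containing the term (a single key lookup)."""
    return sorted(store.rows.get(f"{index_name}/{field}/{term.lower()}", {}))
```

In this toy model, indexing a document with six distinct terms through `index_unbatched` costs six write calls, while `index_batched` costs one; a term lookup is a single key read in both cases. Real numbers depend on the actual store and client, which this sketch does not model.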

I’d be very interested to hear if anyone else is using Lucandra in production.