Boris Lublinsky and Mike Segel:
The implementation tries to balance two conflicting requirements. Performance: an in-memory cache can drastically improve performance by minimizing the number of HBase reads for search and document retrieval. Scalability: the ability to run as many Lucene instances as required to support a growing population of search clients. The latter requires minimizing the cache lifetime so that content stays synchronized with the HBase instance (the single copy of truth). A compromise is achieved by implementing a configurable cache time-to-live parameter, which limits how long cached data lives in each Lucene instance.
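To make the trade-off concrete, here is a minimal sketch of a read-through cache with a configurable time to live. It is not the authors' code: the `TTLCache` class, the `loader` callback (standing in for an HBase read), and the `ttl_seconds` parameter are all hypothetical names chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds.

    Hypothetical illustration: `loader` stands in for a read against the
    backing store (HBase in the quoted article).
    """

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (cached value, expiry time on the clock's scale)
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            # Still fresh: serve from memory, no backing-store read.
            return entry[0]
        # Missing or expired: reload from the backing store and restart the TTL.
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

A shorter `ttl_seconds` means each instance picks up changes from the backing store sooner, at the cost of more reads; a longer one does the opposite. In this sketch an instance can serve data that is stale by at most `ttl_seconds`.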
Besides existing Solr scaling approaches and the work to make Solr scalable, there’s also the recently released DataStax Enterprise, which integrates Solr on top of Cassandra.
Original title and link: Architecture of HBase-based Lucene Implementation ( ©myNoSQL)