The combination of Hadoop and Solr makes it easy to crunch lots of data and then quickly serve up the results via a fast, flexible search & query API. Because Solr supports query-style requests, it’s suitable as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS.
I think the first time I heard Solr and Lucene mentioned as NoSQL-like storage was from Grant Ingersoll and from the Guardian.co.uk team.
From a NoSQL perspective:
- there’s no fixed schema
- there’s key-value access — lookups by the unique key field, which should be fast and scalable
- even if not standardized, there’s an advanced query language
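Both access styles above map onto Solr's `/select` endpoint: a key-value lookup is just a query on the unique key field, while richer requests use the full query syntax. A minimal sketch of building such requests (the host and core name `products` are hypothetical — adjust to your deployment):

```python
from urllib.parse import urlencode

# Hypothetical local Solr core; change host/port/core to match your setup.
SOLR_BASE = "http://localhost:8983/solr/products"

def key_lookup_url(doc_id: str) -> str:
    """Key-value style access: query the uniqueKey field (here assumed to be 'id')."""
    params = {"q": f"id:{doc_id}", "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"

def query_url(query: str, rows: int = 10) -> str:
    """Richer access via Solr's query syntax (fielded terms, ranges, boolean operators)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"
```

Fetching a result is then an ordinary HTTP GET against the returned URL, e.g. with `urllib.request.urlopen(key_lookup_url("42"))`.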
But, as the original article points out, some characteristics are missing:
- Updating the index works best as a batch operation. Individual records can be updated, but each commit (index update) generates a new Lucene segment, which will impact performance.
- Current support for replication, fail-over, and other attributes you’d want in a production-grade solution isn’t yet there in SolrCloud. If this matters to you, consider Katta instead.
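The batch-update caveat above suggests a common pattern: buffer documents and commit once at the end of a batch run, rather than committing per document and generating a new Lucene segment each time. A minimal sketch of preparing such batched update requests (the batch size is illustrative; Solr's JSON update handler accepts a list of documents per POST, with `commit=true` as a request parameter):

```python
import json

def batches(docs, size):
    """Split documents into fixed-size batches so each HTTP POST indexes many docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def update_payloads(docs, batch_size=100):
    """Build one JSON payload per batch, requesting a commit only on the
    final POST — one new segment per run instead of one per document."""
    payloads = []
    doc_batches = list(batches(docs, batch_size))
    for n, batch in enumerate(doc_batches):
        last = (n == len(doc_batches) - 1)
        payloads.append({
            # commit=true would be appended to the /update request URL.
            "url_params": {"commit": "true"} if last else {},
            "body": json.dumps(batch),
        })
    return payloads
```

Each payload would then be POSTed to the core's `/update` handler; deferring the commit to the last request is what keeps per-document overhead down.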
Original title and link: Using Solr and Hadoop as a NoSQL database (NoSQL databases © myNoSQL)