Great presentation on searching big data in real time by integrating Solr and Hadoop, from ☞ OpenLogic’s Rod Cope:
And they are definitely not the only ones using Hadoop and HBase for search. I guess this would also be a counter-example to Beyond Hadoop - Next-Generation Big Data Architectures.
Original title and link: Real-Time Searching of Big Data with Solr and Hadoop (NoSQL databases © myNoSQL)
I’m not sure how I managed to be the last to arrive at the Riak 0.13 party :(. And I can tell you it is a big party.
Riak 0.13, ☞ announced a couple of days ago, brings quite a few new exciting features:
- Riak search
- MapReduce improvements
- Bitcask storage backend improvements
- improvements to the riak_core and riak_kv modules — the building blocks of Dynamo-like distributed systems — and better code organization allowing easier use of these modules
While everything in this release sounds like an important step forward for Riak, what sets it apart is Riak Search, a feature that is currently unique in the NoSQL database space.
Riak Search uses Lucene and builds a Solr-like API on top of it (nb: I think that reusing known interfaces and protocols is, most of the time, the right approach).
At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.
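To make the flow above concrete, here is a toy in-memory model in Python — not the actual Riak implementation (which is Erlang), and all names (`ToyRiak`, `enable_search`, the naive whitespace "analyzer") are illustrative. It only mimics the shape of the workflow: a pre-commit hook indexes objects as they are stored, and a search query returns bucket/key pairs that could feed a MapReduce phase:

```python
# Toy model of the Riak Search flow: a search-enabled bucket indexes
# every stored object, and queries return bucket/key pairs.

class ToyRiak:
    def __init__(self):
        self.data = {}            # (bucket, key) -> value
        self.index = {}           # term -> set of (bucket, key)
        self.search_enabled = set()

    def enable_search(self, bucket):
        # Stands in for installing the Search pre-commit hook on a bucket.
        self.search_enabled.add(bucket)

    def put(self, bucket, key, value):
        if bucket in self.search_enabled:
            for term in value.split():       # naive whitespace "analyzer"
                self.index.setdefault(term, set()).add((bucket, key))
        self.data[(bucket, key)] = value

    def search(self, term):
        # Returns bucket/key pairs, as the Riak Client API does.
        return sorted(self.index.get(term, set()))

riak = ToyRiak()
riak.enable_search("posts")
riak.put("posts", "p1", "riak search is built on lucene")
riak.put("posts", "p2", "solr builds on lucene too")

hits = riak.search("lucene")     # [("posts", "p1"), ("posts", "p2")]
# The hits could then serve as input to a MapReduce phase, e.g.:
word_counts = [len(riak.data[bk].split()) for bk in hits]
```

The point of the last line is the second integration mentioned above: because search results are plain bucket/key pairs, they can be piped straight into a MapReduce operation.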
Riak Search shows a lot of great decisions made by the Basho team, as it avoids reinventing the wheel or creating some new protocols/interfaces. I’ve stressed these aspects a couple of times already, when writing that NoSQL databases should follow the Unix Philosophy and also when writing about how important NoSQL protocols are. Mathias Meyer has a ☞ post detailing why these are important.
Last but not least, the Ruby Riak library, ripple, ☞ got updated too, though I’m not sure it supports all the new features in Riak 0.13.
Here is Rusty Klophaus (Basho) talking about Riak Search at the Berlin Buzzwords NoSQL event:
- The first post about Riak Search, the Notes on scaling out with Riak and Riak Search podcast, dates back to December 14th, 2009, just a couple of days after setting up myNoSQL. (↩)
Given a set of requirements (prepared to scale, data models that can evolve, searchable data, common access to entities), a data definition language (think Protocol Buffers), and a NoSQL database, how do you build a searchable, evolvable entity store?
Sam Pullara explains how he solved these while ☞ creating HAvroBase:
The first choice you have to make against these requirements is which data definition language you are going to use.
Whereas the data definition choice is basically commodity at this point and your choice can be somewhat arbitrary, the choice of storage technology will likely be something that has more trade-offs to consider.
When it comes to text search you really don’t get better than Lucene in open source, and the features that Solr builds on top of Lucene make it even better. I don’t think there is a reasonable argument for using something besides Solr at this point, especially with the support for sharding and replication that comes with SolrCloud.
The only remark is that the solution might also work with other NoSQL databases, especially key-value stores (basically, once entities are encoded with Avro, the data becomes opaque to HBase, so its wide-column data model is not a strong requirement).
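The HAvroBase pattern itself is easy to sketch. Below is a toy Python model of it, with JSON standing in for Avro serialization and a dict standing in for the key-value store and the Solr-like index; the function names (`put_entity`, `search`) are made up for illustration and are not HAvroBase’s API:

```python
import json

# Toy sketch of the entity-store pattern: entities are serialized
# (JSON here, as a stand-in for Avro) and stored as opaque bytes in a
# key-value store, while selected fields are fed to a separate
# search index that maps terms back to entity keys.

kv_store = {}        # key -> opaque serialized bytes
search_index = {}    # term -> set of entity keys

def put_entity(key, entity, indexed_fields):
    # The store only ever sees opaque bytes...
    kv_store[key] = json.dumps(entity).encode()
    # ...while the chosen fields go to the search index.
    for field in indexed_fields:
        for term in str(entity[field]).lower().split():
            search_index.setdefault(term, set()).add(key)

def search(term):
    # Resolve index hits back to full entities via the key-value store.
    keys = sorted(search_index.get(term.lower(), set()))
    return [json.loads(kv_store[k]) for k in keys]

put_entity("u1", {"name": "Sam Pullara", "bio": "created HAvroBase"},
           indexed_fields=["bio"])
hits = search("havrobase")
```

Note how the sketch matches the remark above: since the store holds only opaque serialized bytes, any key-value store could replace HBase here.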
Source code is available on ☞ GitHub.
While waiting for the release of Riak Search, I think that you can already start doing full text indexing using one of the existing indexing solutions (Lucene, Solr, ElasticSearch, etc.) and Riak post-commit hooks.
Simply put, all you have to do is create a Riak post-commit hook that feeds data into your indexing system.
The downside of this solution is that:
- you’ll still have to make sure that your indexing system is scalable, elastic, etc.
- you won’t be able to use indexed data directly from Riak MapReduce functions, a feature that will be available through Riak Search.
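Downsides aside, the mechanics are simple. Here is a toy Python sketch of the approach — in real Riak the post-commit hook would be an Erlang function, and it would forward the object to Solr or ElasticSearch over HTTP; here the "indexer" is an in-memory stand-in and all class and method names are made up:

```python
# Toy sketch of the post-commit-hook approach: after each write is
# committed, registered hooks receive the object so they can forward
# it to an external indexing system.

class ExternalIndexer:
    """Stand-in for Solr/ElasticSearch: just remembers what it indexed."""
    def __init__(self):
        self.docs = {}

    def index(self, bucket, key, value):
        self.docs[(bucket, key)] = value

class StoreWithHooks:
    """Stand-in for Riak: a key-value store with post-commit hooks."""
    def __init__(self):
        self.data = {}
        self.post_commit_hooks = []

    def put(self, bucket, key, value):
        self.data[(bucket, key)] = value        # commit the write first...
        for hook in self.post_commit_hooks:     # ...then notify the hooks
            hook(bucket, key, value)

indexer = ExternalIndexer()
store = StoreWithHooks()
store.post_commit_hooks.append(indexer.index)

store.put("posts", "p1", "full text indexing with post-commit hooks")
```

Because the hook runs after the commit, a slow or unavailable indexer never blocks the write path — which is also why the two downsides above exist: the indexer's scalability and the index itself live outside Riak.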
Anyways, until Riak Search is out, why not have some fun!
Update: Embedded below a presentation on Riak Search providing some more details about this upcoming Basho product:
Update: Looks like the other presentation is not available anymore, so here is another one on Riak Search: