Lucene: All content tagged as Lucene in NoSQL databases and polyglot persistence
Wednesday, 13 October 2010
Riak 0.13, Featuring Riak Search
I’m not very sure how I’ve managed to be the last to the Riak 0.13 party :(. And I can tell you it is a big party.
After writing about Riak search a couple of times already[1], I ended up missing exactly the release of Riak that includes Riak search.
Riak 0.13, ☞ announced a couple of days ago, brings quite a few new exciting features:
- Riak search
- MapReduce improvements
- Bitcask storage backend improvements
- improvements to the riak_core and riak_kv modules (the building blocks of Dynamo-like distributed systems) and better code organization, allowing easier use of these modules
While everything in this release sounds like an important step forward for Riak, what sets it apart is Riak search, a feature that is currently unique in the NoSQL database space.
Riak search
Riak search uses Lucene and builds a Solr-like API on top of it (nb: I think that reusing known interfaces and protocols is the right approach most of the time).
At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.
The Basho blog explains this feature extensively ☞ here and ☞ here.
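To make the "Solr-like API" idea concrete, here is a minimal sketch of how a client could build a Solr-style query URL. The path layout (`/solr/<index>/select`), the default host and port, and the `books` index name are assumptions modeled on Solr conventions, not details taken from the announcement, so check them against the Riak Search docs before relying on them.

```python
from urllib.parse import urlencode


def solr_select_url(index, query, host="localhost", port=8098, rows=10):
    """Build a Solr-style select URL for a search index.

    Assumption: the server exposes /solr/<index>/select and understands
    the usual Solr parameters (q, wt, rows). Host and port are placeholders.
    """
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"http://{host}:{port}/solr/{index}/select?{params}"


# Hypothetical index name and field, for illustration only.
print(solr_select_url("books", "title:erlang"))
```

The point of reusing a known interface is visible even in this tiny example: anything that already knows how to build a Solr query string needs very little change to talk to a different backend.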
Riak Search shows a lot of great decisions made by the Basho team, as it avoids reinventing the wheel or creating some new protocols/interfaces. I’ve stressed these aspects a couple of times already, when writing that NoSQL databases should follow the Unix Philosophy and also when writing about how important NoSQL protocols are. Mathias Meyer has a ☞ post detailing why these are important.
Last but not least, the Ruby Riak library, ripple, ☞ got updated too, but I’m not sure it supports all the new features in Riak 0.13.
Here is Rusty Klophaus (Basho) talking about Riak search at the Berlin Buzzwords NoSQL event:
- [1] My first post about Riak search, “Notes on scaling out with Riak and Riak search podcast”, dates back to December 14th, 2009, just a couple of days after setting up myNoSQL. (↩)
Original title and link: Riak 0.13, Featuring Riak Search (NoSQL databases © myNoSQL)
Tuesday, 25 May 2010
Riak Search and Riak Full Text Indexing
Announced a while back and ☞ not quite here yet, Riak Search is Basho’s solution to the full text indexing problem.
While waiting for the release of Riak Search, I think that you can already start doing full text indexing using one of the existing indexing solutions (Lucene[1], Solr[2], ElasticSearch[3], etc.) and Riak post-commit hooks.
Simply put, all you’ll have to do is to create a Riak post-commit hook that feeds data into your indexing system.
The downsides of this solution are that:
- you’ll still have to make sure that your indexing system is scalable, elastic, etc.
- you’ll not be able to use indexed data directly from Riak mapreduce functions, a feature that will be available through Riak Search.
Anyway, until Riak Search is out, why not have some fun?
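The hook idea can be sketched with a toy in-memory store. This is only an illustration of the pattern, not Riak code: as far as I know, real Riak hooks are configured in the bucket properties rather than registered like this, and the `Store` and `FakeIndexer` classes below are hypothetical stand-ins for Riak and for an external indexer such as Solr or ElasticSearch.

```python
class Store:
    """Toy key/value store that calls hooks after each successful write."""

    def __init__(self):
        self.data = {}
        self.post_commit_hooks = []

    def put(self, bucket, key, value):
        self.data[(bucket, key)] = value
        # Hooks run after the write; a failing hook does not undo the write.
        for hook in self.post_commit_hooks:
            try:
                hook(bucket, key, value)
            except Exception as exc:
                print(f"indexing hook failed for {bucket}/{key}: {exc}")


class FakeIndexer:
    """Hypothetical stand-in for an external indexing system."""

    def __init__(self):
        self.docs = {}

    def index(self, bucket, key, value):
        # A real hook would send the document to the indexer over the network.
        self.docs[f"{bucket}/{key}"] = value


store = Store()
indexer = FakeIndexer()
store.post_commit_hooks.append(indexer.index)
store.put("posts", "riak-0.13", "Riak 0.13 ships with Riak Search")
print(indexer.docs)
```

Note that the store and the indexer remain two separate systems here, which is exactly the first downside listed above: the indexer has to be scaled and kept healthy on its own.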
Update: Embedded below is a presentation on Riak Search providing some more details about this upcoming Basho product:
Update: Looks like the other presentation is not available anymore, so here is another on Riak search:
Tuesday, 6 April 2010
Presentation: CouchDB and Lucene
We’ve looked in the past at two possible approaches to deal with full text indexing in CouchDB. Now, I’ve found a great slidedeck from Martin Rehfeld on the subject:
Thursday, 11 February 2010
Integrating MongoDB with Solr
Sounds like quite a few NoSQL projects are externalizing the full text indexing to either Lucene or Solr (take for example CouchDB integration with Lucene or Neo4j integration with Lucene and Solr).
Now even if there are some basic ways (see [1] and [2]) to achieve this with MongoDB alone, people are still looking for more scalable solutions as shown by this thread ☞ covering Solr integration with MongoDB. The thread also mentions a couple of existing Ruby or Rails plugins for this integration.
One concern that I’ve expressed about the integration with Lucene alone is that you’ll have to deal with its scalability. Solr is one way to do that automatically. Lately I have heard of a new solution for scalable search: ☞ ElasticSearch, which sounds quite interesting (nb: I haven’t yet gone through its docs or played with it, but the creator of the project has a long history in search/indexing. You can find more details about ElasticSearch here[3]).
Friday, 29 January 2010
Neo4j Extending Integration with Lucene Family. Now Solr
In a previous post, I wrote that Neo4j, like CouchDB, uses Lucene for full text indexing. While agreeing that this is definitely better than reinventing the wheel, I also raised my concern about the complexity and scalability of this approach.
Now it looks like there is some work to integrate Neo4j with Solr, the standalone full-text search server based on Lucene [1]. This would definitely address the issue I have raised. It is not yet clear from the original message [2] how this integration will work, though (it sounds like a two-way integration, but I may be misinterpreting the details). The code is available on Neo4j ☞ SVN.
References
- [1] ☞ Solr (↩)
- [2] ☞ Solr integration for Neo4j in the making (↩)
Monday, 4 January 2010
Neo4j Node Indexing
It looks like CouchDB is not the only NoSQL store that uses Lucene for full text indexing. Neo4j, the graph database, has no built-in indexing features, but provides a pluggable mechanism for supporting it. You can read more about this integration on the ☞ Neo4j wiki.
There is also a post from Arin Sarkissian providing ☞ a quick example of how node indexing should be implemented.
While I do appreciate the fact that these projects are not suffering from the “not invented here” syndrome (and I read that Lucene can scale), I would definitely find it very useful to see some good references/recommendations on how to deal with Lucene scaling once Lucene-based full text/node indexing is used.
Update: Neo4j is getting closer to its 1.0 release and the latest RCs include some improvements on the node indexing. You can read more about it in the ☞ changelog
Monday, 28 December 2009
CouchDB Full Text Indexing
Currently there seem to be two approaches to get full text indexing in CouchDB: couchdb-lucene [1] and indexer [2].
As its name implies, couchdb-lucene is based on the well known Lucene library. While I think that such a solution provides a lot of features and flexibility, my concern is that it also brings additional complexity in terms of scalability, as you’ll not only need to take care of scaling CouchDB, but also your Lucene indexes. On the other hand, indexer uses a much simpler approach and stores the indexes directly in CouchDB, but it is still a prototype version.
But that’s just my opinion, so I’m wondering which one of these would you favor?
To learn more about these projects you can check the following resources:
And for more libraries and projects make sure you check the NoSQL Libraries.
References
Tuesday, 22 December 2009
CouchDB Full Text Indexing Prototype and Riak Search
A prototype for CouchDB full text indexing based on Joe Armstrong’s code from ☞ Programming Erlang: Software for a Concurrent World
The implementation is quite naive, using a couch database to store the inverted index, but it works surprisingly well for my use case and is very simple.
I’m not sure, though, that this prototype would have stopped ☞ the guys from Collecta from migrating to Riak and Riak Search.
The CouchDB full text indexing prototype code can be accessed on ☞ GitHub.
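For readers who have not met the term, an inverted index maps each word to the set of documents that contain it. The sketch below is a generic, minimal version in plain Python; it is not the prototype's Erlang code, and it keeps the index in memory where the prototype is said to store it in a couch database. The document ids and texts are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "doc1": "CouchDB full text indexing prototype",
    "doc2": "Riak Search and full text indexing",
    "doc3": "Neo4j node indexing with Lucene",
}
index = build_inverted_index(docs)
print(sorted(search(index, "full text")))  # ['doc1', 'doc2']
```

Storing each term's posting set as a regular document in the database is what makes such an approach simple: there is no second system to operate, at the cost of the features a dedicated search library provides.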
via: http://dionne.posterous.com/full-text-indexing-couchdb-and-performance