full text indexing: All content tagged as full text indexing in NoSQL databases and polyglot persistence
Tuesday, 29 January 2013
MongoDB and Full Text Search: My First Week With MongoDB 2.4 Development Release
Chris Winslett of MongoHQ is experimenting with MongoDB text search and declares himself satisfied:
Full-text searching with MongoDB 2.4 is more complex and powerful than originally illustrated in our first blog post outlining this feature.
There’s no surprise that he likes it, but I wonder whether 10gen has an internal feedback channel with the other companies offering MongoDB services, through which it collects feedback about upcoming features and their implementations.
Original title and link: MongoDB and Full Text Search: My First Week With MongoDB 2.4 Development Release (©myNoSQL)
via: http://blog.mongohq.com/blog/2013/01/22/first-week-with-mongodb-2-dot-4-development-release/
Monday, 14 January 2013
MongoDB Text Search Tutorial
Today is the day of the experimental MongoDB text search feature. Tobias Trelle continues his series on this feature with examples of the query syntax (negation and phrase search; according to the previous post, even more advanced queries should be supported), filtering and projections, and indexing of multiple text fields, plus details about the stemming solution used (Snowball).
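As a rough illustration of that query syntax, here is a minimal Python sketch that classifies the parts of a search string (plain terms, quoted phrases, terms negated with a leading minus) and assembles the document for the experimental `text` command. The helper names `parse_search` and `text_command` are hypothetical, and the command fields (`search`, `filter`, `project`, `limit`) are written as I understand the 2.4 development release; check them against the release you run.

```python
import re


def parse_search(search):
    """Classify the parts of a search string: quoted phrases, -negated terms, plain terms."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', search)]
    # Drop the quoted phrases, then look at the remaining whitespace-separated tokens.
    remainder = re.sub(r'"[^"]*"', " ", search)
    negated, terms = [], []
    for token in remainder.lower().split():
        if token.startswith("-") and len(token) > 1:
            negated.append(token[1:])
        else:
            terms.append(token)
    return {"terms": terms, "phrases": phrases, "negated": negated}


def text_command(collection, search, filter=None, project=None, limit=None):
    """Build the command document for the experimental `text` command (assumed field names)."""
    command = {"text": collection, "search": search}  # the command name must be the first key
    if filter is not None:
        command["filter"] = filter    # ordinary query document applied to the matches
    if project is not None:
        command["project"] = project  # fields to return
    if limit is not None:
        command["limit"] = limit
    return command
```

The parser is only there to make the syntax visible; the server does its own parsing, stemming, and stop-word removal on the `search` string.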
In case you missed the previous posts, here is a quick link list:
- MongoDB Full Text Search Explained
- Full text search in MongoDB: details about supported languages and queries
- Short demo of MongoDB text search and hashed shard keys
- Indexing a Markdown blog using MongoDB Full Text Indexing
Original title and link: MongoDB Text Search Tutorial (©myNoSQL)
via: http://blog.codecentric.de/en/2013/01/mongodb-text-search-tutorial/
Indexing a Markdown Blog With MongoDB Full Text Search
A. Jesse Jiryu Davis uses the recently announced MongoDB full text search to index his Markdown-based blog:
The blog had been using a really terrible method for search, involving regular expressions, a full collection scan for every search, and no ranking of results by relevance. I wanted to replace all that cruft with MongoDB’s full-text search ASAP. Here’s what I did.
This looks nice, but I’d like to see how well it works. And there’s one thing that I don’t understand: why parse the HTML when the source text is already in Markdown?
Original title and link: Indexing a Markdown Blog With MongoDB Full Text Search (©myNoSQL)
Short Demo of MongoDB Text Search and Hashed Shard Keys
Staying on the subject of MongoDB full text search (see here and here), a 10-minute demo of the new feature:
Original title and link: Short Demo of MongoDB Text Search and Hashed Shard Keys (©myNoSQL)
Full Text Search in MongoDB: Details About Languages and Queries
Another post about the upcoming MongoDB full text search; this one adds some more details about supported languages and queries:
- Support for Latin based languages initially, with plans for other character sets later. Initially this will be: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
- Support for advanced queries, similar to the Google search syntax e.g. negation and phrase matching.
It’s worth emphasizing that the post refers to character sets when speaking about supported languages, but says nothing about stemming, which differs for many of them.
Original title and link: Full Text Search in MongoDB: Details About Languages and Queries (©myNoSQL)
via: http://blog.serverdensity.com/full-text-search-in-mongodb/
MongoDB Full Text Search Explained
Tobias Trelle explains the features planned for the full text support coming in MongoDB 2.4: stop words, (basic) stemming, full text indexes, and API:
The upcoming release 2.4 of MongoDB will include a first, experimental support for full text search (FTS). This feature was requested early in the history of MongoDB as you can see from this JIRA ticket: SERVER-380. FTS is first available with the developer release 2.3.2.
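To make the terms above concrete (stop words, stemming, a full text index), here is a toy Python sketch of an inverted index. It is not how MongoDB implements the feature: the stop-word list is a tiny illustrative sample, and `naive_stem` is a crude stand-in for a real stemmer, not an implementation of one. All names are hypothetical.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}  # tiny illustrative list


def naive_stem(word):
    """Strip a few English suffixes. A crude stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, then stem what is left."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [naive_stem(w) for w in words if w and w not in STOP_WORDS]


def build_index(docs):
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching any query term (terms are OR'ed)."""
    ids = set()
    for term in tokenize(query):
        ids |= index.get(term, set())
    return ids
```

Because queries go through the same `tokenize` step as documents, a search for "searches" finds documents containing "search" or "searching".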
A couple of reasons for MongoDB including full text search:
- highly requested feature (239 votes, 193 watchers, 42 participants)
- (high level) feature parity with MySQL
- NIH
The majority of databases support full text indexing, but almost everyone needing good full text search ends up using Lucene or Solr or Elastic Search or Sphinx.
Original title and link: MongoDB Full Text Search Explained (©myNoSQL)
via: http://blog.codecentric.de/en/2013/01/text-search-mongodb-stemming/
Monday, 2 April 2012
Architecture of HBase-based Lucene Implementation
Boris Lublinsky and Mike Segel:
The implementation tries to balance two conflicting requirements - performance: in memory cache can drastically improve performance by minimizing the amount of HBase reads for search and documents retrieval; and scalability: ability to run as many Lucene instances as required to support growing search clients population. The latter requires minimizing of the cache life time to synchronize content with the HBase instance (a single copy of truth). A compromise is achieved through implementing configurable cache time to live parameter, limiting cache presence in each Lucene instance.
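The trade-off described in the quote, a per-instance cache whose entries expire after a configurable time to live, can be sketched in a few lines of Python. This is a generic illustration and not the authors' implementation; `TTLCache` and `loader` (standing in for a read from the backing store) are hypothetical names.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after a configurable time to live."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._loader = loader    # hypothetical: fetches a value from the backing store
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no read from the backing store
        value = self._loader(key)  # missing or expired: reload and restart the TTL
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short TTL keeps each instance close to the single copy of truth at the cost of more backing-store reads; a long TTL does the opposite.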

Besides existing Solr scaling approaches and the work to make Solr scalable, there’s also the recently released DataStax Enterprise which integrates Solr on top of Cassandra.
Original title and link: Architecture of HBase-based Lucene Implementation (©myNoSQL)
Friday, 16 March 2012
Real-Time Search With MongoDB and Elasticsearch
Interesting usage of the MongoDB oplog to make up for the lack of storage notifications:
ElasticSearch has a built in feature of Rivers, which are essentially plugins for specific services to constantly stream in new updates for indexing. Unfortunately, there’s no MongoDB River (probably due to the lack of built-in database triggers), so I did some research and realized that I could use the MongoDB oplog to continually capture updates to our main databases.
Kristina Chodorow has two posts—here and here—detailing what’s stored in the oplog.
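As a sketch of the idea, the Python function below turns one oplog entry into an indexing action. It relies on the usual oplog fields: `op` (`"i"` insert, `"u"` update, `"d"` delete), `ns` (namespace), `o` (the document or update spec), and `o2` (the `_id` selector for updates). The action names and the function itself are hypothetical, and actually tailing the oplog and calling the search engine are left out.

```python
def oplog_to_action(entry, watched_ns):
    """Translate one oplog entry (a dict) into an indexing action, or None to skip it."""
    if entry.get("ns") != watched_ns:
        return None  # a different database or collection
    op = entry.get("op")
    if op == "i":
        doc = entry["o"]  # inserts carry the full document
        return {"action": "index", "id": doc["_id"], "doc": doc}
    if op == "u":
        # For updates, o may hold only operators such as $set, so the safest
        # reaction is to re-fetch the document identified by o2 and index it again.
        return {"action": "reindex", "id": entry["o2"]["_id"]}
    if op == "d":
        return {"action": "delete", "id": entry["o"]["_id"]}
    return None  # no-ops, commands, and anything else are ignored
```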
Original title and link: Real-Time Search With MongoDB and Elasticsearch (©myNoSQL)
via: http://amberonrails.com/real-time-search-with-mongodb-and-elasticsearch/
Tuesday, 6 March 2012
Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth
Grant Ingersoll:
Instead of doing all the extra work of making sure instances are up, etc., however, I am going to focus on using some of the new features of Solr4 (i.e. SolrCloud whose development effort has been primarily led by several of my colleagues: Yonik Seeley, Mark Miller and Sami Siren) which remove the need to figure out where to send documents when indexing, along with a convenient Hadoop-based document processing toolkit, created by Julien Nioche, called Behemoth that takes care of the need to write any Map/Reduce code and also handles things like extracting content from PDFs and Word files in a Hadoop friendly manner (think Apache Tika run in Map/Reduce) while also allowing you to output the results to things like Solr or Mahout, GATE and others as well as to annotate the intermediary results.
I have to agree with Karussell:
Scaling Solr means using Solr AND X AND Y AND… Scaling ElasticSearch means using ElasticSearch
Original title and link: Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth (©myNoSQL)
Monday, 13 February 2012
Arya a MongoDB based Search Engine
The system is currently hard coded with one tokenizer and one analyzer. This can easily be changed. The searcher returns the document and the score it received but not where the term is, or any information on how to ‘highlight’ the result. This is doable by adding in the required information into the match embedded document and processing it out in the Map Reduce phase. There is no query caching in this system. Paging through the results will result in duplicate work. It would be best to actually cache the output of the map reduce into Redis using a sorted set. The Redis key would have to be derived from the query.
Remember this should be considered just an experiment for learning about MongoDB’s MapReduce capabilities.
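The caching idea from the quote (derive a key from the query, keep the scored results in a sorted set, page through them without redoing the work) can be sketched as follows. To stay self-contained the sketch uses a plain dict of scores in place of a Redis sorted set; with Redis, the scores would be written with ZADD and pages read with ZREVRANGE. The function names, the key prefix, and the normalization (lowercasing and sorting terms, which assumes term order does not affect results) are assumptions of this sketch, not part of Arya.

```python
import hashlib


def cache_key(query, prefix="search:"):
    """Derive a stable cache key from a query string."""
    # Assumption: the query is a bag of words, so case and term order are irrelevant.
    normalized = " ".join(sorted(query.lower().split()))
    return prefix + hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def page(scored_docs, page_number, page_size):
    """Return one page of document ids, highest score first.

    scored_docs maps document id -> score, standing in for a sorted set.
    Ties are broken by document id so paging is deterministic.
    """
    ranked = sorted(scored_docs.items(), key=lambda item: (-item[1], item[0]))
    start = page_number * page_size
    return [doc_id for doc_id, _ in ranked[start:start + page_size]]
```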
Original title and link: Arya a MongoDB based Search Engine (©myNoSQL)
via: http://supermanscott.com/arya-a-mongodb-based-search-engine
Saturday, 11 February 2012
Big Data Search: Perfect Search
Tim Stay (CEO) talks about Perfect Search, a solution for searching Big Data that:
- offers a unique architectural approach that significantly reduces the total computations required to query
- creates terms and pattern indexes (basically combinations of terms at indexing time)
- uses jump tables and bloom filters
- heavily optimizes disk I/O
- doesn’t require indexes in memory
- “can often do same query with less than 1% computations”
- “when compared to Oracle/MS SQL, Perfect Search can be from 10x to over 1000x faster”
  - according to the chart, the significant speed improvements are for cached results, while for first time queries I see numbers from 2 to 59
  - if Perfect Search is a search engine, why compare it with relational databases?
- “Google takes over 100 servers to search 1 billion documents. Perfect Search can do it with 1 server”
  - Google is using 100 servers for reliability and guaranteeing the speed of results
- “Lucene: 0.1 billion documents per server; CPU maxing at 100%. Perfect Search 1.6 billion documents per server; CPU idling at 15%”
With this preamble, you can watch the video after the break:
Wednesday, 8 February 2012
Fulltext search your CouchDB in Ruby
When choosing a library for full text indexing of CouchDB data in a Ruby application, Taylor Luk looked at Sphinx, Lucene, Ferret, and Xapian, and decided to go with Xapian with Xapit. While Xapian with Xapit offers a clean interface and customization of the indexing process, there seem to be quite a few important limitations:
- Xapit is still under active development
- You need to trigger index updates manually
- It doesn’t support incremental index updates at the moment
I know some are afraid of managing a Java stack, but in the land of indexing, Lucene, Solr, ElasticSearch, and IndexTank are the most powerful tools.
Original title and link: Fulltext search your CouchDB in Ruby (©myNoSQL)
via: http://taylorluk.com/post/17255656638/fulltext-search-your-couchdb-in-ruby