


full text indexing: All content tagged as full text indexing in NoSQL databases and polyglot persistence

Full-text Search in your Database: Algolia vs Elasticsearch

Until now, Elasticsearch has been the fall-back solution for developers. Although a beautiful product for big data analysis or document search, it hasn’t been designed for object searches. Algolia has. The purpose of this blog post is to answer a question we’re frequently asked: If Algolia brings a specific answer when Elasticsearch offers a broad set of tools, how do they compare for database search?

This is the first time I’ve heard of Algolia. Unfortunately, the docs page doesn’t reveal anything about Algolia’s secret sauce. So, without knowing anything about it, I’d speculate that the performance difference comes from a highly optimized approach to storing and retrieving short n-grams.
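To make the speculation concrete, here is a minimal sketch of what an n-gram index looks like. It is not Algolia's implementation (which isn't documented); the class name, the trigram size, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the set of character n-grams of a word (the whole word if it is not longer than n)."""
    word = word.lower()
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


class NgramIndex:
    """Toy in-memory index mapping each n-gram to the ids of the records containing it."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)
        self.records = {}

    def add(self, record_id, text):
        self.records[record_id] = text
        for word in text.split():
            for gram in ngrams(word, self.n):
                self.postings[gram].add(record_id)

    def search(self, fragment):
        """Find records containing a single-word fragment of at least n characters.

        Candidates must contain every n-gram of the fragment; a final
        substring check removes false positives.
        """
        grams = ngrams(fragment, self.n)
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        needle = fragment.lower()
        return sorted(rid for rid in candidates if needle in self.records[rid].lower())


# Hypothetical sample records
index = NgramIndex()
index.add(1, "Elasticsearch cluster")
index.add(2, "Algolia search")
index.add(3, "Apache Solr")
```

With these records, `index.search("sear")` matches records 1 and 2 without scanning record 3. The lookup cost depends on the posting lists of a few short n-grams rather than on the size of the whole data set, which is the kind of property I'd expect such an engine to exploit.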

Original title and link: Full-text Search in your Database: Algolia vs Elasticsearch (NoSQL database©myNoSQL)


NoSQL and Full Text Indexing: Two Trends

On one side:

  1. DataStax with Solr
  2. MapR with LucidWorks Search (nb: Solr)

and on the other side:

  1. Riak Searching: Solr-like but custom proprietary implementation
  2. MongoDB text search: custom proprietary implementation

I’m not going to argue about the pros and cons of each of these approaches, but I’m sure you already know which of these approaches I’m in favor of.


MongoDB and Full Text Search: My First Week With MongoDB 2.4 Development Release

Chris Winslett of MongoHQ is experimenting with MongoDB text search and declares himself satisfied:

Full-text searching with MongoDB 2.4 is more complex and powerful than originally illustrated in our first blog post outlining this feature.

It’s no surprise that he likes it, but I’m wondering if 10gen has an internal channel with the other companies offering MongoDB services through which it collects feedback about upcoming features and their implementations.



MongoDB Text Search Tutorial

Today is the day of the experimental MongoDB text search feature. Tobias Trelle continues his series on it with examples of the query syntax (negation and phrase search; according to the previous post, even more advanced queries should be supported), filtering and projections, and indexing multiple text fields, plus details about the stemming solution used (Snowball).
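For a rough idea of what negation and phrase search look like, here is a small sketch that assembles a search string and the argument document for the experimental `text` command. The helper names are mine, and the field names (`search`, `filter`, `project`, `limit`) follow my reading of the 2.4 documentation, so check them against the release you run; as far as I know, later releases replaced the command with the `$text` query operator.

```python
def build_search_string(terms=(), phrases=(), excluded=()):
    """Combine plain terms, quoted phrases, and negated terms into one search string.

    Phrases are wrapped in double quotes and excluded terms get a leading
    hyphen, the same conventions as web search engines.
    """
    parts = list(terms)
    parts += ['"{}"'.format(phrase) for phrase in phrases]
    parts += ["-{}".format(term) for term in excluded]
    return " ".join(parts)


def text_command(collection, search, query_filter=None, projection=None, limit=None):
    """Build the document for the experimental 2.4 `text` command (field names assumed from the docs)."""
    command = {"text": collection, "search": search}
    if query_filter is not None:
        command["filter"] = query_filter
    if projection is not None:
        command["project"] = projection
    if limit is not None:
        command["limit"] = limit
    return command


# Hypothetical collection and field names
search = build_search_string(["coffee"], phrases=["dark roast"], excluded=["decaf"])
command = text_command("posts", search, query_filter={"author": "jesse"}, limit=10)
```

Nothing here talks to a server; the resulting `command` document would be handed to whatever command-running call your driver provides.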

In case you missed the previous posts, here is a quick link list:

  1. MongoDB Full Text Search Explained
  2. Full text search in MongoDB: details about supported languages and queries
  3. Short demo of MongoDB text search and hashed shard keys
  4. Indexing a Markdown blog using MongoDB Full Text Indexing



Indexing a Markdown Blog With MongoDB Full Text Search

A. Jesse Jiryu Davis uses the recently announced MongoDB full text search to index his Markdown-based blog:

The blog had been using a really terrible method for search, involving regular expressions, a full collection scan for every search, and no ranking of results by relevance. I wanted to replace all that cruft with MongoDB’s full-text search ASAP. Here’s what I did.

This looks nice, but I’d like to see how well it works. And there’s one thing that I don’t understand: why parse the HTML when the source text is already in Markdown?



Short Demo of MongoDB Text Search and Hashed Shard Keys

Staying on the subject of MongoDB full text search (see here and here), there is a 10-minute demo of the new feature.


Full Text Search in MongoDB: Details About Languages and Queries

Another post about the upcoming MongoDB full text search; this one adds more details about supported languages and queries:

  • Support for Latin based languages initially, with plans for other character sets later. Initially this will be: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
  • Support for advanced queries, similar to the Google search syntax e.g. negation and phrase matching.

It’s worth emphasizing that, when speaking about supported languages, the post refers to character sets but says nothing about stemming, which differs for many of those languages.



MongoDB Full Text Search Explained

Tobias Trelle explains the features planned for the full text support coming in MongoDB 2.4: stop words, (basic) stemming, full text indexes, and API:

The upcoming release 2.4 of MongoDB will include a first, experimental support for full text search (FTS). This feature was requested early in the history of MongoDB as you can see from this JIRA ticket: SERVER-380. FTS is first available with the developer release 2.3.2.

A couple of reasons for MongoDB including full text search:

  1. highly requested feature (239 votes, 193 watchers, 42 participants)
  2. (high level) feature parity with MySQL
  3. NIH

The majority of databases support full text indexing, but almost everyone needing good full text search ends up using Lucene, Solr, Elasticsearch, or Sphinx.
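For readers new to the terms above, here is a minimal sketch of how stop words, stemming, and a full text index fit together. It is not MongoDB's implementation: the stop word list is a tiny hypothetical sample, and the suffix stripping is a crude stand-in for a real Snowball stemmer.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list (real lists are language specific and much longer)
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def naive_stem(token):
    """Strip a few common English suffixes; a crude stand-in for a Snowball stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: stemmed term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical sample documents
docs = {1: "Indexing the blogs", 2: "Searching a blog", 3: "The database"}
index = build_index(docs)
```

Because queries go through the same `analyze` step as documents, a search for "blogs" finds documents mentioning "blog", and a query made only of stop words matches nothing. The gap between this toy and a dedicated engine (ranking, language-aware analysis, highlighting) is a large part of why people end up on Lucene-based tools.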



Architecture of HBase-based Lucene Implementation

Boris Lublinsky and Mike Segel:

The implementation tries to balance two conflicting requirements - performance: in memory cache can drastically improve performance by minimizing the amount of HBase reads for search and documents retrieval; and scalability: ability to run as many Lucene instances as required to support growing search clients population. The latter requires minimizing of the cache life time to synchronize content with the HBase instance (a single copy of truth). A compromise is achieved through implementing configurable cache time to live parameter, limiting cache presence in each Lucene instance.


Besides existing Solr scaling approaches and the work to make Solr scalable, there’s also the recently released DataStax Enterprise which integrates Solr on top of Cassandra.
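The compromise described in the quote, a per-instance cache with a configurable time to live, can be sketched as follows. This is only an illustration of the idea, not the authors' code: the class name, the `loader` callback standing in for an HBase read, and the injectable clock are all assumptions.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after a configurable time to live."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # called on a miss; stands in for a read from the backing store
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the expiry logic can be exercised without sleeping
        self.entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]         # still fresh: no read from the backing store
        value = self.loader(key)  # missing or expired: go back to the single copy of truth
        self.entries[key] = (value, now + self.ttl)
        return value


# Demo with a dict standing in for HBase and a fake clock (both hypothetical)
store = {"doc1": "v1"}
reads = []
fake_now = [0.0]


def load(key):
    reads.append(key)
    return store[key]


cache = TTLCache(load, ttl_seconds=30, clock=lambda: fake_now[0])
first = cache.get("doc1")    # miss: loads "v1"
store["doc1"] = "v2"         # another instance updates the backing store
fake_now[0] = 10.0
stale = cache.get("doc1")    # within the TTL: still "v1"
fake_now[0] = 31.0
fresh = cache.get("doc1")    # expired: reloads "v2"
```

The TTL is the knob: a long value saves reads but lets each instance serve stale content for longer, while a short one keeps instances in sync at the cost of more reads.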



Real-Time Search With MongoDB and Elasticsearch

An interesting use of the MongoDB oplog to work around the lack of storage notifications:

ElasticSearch has a built in feature of Rivers, which are essentially plugins for specific services to constantly stream in new updates for indexing. Unfortunately, there’s no MongoDB River (probably due to the lack of built-in database triggers), so I did some research and realized that I could use the MongoDB oplog to continually capture updates to our main databases.

Kristina Chodorow has two posts—here and here—detailing what’s stored in the oplog.
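As a rough sketch of the idea, the function below turns one oplog entry into an action for a search indexer. It is not the code from the quoted post. The entry layout (`op`, `ns`, `o`, `o2`) reflects my understanding of the oplog format, while the action names and the "refetch on update" policy are assumptions of mine.

```python
def oplog_to_index_action(entry, watched_ns):
    """Translate one oplog entry into an (action, doc_id, body) tuple, or None to skip it.

    Assumed entry layout: `op` is the operation type, `ns` is "db.collection",
    `o` is the operation document, and `o2` identifies the target of an update.
    """
    if entry.get("ns") != watched_ns:
        return None
    op = entry.get("op")
    if op == "i":
        # Insert: the full document is available in `o`
        return ("index", entry["o"]["_id"], entry["o"])
    if op == "u":
        # Update: `o` may hold only modifiers such as $set, so ask the caller
        # to re-read the whole document before reindexing it
        return ("refetch", entry["o2"]["_id"], None)
    if op == "d":
        # Delete: `o` carries the _id of the removed document
        return ("delete", entry["o"]["_id"], None)
    return None  # no-ops, commands, and anything else are ignored


# Hypothetical oplog entries
entries = [
    {"op": "i", "ns": "blog.posts", "o": {"_id": 1, "title": "Hello"}},
    {"op": "u", "ns": "blog.posts", "o2": {"_id": 1}, "o": {"$set": {"title": "Hi"}}},
    {"op": "d", "ns": "blog.posts", "o": {"_id": 1}},
    {"op": "i", "ns": "blog.comments", "o": {"_id": 7}},
    {"op": "n", "ns": "", "o": {"msg": "periodic noop"}},
]
actions = [oplog_to_index_action(e, "blog.posts") for e in entries]
```

A real consumer would feed this function from a tailable cursor on the oplog collection and remember the last timestamp it processed, so that it can resume after a restart.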



Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth

Grant Ingersoll:

Instead of doing all the extra work of making sure instances are up, etc., however, I am going to focus on using some of the new features of Solr4 (i.e. SolrCloud whose development effort has been primarily led by several of my colleagues: Yonik Seeley, Mark Miller and Sami Siren) which remove the need to figure out where to send documents when indexing, along with a convenient Hadoop-based document processing toolkit, created by Julien Nioche, called Behemoth that takes care of the need to write any Map/Reduce code and also handles things like extracting content from PDFs and Word files in a Hadoop friendly manner (think Apache Tika run in Map/Reduce) while also allowing you to output the results to things like Solr or Mahout, GATE and others as well as to annotate the intermediary results.

I have to agree with Karussell:

Scaling Solr means using Solr AND X AND Y AND… Scaling ElasticSearch means using ElasticSearch



Arya a MongoDB based Search Engine

The system is currently hard coded with one tokenizer and one analyzer. This can easily be changed. The searcher returns the document and the score it received but not where the term is, or any information on how to ‘highlight’ the result. This is doable by adding in the required information into the match embedded document and processing it out in the Map Reduce phase. There is no query caching in this system. Paging through the results will result in duplicate work. It would be best to actually cache the output of the map reduce into Redis using a sorted set. The Redis key would have to be derived from the query.

Remember this should be considered just an experiment for learning about MongoDB’s MapReduce capabilities.
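The caching suggestion from the quote could look roughly like the sketch below. It uses an in-memory stand-in for Redis sorted sets so it runs without a server (with Redis, storing would be a `ZADD` and paging a `ZREVRANGE`). The key prefix, the class, and the `run_map_reduce` callback are hypothetical names, not part of Arya.

```python
import hashlib


def cache_key(query, prefix="search:"):
    """Derive a cache key from the query: normalize case and whitespace, then hash."""
    normalized = " ".join(query.lower().split())
    return prefix + hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SortedSetCache:
    """In-memory stand-in for Redis sorted sets keyed by query."""

    def __init__(self):
        self.sets = {}

    def store(self, key, scored_docs):
        # Keep (doc_id, score) pairs ordered by descending score, like a reverse range
        self.sets[key] = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)

    def page(self, key, page, size):
        members = self.sets.get(key)
        if members is None:
            return None  # nothing cached for this query yet
        start = page * size
        return members[start:start + size]


def search_page(query, page, size, cache, run_map_reduce):
    """Serve one page of results, running the expensive search only on a cache miss."""
    key = cache_key(query)
    results = cache.page(key, page, size)
    if results is None:
        cache.store(key, run_map_reduce(query))
        results = cache.page(key, page, size)
    return results


# Demo with a fake map reduce step (hypothetical scores)
calls = []


def fake_map_reduce(query):
    calls.append(query)
    return [("a", 0.2), ("b", 0.9), ("c", 0.5)]


cache = SortedSetCache()
page_one = search_page("MongoDB  Search", 0, 2, cache, fake_map_reduce)
page_two = search_page("mongodb search", 1, 2, cache, fake_map_reduce)
```

Paging through the results now reuses the stored output instead of repeating the map reduce work. A real deployment would also need an expiry on the keys so that cached results don't outlive changes to the index.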
