NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



IndexTank: All content tagged as IndexTank in NoSQL databases and polyglot persistence

Lucene & Solr Year 2011 in Review

I much prefer reviews to predictions. Moreover so when there are so many worthy things to be mention as what Lucene and Solr have accomplished in 2011:

  • Near Real-Time search (freshly added documents can be immediately made visible in search results)
  • Field collapsing or result grouping
  • faceting module
  • language support

Plus the promise of the SolrCloud:

In short, SolrCloud will make it easier for people to operate larger Solr clusters by making use of more modern design principles and software components such as ZooKeeper, that make creation of distributed, cluster-based software/services easier.  Some of the core functionality is that there will be no single point of failure, any node will be able to handle any operation, there will be no traditional master-slave setup, there will be centralized cluster management and configuration, failovers will be automatic and in general things will be much more dynamic.  

On the other hand, last December LinkedIn open sourced IndexTank a real-time fulltext search-and-indexing system. Some of its features will definitely sound interesting to Lucene and Solr users.

Original title and link: Lucene & Solr Year 2011 in Review (NoSQL database©myNoSQL)


IndexTank vs Thinking Sphinx vs WebSolr

In the light of IndexTank being open sourced by LinkedIn, here is a post in which Gautam Rege compares IndexTank with Thinking Sphinx and WebSolr. Feature-wise IndexTank has some advantages over Solr and almost none when compared wtih Thinking Sphinx.

When I first set out needing full text searching, I used Solr. It was pretty good though re-indexing took ages and to ensure consistency, I had to re-index every day via cron. Then I found Thinking Sphinx – and loved it because it managed delta indexes! Wow – no more daily re-index cron jobs. Even the re-indexing was way quicker.

The big issue with both Solr and TS was that it required tight integration with models and my database. For example – in TS, if a relationship was changed, I had to ensure to trigger the parent / child delta index in order to ensure it gets indexed too.  Both TS and Solr add methods to ActiveRecord, which I find a little annoying. These nuances gets my code too dependent on TS or Solr and switching from them to something else becomes a big pain!

Original title and link: IndexTank vs Thinking Sphinx vs WebSolr (NoSQL database©myNoSQL)


LinkedIn Open Sources IndexTank: What Is IndexTank and How Does It Compare to Lucene and Solr

Today LinkedIn has announced that they are open sourcing the technology behind IndexTank, a company they acquired back in October. IndexTank was offering a hosted, scalable full-text search API.

The projects can be found already on GitHub: index tank-engine (the indexing engine) and the API, BackOffice, Storefront, and Nebulizer.

When reading the announcement, I’ve asked myself two questions: what is IndexTank and how does IndexTank compare to Lucene and Solr.

The answer to the the first one is provided in the post.

What is Index Tank? IndexTank is mainly three things:

  • IndexEngine: a real-time fulltext search-and-indexing system designed to separate relevance signals from document text. This is because the life cycle of these signals is different from the text itself, especially in the context of user-generated social inputs (shares, likes, +1, RTs).
  • API: a RESTful interface that handles authentication, validation, and communication with the IndexEngine(s). It allows users of IndexTank to access the service from different technology platforms (Java, Python, .NET, Ruby and PHP clients are already developed) via HTTP.
  • Nebulizer: a multitenant framework to host and manage an unlimited number of indexes running over a layer of Infrastructure-as-a-Service. This component of IndexTank will instantiate new virtual instances as needed, move indexes as they need more resources, and try to be reasonably efficient about it.

For the second, I’ve reached out the the old IndexTank FAQ.

How does IndexTank compare to Lucene and Solr?

  1. IndexTank was a hosted, scalable service
  2. IndexTank can add documents to the index
  3. IndexTank supports updating document variables without re-indexing
  4. IndexTank supports geolocation functions

For more details there’s a paper by Alejandro Perez covering IndexTank and other search solutions.

Happy hacking!

Original title and link: LinkedIn Open Sources IndexTank: What Is IndexTank and How Does It Compare to Lucene and Solr (NoSQL database©myNoSQL)