LinkedIn Open Sources IndexTank: What Is IndexTank and How Does It Compare to Lucene and Solr

Today LinkedIn announced that it is open sourcing the technology behind IndexTank, a company it acquired back in October. IndexTank offered a hosted, scalable full-text search API.

The projects are already available on GitHub: indextank-engine (the indexing engine) and the API, BackOffice, Storefront, and Nebulizer.

When reading the announcement, I asked myself two questions: what is IndexTank, and how does IndexTank compare to Lucene and Solr?

The answer to the first one is provided in the post.

What is IndexTank? IndexTank is mainly three things:

  • IndexEngine: a real-time fulltext search-and-indexing system designed to separate relevance signals from document text. This is because the life cycle of these signals is different from the text itself, especially in the context of user-generated social inputs (shares, likes, +1, RTs).
  • API: a RESTful interface that handles authentication, validation, and communication with the IndexEngine(s). It allows users of IndexTank to access the service from different technology platforms (Java, Python, .NET, Ruby and PHP clients are already developed) via HTTP.
  • Nebulizer: a multitenant framework to host and manage an unlimited number of indexes running over a layer of Infrastructure-as-a-Service. This component of IndexTank will instantiate new virtual instances as needed, move indexes as they need more resources, and try to be reasonably efficient about it.
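To make the first point more concrete, here is a minimal sketch of what keeping relevance signals apart from document text can look like. It is not IndexTank's code or API: the `TinyIndex` class, its method names, and the `likes` signal are all hypothetical, chosen only to illustrate the idea that a signal can change without the text being tokenized again.

```python
from collections import defaultdict


class TinyIndex:
    """Toy index (hypothetical, not IndexTank's implementation) that keeps
    text postings and relevance signals in two separate stores."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.variables = {}               # doc id -> {signal name: value}
        self.text_indexings = 0           # how many times text was tokenized

    def add_document(self, doc_id, text):
        # Text path: tokenize and update the inverted index.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.text_indexings += 1

    def update_variables(self, doc_id, **signals):
        # Signal path: only the variable store changes, postings are untouched.
        self.variables.setdefault(doc_id, {}).update(signals)

    def search(self, term):
        # Rank matches by the "likes" signal (highest first), then by id.
        hits = self.postings.get(term.lower(), set())
        return sorted(
            hits,
            key=lambda d: (-self.variables.get(d, {}).get("likes", 0), d),
        )


index = TinyIndex()
index.add_document("a", "open source search engine")
index.add_document("b", "hosted search service")
before = index.search("search")

index.update_variables("b", likes=10)  # e.g. the document was liked 10 times
after = index.search("search")
```

In this sketch the ranking for "search" changes once document "b" receives likes, while `text_indexings` stays at 2 because no text was processed again. A real engine would use far more elaborate scoring, but the split between the two stores is the point.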

For the second, I turned to the old IndexTank FAQ.

How does IndexTank compare to Lucene and Solr?

  1. IndexTank was a hosted, scalable service
  2. IndexTank can add documents to the index
  3. IndexTank supports updating document variables without re-indexing
  4. IndexTank supports geolocation functions

For more details, there’s a paper by Alejandro Perez covering IndexTank and other search solutions.

Happy hacking!

Original title and link: LinkedIn Open Sources IndexTank: What Is IndexTank and How Does It Compare to Lucene and Solr (NoSQL database©myNoSQL)