This idea is definitely not new, but the post shares quite a few good principles on how to build a search engine using Redis. Plus there’s some ☞ code available:
I know what you are thinking. Why would we want to build a search engine from scratch when Lucene, Xapian, and other software are available? What could possibly be gained? To start, simplicity, speed, and flexibility. We're going to be building a search engine implementing TF/IDF search using Redis, redis-py, and just a few lines of Python. With a few small changes to what I provide, you can integrate your own document importance scoring, and if one of my patches gets merged into Redis, you could combine TF/IDF with your pre-computed PageRank… Building an index and search engine using Redis offers so much more flexibility out of the box than is available using any of the provided options. Convinced?
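To make the TF/IDF idea concrete, here is a minimal sketch of the scoring such an engine performs. It is not the linked code: it keeps everything in plain Python dicts so it runs without a Redis server, and the `TinyIndex` class, `tokenize` helper, and `idx:<term>` key naming are hypothetical. Each dict entry in `postings` plays the role of one Redis sorted set per term (doc id mapped to term frequency), and `search` does in Python roughly what `ZUNIONSTORE` with per-term `WEIGHTS` (set to the IDF) followed by `ZREVRANGE` would do inside Redis.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory stand-in for a Redis layout: one sorted set per term."""

    def __init__(self) -> None:
        # term -> {doc_id: normalized term frequency}. In Redis this would be a
        # ZADD into a per-term key such as "idx:<term>" (naming is an assumption).
        self.postings: dict[str, dict[str, float]] = defaultdict(dict)
        self.doc_ids: set[str] = set()

    def add(self, doc_id: str, text: str) -> None:
        """Index a document; assumes each doc_id is added only once."""
        words = tokenize(text)
        if not words:
            return
        self.doc_ids.add(doc_id)
        for term, count in Counter(words).items():
            self.postings[term][doc_id] = count / len(words)

    def idf(self, term: str) -> float:
        """log(N / df); 0.0 for terms that appear in no document."""
        df = len(self.postings.get(term, {}))
        return math.log(len(self.doc_ids) / df) if df else 0.0

    def search(self, query: str, limit: int = 10) -> list[tuple[str, float]]:
        # Sum tf * idf per document over the query terms, like a weighted
        # ZUNIONSTORE with AGGREGATE SUM, then return the highest scores first.
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            weight = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * weight
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:limit]


idx = TinyIndex()
idx.add("a", "redis keeps everything in memory")
idx.add("b", "lucene and xapian are search libraries")
idx.add("c", "redis sorted sets make a simple search index")
print([doc for doc, _ in idx.search("redis search")])  # → ['c', 'a', 'b']
```

Document "c" wins because it contains both query terms; the post's point about custom importance scoring would amount to adding another weighted term (for example a precomputed per-document score) into the same sum.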
Anyway, as I've already said myself, there are a couple of things you should be aware of:
Using Redis to build search is great for your personal site, your company intranet, your internal customer search, maybe even one of your core products. But be aware that Redis keeps everything in memory, so as your index grows, so do your machine requirements. Naive sharding tricks may work up to a point, but eventually your result merging will have to become a tree, and the extra layers of merges will start pushing your latency to scary levels.
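The merge problem described above can be sketched as follows. This is an illustration under assumptions, not code from the post: `merge_top_k`, `tree_merge`, and the `fan_in` parameter are hypothetical names, and each shard is assumed to return its hits already sorted by descending score. It also assumes scores are comparable across shards (for instance, computed with a shared global IDF), since per-shard IDF values would otherwise differ. Each extra layer in `tree_merge` stands in for another aggregator hop in a real deployment, which is where the added latency comes from.

```python
import heapq
from itertools import islice

Hit = tuple[str, float]  # (doc_id, score)


def merge_top_k(result_lists: list[list[Hit]], k: int) -> list[Hit]:
    """Merge per-shard hit lists (each sorted by descending score) and keep the top k."""
    merged = heapq.merge(*result_lists, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, k))


def tree_merge(result_lists: list[list[Hit]], k: int, fan_in: int = 2) -> list[Hit]:
    """Merge `fan_in` lists at a time, layer by layer, as an aggregator tree would."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not result_lists:
        return []
    layer = result_lists
    while len(layer) > 1:
        # Each pass is one more layer of merging (one more hop in a real cluster).
        layer = [
            merge_top_k(layer[i:i + fan_in], k)
            for i in range(0, len(layer), fan_in)
        ]
    return layer[0][:k]


shards = [
    [("a", 0.9), ("b", 0.4)],
    [("c", 0.7), ("d", 0.1)],
    [("e", 0.8)],
]
print(merge_top_k(shards, 3))  # → [('a', 0.9), ('e', 0.8), ('c', 0.7)]
print(tree_merge(shards, 3))   # → [('a', 0.9), ('e', 0.8), ('c', 0.7)]
```

Both functions return the same hits; the tree version only trades a single wide merge for several narrower ones, which helps when one aggregator cannot talk to every shard, at the cost of the extra layers the post warns about.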