index: All content tagged as index in NoSQL databases and polyglot persistence
There is an interesting conversation on the HBase mailing list about HBase, MapReduce, and the options for using external indexes:
Suppose you have a really large table with 1 billion rows of data.
Since HBase really doesn’t have any indexes built in (Don’t get me started about the contrib/transactional stuff…), you’re forced to use some sort of external index, or roll your own index table.
The net result is that you end up with a list object that contains your result set.
So the question is… what’s the best way to feed the list object in?
One option I thought about is writing the object to a file and then using it as the file in and then control the splitters. Not the most efficient but it would work.
Was trying to find a more ‘elegant’ solution and I’m sure that anyone using SOLR or LUCENE or whatever… had come across this problem too.
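To make the file-based option from the quote a bit more concrete, here is a minimal sketch, not anything taken from the thread. It assumes the index lookup returns plain row keys, writes them one per line, and then cuts the file into fixed-size groups, roughly what Hadoop's `NLineInputFormat` does when it hands each mapper N lines of input. The names `write_row_keys`, `iter_splits`, and `demo`, and the `row-0000` style keys, are made up for illustration; a real job would turn each group into HBase gets inside the mapper.

```python
import os
import tempfile
from itertools import islice


def write_row_keys(row_keys, path):
    """Write one row key per line; this file would become the job input."""
    with open(path, "w", encoding="utf-8") as f:
        for key in row_keys:
            f.write(key + "\n")


def iter_splits(path, lines_per_split):
    """Yield lists of row keys, with at most lines_per_split keys in each."""
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = [line.rstrip("\n") for line in islice(f, lines_per_split)]
            if not chunk:
                break
            yield chunk


def demo():
    # Hypothetical result set coming back from the external index.
    keys = [f"row-{i:04d}" for i in range(10)]
    path = os.path.join(tempfile.mkdtemp(), "row_keys.txt")
    write_row_keys(keys, path)
    # Each inner list stands in for the input of one map task.
    return list(iter_splits(path, 4))
```

The number of lines per split is the knob that "controls the splitters": smaller groups mean more map tasks, each doing fewer lookups.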
- I still cannot find a decent way to read and link to these mailing lists. How difficult would it be to have a nice, threaded, uncluttered view? Am I asking for too much?