There is an interesting conversation on the HBase mailing list about HBase MapReduce and the different options for using external indexes:
Suppose you have a really large table with 1 billion rows of data.
Since HBase really doesn’t have any indexes built in (Don’t get me started about the contrib/transactional stuff…), you’re forced to use some sort of external index, or roll your own index table.
The net result is that you end up with a list object that contains your result set.
So the question is… what’s the best way to feed the list object in?
One option I thought about is writing the object to a file, using that file as the input, and then controlling the splits. Not the most efficient, but it would work.
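A minimal sketch of that file-based approach, assuming the external index lookup has already produced a list of row keys. This is illustrative Python, not the HBase/Hadoop API: `write_keys` and `make_splits` are hypothetical names standing in for writing the result set out and for an InputFormat's split logic, and each split would then drive a mapper doing point lookups against HBase rather than a full-table scan:

```python
# Illustrative sketch (hypothetical helpers, not HBase/Hadoop classes):
# persist the index result set, then carve it into one chunk per mapper.

def write_keys(keys, path):
    """Write the row keys returned by the external index, one per line."""
    with open(path, "w") as f:
        for k in keys:
            f.write(k + "\n")

def make_splits(path, num_splits):
    """Partition the key file into roughly equal chunks, one per mapper.

    This models what a custom InputFormat's getSplits() would do: each
    chunk becomes the set of row keys one map task fetches from HBase.
    """
    with open(path) as f:
        keys = [line.strip() for line in f if line.strip()]
    size = -(-len(keys) // num_splits)  # ceiling division
    return [keys[i:i + size] for i in range(0, len(keys), size)]

if __name__ == "__main__":
    rows = ["row-%05d" % i for i in range(10)]
    write_keys(rows, "result_keys.txt")
    print([len(s) for s in make_splits("result_keys.txt", 3)])  # [4, 4, 2]
```

In a real job the equivalent of `make_splits` would live in a custom InputFormat, so each mapper receives only its slice of row keys and issues Gets for them instead of scanning the whole billion-row table.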
Was trying to find a more ‘elegant’ solution, and I’m sure that anyone using SOLR or LUCENE or whatever… has come across this problem too.
- I still cannot find a decent way to read and link to these mailing lists. How difficult would it be to have a nice, threaded, uncluttered view? Do I want too much? (↩)