HBase Map/Reduce and External Indexes

There is an interesting conversation on the HBase mailing list about HBase MapReduce and the options for using external indexes:

Suppose you have a really large table with 1 billion rows of data.

Since HBase really doesn’t have any indexes built in (Don’t get me started about the contrib/transactional stuff…), you’re forced to use some sort of external index, or roll your own index table.

The net result is that you end up with a list object that contains your result set.

So the question is… what’s the best way to feed the list object in?

One option I thought about is writing the object to a file, using that file as the job input, and then controlling the splitters. Not the most efficient, but it would work.

I was trying to find a more ‘elegant’ solution, and I’m sure that anyone using Solr or Lucene or whatever… has come across this problem too.

☞ hadoop-hbase-user[1]
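The file-based option mentioned in the thread can be sketched roughly as follows. This is only an illustration in plain Python, not code from the mailing list: the row keys returned by the external index are written one per line, and the file is then cut into fixed-size groups of lines, each of which would feed one map task. Hadoop's NLineInputFormat splits input in a similar line-count fashion. The names `write_keys`, `line_splits`, `demo`, and `index_hits.txt` are hypothetical, and the sketch does not talk to HBase at all; in a real job each mapper would look up its keys in the table.

```python
import os
import tempfile


def write_keys(row_keys, path):
    """Write one row key per line; this file plays the role of the job input."""
    with open(path, "w", encoding="utf-8") as f:
        for key in row_keys:
            f.write(key + "\n")


def line_splits(path, lines_per_split):
    """Yield lists of row keys, one list per would-be map task."""
    if lines_per_split < 1:
        raise ValueError("lines_per_split must be at least 1")
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(line.rstrip("\n"))
            if len(batch) == lines_per_split:
                yield batch
                batch = []
    if batch:
        # The last split may hold fewer keys than the others.
        yield batch


def demo():
    # Stand-in for the result list returned by an external index (assumption).
    row_keys = [f"row-{i:04d}" for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index_hits.txt")
        write_keys(row_keys, path)
        # Materialize the splits before the temporary directory is removed.
        return list(line_splits(path, lines_per_split=4))


if __name__ == "__main__":
    for i, split in enumerate(demo()):
        print(i, split)
```

Controlling the number of lines per split is what decides how many map tasks the result set is spread across, which is the "control the splitters" part of the idea.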

  1. I still cannot find a decent way to read and link to these mailing lists. How difficult would it be to have a nice, threaded, uncluttered view? Do I want too much?

Original title and link: HBase Map/Reduce and External Indexes (NoSQL databases © myNoSQL)