Monday, 7 February 2011
HBase and Bloom Filters
Lars George and Nicolas Spiegelberg, both HBase committers (Nicolas also being the one who implemented Bloom filters in HBase), explaining the pros (and cons) of using Bloom filters in HBase:
Lars:
Keep in mind that HBase only has a block index per file, which is rather coarse-grained and tells the reader that a key may be in the file because it falls into a start and end key range in the block index. But whether the key is actually present can only be determined by loading that block and scanning it. This also places a burden on the block cache and you may create a lot of unnecessary churn that the Bloom filters would help avoid.
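To make Lars's point concrete, here is a minimal Python sketch of a per-file block index, assuming (hypothetically) that the index stores only the first key of each block. A lookup can only narrow the search to one candidate block; confirming the key still means loading and scanning that block. The names `BlockIndexedFile` and `get` are illustrative, not HBase APIs.

```python
import bisect


class BlockIndexedFile:
    """Toy store file: sorted keys split into fixed-size blocks (hypothetical model)."""

    def __init__(self, sorted_keys, block_size=4):
        # Each block is a slice of the sorted keys, like data blocks in a file.
        self.blocks = [sorted_keys[i:i + block_size]
                       for i in range(0, len(sorted_keys), block_size)]
        # The block index keeps only the first key of each block.
        self.index = [block[0] for block in self.blocks]
        self.blocks_loaded = 0  # counts simulated block reads

    def get(self, key):
        # Find the last block whose first key is <= key.
        pos = bisect.bisect_right(self.index, key) - 1
        if pos < 0:
            return False  # key sorts before the first block: definitely absent
        # The index only says the key *may* be in this block, so load and scan it.
        self.blocks_loaded += 1
        return key in self.blocks[pos]


f = BlockIndexedFile(["a", "c", "e", "g", "i", "k", "m", "o"])
print(f.get("d"), f.blocks_loaded)  # False 1
```

The lookup for "d" falls inside the first block's key range, so a block is loaded even though the key is absent. That wasted read (and the block cache churn it causes) is what a Bloom filter can avoid.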
Nicolas:
Get/Scan(Row) currently does a parallel N-way get of that Row from all StoreFiles in a Region. This means that you are doing N read requests from disk. Bloom filters provide a lightweight in-memory structure to reduce those N disk reads to only the files likely to contain that Row (N-B, where B is the number of files the filters rule out).
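As an illustration of Nicolas's point, the following Python sketch gives each simulated StoreFile its own small Bloom filter and consults it before counting a disk read. The class and function names, the SHA-256 double-hashing scheme, and the m=1024 bits / k=3 hashes sizing are assumptions for illustration, not HBase's implementation; the sketch also checks files sequentially rather than in parallel.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means definitely absent; True means "maybe present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def get_row(store_files, row):
    """Hypothetical Get: check each file's Bloom filter before 'reading' it.

    store_files is a list of (row_set, bloom) pairs standing in for StoreFiles.
    Returns (indexes of files containing the row, number of simulated disk reads).
    """
    found_in, disk_reads = [], 0
    for i, (rows, bloom) in enumerate(store_files):
        if not bloom.might_contain(row):
            continue  # filter rules this file out: no disk read
        disk_reads += 1  # simulated disk read
        if row in rows:
            found_in.append(i)
    return found_in, disk_reads


# Five StoreFiles, each holding 100 distinct rows and its own filter.
files = []
for n in range(5):
    rows = {f"row-{n}-{j}" for j in range(100)}
    bloom = BloomFilter()
    for r in rows:
        bloom.add(r)
    files.append((rows, bloom))

found, reads = get_row(files, "row-3-42")
print(found, reads)  # found is [3]; reads is usually 1, occasionally more
```

The trade-off is the usual one for Bloom filters: each file's filter costs memory, and a false positive still triggers a read, but there are no false negatives, so a skipped file never holds the row. For n keys in m bits with k hashes, the false-positive probability is roughly (1 - e^(-kn/m))^k, which for the sizes above (k=3, n=100, m=1024) comes to about 1.6%.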
Original title and link: HBase and Bloom Filters (NoSQL databases © myNoSQL)
via: http://www.quora.com/How-are-bloom-filters-used-in-HBase