Lars George and Nicolas Spiegelberg, both HBase committers (Nicolas is also the one who implemented HBase Bloom filters), explain the pros (and cons) of using Bloom filters in HBase:
Keep in mind that HBase only has a block index per file, which is rather coarse-grained and tells the reader that a key may be in the file because it falls within a start and end key range in the block index. Whether the key is actually present can only be determined by loading that block and scanning it.
This also places a burden on the block cache, and you may create a lot of unnecessary churn that Bloom filters would help avoid.
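To make the "may be in the file" point concrete, here is a minimal Python sketch of a coarse block index. It is not HBase code: `ToyStoreFile`, `block_size`, and `blocks_loaded` are hypothetical names invented for illustration, and the real HFile format is more involved. The sketch only shows that a range check can say "maybe" for a key that is absent, and that finding out costs a block load.

```python
import bisect


class ToyStoreFile:
    """Hypothetical stand-in for a store file: sorted keys split into blocks."""

    def __init__(self, keys, block_size=3):
        keys = sorted(keys)  # assumes at least one key
        self.blocks = [keys[i:i + block_size]
                       for i in range(0, len(keys), block_size)]
        # The block index only remembers the first key of each block.
        self.index = [block[0] for block in self.blocks]
        self.last_key = keys[-1]
        self.blocks_loaded = 0  # counts simulated block reads

    def may_contain(self, key):
        # Range check only: true for any key between the first and last key.
        return self.index[0] <= key <= self.last_key

    def get(self, key):
        if not self.may_contain(key):
            return False
        # Locate the single candidate block, then "load" and scan it.
        pos = bisect.bisect_right(self.index, key) - 1
        self.blocks_loaded += 1
        return key in self.blocks[pos]


store_file = ToyStoreFile(
    ["row-01", "row-03", "row-05", "row-07", "row-09", "row-11"])
print(store_file.may_contain("row-04"))  # the index says the key may be here
print(store_file.get("row-04"))          # scanning the block shows it is not
print(store_file.blocks_loaded)          # a block was loaded to find that out
```

In this toy model every such miss still costs one block load, which is the kind of wasted read and cache churn the quote describes.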
Get/Scan(Row) currently does a parallel N-way get of that Row from all StoreFiles in a Region. This means that you are doing N read requests from disk. BloomFilters provide a lightweight in-memory structure to reduce those N disk reads to only the files likely to contain that Row (N-B).
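The idea of trimming an N-way read down to the files that might hold the row can be sketched as follows. This is an illustrative toy, not the HBase implementation: `ToyBloomFilter`, `files_to_read`, the bit-array size, the number of hashes, and the SHA-256-based hashing are all assumptions chosen for readability.

```python
import hashlib


class ToyBloomFilter:
    """Hypothetical Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(row, store_files):
    """Return the names of store files whose filter does not rule out `row`.

    `store_files` is a list of (name, ToyBloomFilter) pairs.
    """
    return [name for name, bloom in store_files if bloom.might_contain(row)]


# Build four hypothetical store files with ten distinct rows each.
store_files = []
for n in range(4):
    bloom = ToyBloomFilter()
    for i in range(10):
        bloom.add(f"row-{n}-{i}")
    store_files.append((f"storefile-{n}", bloom))

candidates = files_to_read("row-2-7", store_files)
print(candidates)  # always includes storefile-2; other files only on a false positive
```

Because a Bloom filter never reports a false negative, the file that really holds the row is always in the candidate list. The other files are skipped unless a false positive occurs, and the false-positive rate depends on how the filter is sized relative to the number of keys.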
Original title and link: HBase and Bloom Filters (NoSQL databases © myNoSQL)