A thorough post from the Trend Micro Hadoop Group (Mingjie Lai, Eugene Koontz, Andrew Purtell) explaining the details of the coprocessors included in the latest HBase release, 0.92.0:
Why HBase Coprocessors?
HBase has very effective MapReduce integration for distributed computation over data stored within its tables, but in many cases – for example simple additive or aggregating operations like summing, counting, and the like – pushing the computation up to the server where it can operate on the data directly without communication overheads can give a dramatic performance improvement over HBase’s already good scanning performance.
Also, before 0.92, it was not possible to extend HBase with custom functionality except by extending the base classes.
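To make the aggregation argument concrete, here is a toy Python model of the idea. It is not the HBase API: the `Region` class and both sum functions are invented for illustration, and the "network" is just a counter of items handed back to the client. The client-side version pulls every cell before adding, while the server-side version lets each region return a single partial sum.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy stand-in for one region of a table: row key -> integer value."""
    rows: dict = field(default_factory=dict)

    def scan(self):
        # Ships every (key, value) pair to the caller.
        return list(self.rows.items())

    def local_sum(self):
        # Computes next to the data and ships a single number.
        return sum(self.rows.values())


def client_side_sum(regions):
    """Scan everything, aggregate on the client. Returns (total, items shipped)."""
    shipped = 0
    total = 0
    for region in regions:
        cells = region.scan()
        shipped += len(cells)
        total += sum(value for _, value in cells)
    return total, shipped


def server_side_sum(regions):
    """Each region returns one partial sum; the client only merges them."""
    partials = [region.local_sum() for region in regions]
    return sum(partials), len(partials)


regions = [
    Region({"a": 1, "b": 2, "c": 3}),
    Region({"m": 10, "n": 20}),
    Region({"x": 100}),
]

print(client_side_sum(regions))  # (136, 6)
print(server_side_sum(regions))  # (136, 3)
```

Both calls return the same total, but the second ships one item per region instead of one per cell. In a real table the gap grows with the number of rows per region.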
What are HBase Coprocessors?
In order to support sufficient flexibility for potential coprocessor behaviors, two different aspects of extension are provided by the framework. One is the observer, which is like a trigger in a conventional database, and the other is the endpoint, a dynamic RPC endpoint that resembles a stored procedure.
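The two extension styles can be sketched with a small toy model in Python. None of these names come from HBase: `Observer`, `AuditObserver`, `ToyRegion`, and the endpoint names are hypothetical, chosen only to show the shape of the idea. Observers are hooks the region calls around an operation, much like triggers. Endpoints are named functions the client invokes and that run against the region's own data, much like stored procedures.

```python
class Observer:
    """Trigger-like hooks; the default implementations do nothing."""

    def pre_put(self, row, value):
        return value

    def post_put(self, row, value):
        pass


class AuditObserver(Observer):
    """Example observer: rejects negative values and logs accepted writes."""

    def __init__(self):
        self.log = []

    def pre_put(self, row, value):
        if value < 0:
            raise ValueError(f"negative value rejected for row {row!r}")
        return value

    def post_put(self, row, value):
        self.log.append((row, value))


class ToyRegion:
    """Toy region that calls observers around writes and exposes endpoints."""

    def __init__(self, observers=()):
        self.rows = {}
        self.observers = list(observers)
        self.endpoints = {}

    def put(self, row, value):
        # Pre-hooks may veto (by raising) or rewrite the value.
        for obs in self.observers:
            value = obs.pre_put(row, value)
        self.rows[row] = value
        # Post-hooks see only writes that succeeded.
        for obs in self.observers:
            obs.post_put(row, value)

    def register_endpoint(self, name, func):
        # func receives the region's rows and runs "on the server".
        self.endpoints[name] = func

    def call(self, name, *args):
        return self.endpoints[name](self.rows, *args)


audit = AuditObserver()
region = ToyRegion(observers=[audit])
region.register_endpoint("row_count", lambda rows: len(rows))
region.register_endpoint("max_value", lambda rows: max(rows.values()))

region.put("r1", 5)
region.put("r2", 9)
try:
    region.put("r3", -1)
except ValueError as err:
    print("rejected:", err)

print(audit.log)                 # [('r1', 5), ('r2', 9)]
print(region.call("row_count"))  # 2
print(region.call("max_value"))  # 9
```

The rejected write never reaches storage or the audit log, which is roughly how an observer-based access check would behave. The endpoint calls return a single value computed where the data lives.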
What can HBase Coprocessors be used for?
Exciting new features can be built on top of the coprocessor framework, for example secondary indexing, complex filtering (push down predicates), and access control.
These are just a couple of interesting points from this excellent article. I strongly suggest reading it.
Original title and link: HBase Coprocessors Explained (©myNoSQL)