Cloudera Adds HBase to CDH
Cloudera talks about the addition of HBase to Cloudera’s Distribution of Hadoop, announced during the Hadoop Summit:
Analysis of continuously updated data: With data access methods available for all major languages, it’s simple to interface data-generating applications like web crawlers, log collectors, or web applications to write into HBase. For example, the next generation of the Nutch web crawler stores its data in HBase. Once the data generators insert the data, HBase enables MapReduce analysis on either the latest data or a snapshot at any recent timestamp.
User-facing analytics: These user-facing data applications rely on the ability not just to compute the models, but also to make the computed data available for latency-sensitive lookup operations. For these applications, it’s simple to integrate HBase as the destination for a MapReduce job. Extremely efficient incremental and bulk loading features allow the operational data to be updated while simultaneously serving traffic to latency-sensitive workloads. Compared with alternative data stores, the tight integration with other Hadoop projects as well as the consolidation of infrastructure are tangible benefits.
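To make the "latest data or a snapshot at any recent timestamp" idea from the quote a bit more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: the `VersionedTable` class, its `put`/`get` methods, and the `as_of` parameter are all hypothetical names chosen for illustration. It only assumes that each cell keeps several timestamped versions and that a read returns the newest version at or before the requested time.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedTable:
    """Toy stand-in for a table that keeps timestamped cell versions.

    Hypothetical illustration only; this is not the HBase API.
    """

    def __init__(self):
        # (row, column) -> list of (timestamp, value), kept sorted by timestamp
        self._cells = defaultdict(list)

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell without discarding older ones."""
        versions = self._cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])

    def get(self, row, column, as_of=None):
        """Return the latest value, or the value visible at time `as_of`."""
        versions = self._cells.get((row, column), [])
        if not versions:
            return None
        if as_of is None:
            return versions[-1][1]
        timestamps = [ts for ts, _ in versions]
        idx = bisect_right(timestamps, as_of)
        return versions[idx - 1][1] if idx else None


# A crawler-like writer records two fetches of the same page.
table = VersionedTable()
table.put("example.com/", "fetch:status", "200", timestamp=100)
table.put("example.com/", "fetch:status", "404", timestamp=200)

latest = table.get("example.com/", "fetch:status")
snapshot = table.get("example.com/", "fetch:status", as_of=150)
```

Here `latest` reflects the most recent write, while `snapshot` sees the table as it stood at timestamp 150, which is the kind of view an analysis job could run against while new writes keep arriving.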
While it may look like the main reason for the HBase addition is its perfect integration with Hadoop, other NoSQL databases like Cassandra and Hypertable are also working on improving their integration with tools like Hive and Hadoop.
via: http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-hbase/