Odiago WibiData: Analytics Startup Powered by HBase and Hadoop

A new startup powered by HBase and Hadoop, founded by Cloudera co-founder Christophe Bisciglia and Hadoop developer and Cloudera alumnus Aaron Kimball, focusing on investigative and operational analytics on consumer Internet data:

  • ALL data pertaining to a single user (or mobile device) is kept in a single, possibly very long, HBase row.
  • There are two primary operators in WibiData, Produce and Gather.
  • Produce operates on single rows. It can operate on one row at HBase speed (milliseconds) if you need to inform an interactive user response. Or it can operate on the whole database in batch via Hadoop MapReduce.
  • It is reasonable to think of Produce as mainly doing two things. One is the aforementioned serving of data out of WibiData into interactive applications. The other is scoring, classifying, recommending, etc. on individual users (i.e. rows), in line with an analytic model.
  • Gather typically operates on all your rows at once, and emits suitable input for a MapReduce Reduce step. It is reasonable to think of Gather as being a key cog in the training of analytic models.
  • HBase schema management is done at the WibiData system level, not directly in applications. There’s a WibiData HBase data dictionary, powered by a set of system tables, that specifies cell data types/record types and, in effect, primitive schemas.
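To make the Produce/Gather split a bit more concrete, here is a minimal sketch in plain Python. It is not WibiData's API, which isn't shown in the source: the in-memory `table` dict stands in for an HBase table with one row per user, an ordinary loop stands in for a Hadoop MapReduce job, and every name (`produce`, `gather`, `reduce_sum`, the column names) is a hypothetical chosen for illustration.

```python
from collections import defaultdict

# Toy stand-in for an HBase table: one (possibly long) row per user.
# Row keys and column names are hypothetical, not WibiData's schema.
table = {
    "user1": {"info:country": "US", "clicks": [3, 5, 2]},
    "user2": {"info:country": "DE", "clicks": [1]},
    "user3": {"info:country": "US", "clicks": [4, 4]},
}


def produce(row):
    """Per-row operator: derive a score from a single user's row."""
    clicks = row["clicks"]
    return {"score": sum(clicks) / len(clicks)}


def produce_one(table, user_id):
    """Interactive path: score one row on demand."""
    return produce(table[user_id])


def produce_all(table):
    """Batch path: apply the same per-row logic to every row."""
    for row in table.values():
        row.update(produce(row))


def gather(row):
    """Emit (key, value) pairs from one row as input for a reduce step."""
    yield row["info:country"], sum(row["clicks"])


def reduce_sum(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_gather(table):
    """Run gather over all rows, then feed the pairs to the reducer."""
    pairs = (pair for row in table.values() for pair in gather(row))
    return reduce_sum(pairs)
```

The point of the sketch is that `produce` only ever sees one row, so the same function can serve a single interactive lookup (`produce_one`) or a full batch pass (`produce_all`), while `gather` is meant to run across all rows and hand its output to an aggregation step.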

One aspect that I’m not familiar with is how HBase can handle multitenancy, a requirement for services like WibiData.

As a side note, I assume this is the type of startup that Accel’s $100m fund for Big Data, Hadoop, and NoSQL Databases is targeting.

via: http://www.dbms2.com/2011/11/02/5576/