Firefox Downloads Visualization Powered by HBase

Not only is Mozilla celebrating the release of Firefox 4, but they took the time to set up a nice visualization for downloads.

glow.mozilla.org is powered by tailing logs and streaming data into HBase:

  1. The various load balancing clusters that host download.mozilla.org are configured to log download requests to a remote syslog server.
  2. The remote server is running rsyslog and has a config that specifically filters those remote syslog events into a dedicated file that rolls over hourly.
  3. SQLStream is installed on that server and it is tailing those log files as they appear.
  4. The SQLStream pipeline does the following for each request:
    • filters out anything other than valid download requests
    • uses MaxMind GeoIP to get a geographic location from the IP address
    • uses a streaming group by to aggregate the number of downloads by product, location, and timestamp
    • every 10 seconds, sends a stream of counter increments to HBase for the timestamp row with the column qualifiers being each distinct location that had downloads in that time interval
  5. The glow backend is a Python app that pulls the data out of HBase using the Python Thrift interface and writes a file containing a JSON representation of the data every minute.
  6. That JSON file can be cached on the front-end forever, since each minute of data has a distinct filename.
  7. The glow website pulls down that data and plays back the downloads, or lets you browse the geographic totals in the arc chart view.
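
The aggregation and file-naming steps above can be sketched roughly as follows. This is an illustrative Python sketch, not Mozilla's code: the real pipeline runs in SQLStream and writes to HBase, whereas here a plain dict stands in for the HBase table. The event tuple shape, the `product:timestamp` row-key format, and all function names are assumptions made for the example.

```python
import json
from collections import Counter, defaultdict

WINDOW_SECONDS = 10  # flush interval mentioned in the post


def window_start(ts, size=WINDOW_SECONDS):
    """Round a Unix timestamp down to the start of its window."""
    return ts - (ts % size)


def aggregate(requests):
    """Group download events into per-row counter increments.

    `requests` is an iterable of (unix_ts, product, location) tuples.
    The tuple shape and the row-key format are hypothetical.
    """
    increments = defaultdict(Counter)
    for ts, product, location in requests:
        row_key = f"{product}:{window_start(ts)}"
        # One column qualifier per distinct location seen in the window.
        increments[row_key][location] += 1
    return increments


def apply_increments(table, increments):
    """Add the increments to `table`, a dict standing in for HBase."""
    for row_key, columns in increments.items():
        table.setdefault(row_key, Counter()).update(columns)


def minute_filename(ts):
    """Distinct file name for the minute containing `ts`."""
    return f"{window_start(ts, 60)}.json"


def minute_document(table, product, ts):
    """Collect the 10-second rows of one minute into a JSON string."""
    start = window_start(ts, 60)
    data = {}
    for offset in range(0, 60, WINDOW_SECONDS):
        row = table.get(f"{product}:{start + offset}")
        if row:
            data[str(start + offset)] = dict(row)
    return json.dumps(data, sort_keys=True)
```

Because increments are additive, applying a batch to the table never needs to read the current value first, which is what makes counter columns a good fit for a 10-second flush. The per-minute file name never changes once written, which is why the front-end can cache it indefinitely.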

This sounds a lot like what Facebook is doing for its new Real-Time Analytics system. The parts missing are Scribe and ptail.

Original title and link: Firefox Downloads Visualization Powered by HBase (NoSQL databases © myNoSQL)

via: http://blog.mozilla.com/data/2011/03/22/how-glow-mozilla-org-gets-its-data/