Firefox Downloads Visualization Powered by HBase
Mozilla is not only celebrating the release of Firefox 4; it also took the time to set up a nice visualization of the downloads.
glow.mozilla.org is powered by tailing logs and streaming data into HBase:
- The various load balancing clusters that host download.mozilla.org are configured to log download requests to a remote syslog server.
- The remote server runs rsyslog, with a config that filters those remote syslog events into a dedicated file that rolls over hourly.
- SQLStream is installed on that server and it is tailing those log files as they appear.
- The SQLStream pipeline does the following for each request:
  - filters out anything other than valid download requests
  - uses MaxMind GeoIP to get a geographic location from the IP address
  - uses a streaming group by to aggregate the number of downloads by product, location, and timestamp
  - every 10 seconds, sends a stream of counter increments to HBase for the timestamp row, with one column qualifier for each distinct location that had downloads in that interval
- The glow backend is a Python app that pulls the data out of HBase using the Python Thrift interface and, every minute, writes a file containing a JSON representation of the data.
- That JSON file can be cached on the front end forever, since each minute of data has a distinct filename.
- The glow website pulls down that data and either plays back the downloads or lets you browse the geographic totals in the arc chart view.
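The SQLStream steps (filter, geolocate, group by, flush every 10 seconds) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SQLStream's actual pipeline: requests are assumed to arrive already parsed as `(epoch_seconds, client_ip, product)` tuples in timestamp order, `geolocate` is a hypothetical stand-in for the MaxMind GeoIP lookup, and the row key `<product>-<window start>` is a hypothetical layout (the post only says the row is the timestamp and the qualifiers are locations). A window is closed when the first download from the next window arrives.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, Optional, Tuple

WINDOW_SECONDS = 10  # the post says increments are sent to HBase every 10 seconds

# Hypothetical input: (epoch_seconds, client_ip, product); product is None
# when the log line was not a valid download request.
Request = Tuple[int, str, Optional[str]]
# Hypothetical output: (row_key, column_qualifier, amount) for one HBase counter.
Increment = Tuple[str, str, int]


def _flush(window: int, counts: Counter) -> Iterator[Increment]:
    """Emit one increment per (product, location) counted in this window."""
    for (product, location), n in sorted(counts.items()):
        # Hypothetical layout: row key = "<product>-<window start>", qualifier = location.
        yield (f"{product}-{window}", location, n)


def window_increments(
    requests: Iterable[Request],
    geolocate: Callable[[str], Optional[str]],
) -> Iterator[Increment]:
    """Filter, geolocate and count downloads in tumbling 10-second windows.

    Assumes requests arrive in timestamp order, as they would when tailing a log.
    """
    current: Optional[int] = None
    counts: Counter = Counter()
    for ts, ip, product in requests:
        if product is None:        # not a valid download request: drop it
            continue
        location = geolocate(ip)   # stand-in for a MaxMind GeoIP lookup
        if location is None:       # IP with no known location: drop it
            continue
        window = ts - ts % WINDOW_SECONDS
        if current is not None and window != current:
            yield from _flush(current, counts)  # window closed: emit its totals
            counts = Counter()
        current = window
        counts[(product, location)] += 1
    if current is not None:
        yield from _flush(current, counts)      # emit the last open window
```

Each emitted tuple corresponds to one counter increment; from Python, a Thrift client such as happybase exposes this as `Table.counter_inc(row, column, value)`, where the column would also need a column-family prefix.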
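The backend's per-minute JSON export, and why it can be cached forever, can be sketched as follows. This is a hedged illustration, not glow's code: the HBase read is replaced by a plain mapping of `window start -> {location: count}` (roughly what a Thrift scan over one minute's rows, e.g. happybase's `Table.scan(row_start=..., row_stop=...)`, might be converted into), and the `downloads-<minute start>.json` naming scheme is hypothetical. The key property is that each minute gets its own filename.

```python
import json
import os
from typing import Mapping, Tuple

MINUTE = 60


def minute_filename(minute_start: int) -> str:
    # Hypothetical naming scheme: one file per minute, named by its start time.
    return f"downloads-{minute_start}.json"


def render_minute(
    rows: Mapping[int, Mapping[str, int]],  # window start -> {location: count}
    minute_start: int,
) -> Tuple[str, str]:
    """Return (filename, JSON text) for the 10-second rows inside one minute."""
    data = {
        str(ts): dict(cols)
        for ts, cols in rows.items()
        if minute_start <= ts < minute_start + MINUTE
    }
    text = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return minute_filename(minute_start), text


def write_minute(rows: Mapping[int, Mapping[str, int]], minute_start: int, out_dir: str) -> str:
    """Write one minute's JSON file atomically and return its path."""
    name, text = render_minute(rows, minute_start)
    path = os.path.join(out_dir, name)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)  # atomic rename: readers never see a half-written file
    return path
```

Because a given minute's URL never points at different data later, the web server can send a long-lived header such as `Cache-Control: public, max-age=31536000`, and clients or CDNs never need to revalidate it.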
This sounds a lot like what Facebook is doing for its new Real-Time Analytics system. The parts missing here are Scribe and ptail.
Original title and link: Firefox Downloads Visualization Powered by HBase (NoSQL databases © myNoSQL)
via: http://blog.mozilla.com/data/2011/03/22/how-glow-mozilla-org-gets-its-data/