NoSQL Case Study: Migrating to HBase/Hadoop to Handle Firefox Crash Reports at Mozilla

What would you do if you had to process 2.5 million crash reports a day, amounting to around 320 GB of data, with an architecture like the one below?

This is exactly the scenario Mozilla faces in handling Firefox crash reports. According to their ☞ blog, all of this will change pretty soon with the migration to HBase and Hadoop:

However, we are in the process of moving to Hadoop, and currently all our crashes are also being written to HBase. Soon this will become our main data storage, and we’ll be able to do a lot more interesting things with the data. We’ll also be able to process 100% of crashes.
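For a sense of scale, the daily figures quoted at the top can be turned into rough averages. This is only back-of-the-envelope arithmetic: it assumes the 320 figure means decimal gigabytes and that reports arrive evenly over the day, which real crash traffic almost certainly does not.

```python
# Rough averages derived from the figures quoted in the post.
REPORTS_PER_DAY = 2_500_000
BYTES_PER_DAY = 320 * 10**9      # assumption: "320 GB" read as decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

avg_report_bytes = BYTES_PER_DAY / REPORTS_PER_DAY
reports_per_second = REPORTS_PER_DAY / SECONDS_PER_DAY
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY

print(f"average report size:  {avg_report_bytes / 1000:.0f} kB")       # 128 kB
print(f"average arrival rate: {reports_per_second:.1f} reports/s")     # 28.9 reports/s
print(f"average ingest rate:  {bytes_per_second / 10**6:.1f} MB/s")    # 3.7 MB/s
```

These are averages only; peak load after a bad release would be considerably higher.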

Some of the steps involved in getting to the new, simplified architecture depicted below:

[…] we will no longer be relying on NFS in production. All crash report submissions are already stored in HBase, but with Socorro 1.7, we will retrieve the data from HBase for processing and store the processed result back into HBase.


[…] we will migrate the processors and minidump_stackwalk instances to run on our Hadoop nodes, further distributing our architecture. This will give us the ability to scale up to the amount of data we have as it grows over time.
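The flow described in the first step (read a raw submission from the store, process it, write the result back to the same store) can be sketched as below. This is a minimal illustration, not Socorro's code: `InMemoryTable` is a hypothetical stand-in for an HBase table, the column names `raw:dump` and `processed:report` are invented for the example, and `process_dump` is a placeholder for the real work done by tools such as minidump_stackwalk.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTable:
    """Hypothetical stand-in for an HBase table: row key -> {column: value}."""
    rows: dict = field(default_factory=dict)

    def put(self, row_key, columns):
        # Merge the given columns into the row, creating it if needed.
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))

    def scan(self):
        # Yield rows in row-key order, snapshotting the key list first.
        for key in sorted(self.rows):
            yield key, dict(self.rows[key])


def process_dump(raw_dump: bytes) -> str:
    """Placeholder for the real processing step (hypothetical)."""
    return f"processed {len(raw_dump)} bytes"


def process_pending(table: InMemoryTable) -> int:
    """Process every row that has a raw dump but no processed result yet."""
    processed = 0
    for row_key, columns in table.scan():
        if "processed:report" in columns:
            continue  # already handled on an earlier pass
        result = process_dump(columns["raw:dump"])
        # Store the result back next to the raw submission.
        table.put(row_key, {"processed:report": result})
        processed += 1
    return processed


table = InMemoryTable()
table.put("crash-001", {"raw:dump": b"abcd"})
table.put("crash-002", {"raw:dump": b"xyz"})
print(process_pending(table))  # 2
print(process_pending(table))  # 0, nothing left to process
```

Keeping raw and processed data in one store is what removes the NFS dependency mentioned in the quote; running the processing on the nodes that hold the data, as the second step describes, would be a further change that this sketch does not model.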