What Can Be Learned From Heroku Outage Postmortem

While some may learn a few new things or find confirmation in the fine details of the outage, what caught my attention in Heroku’s postmortem analysis are the conclusions:

  • higher sensitivity and more aggressive monitoring on a variety of metrics
  • improved early warning systems
  • better containment
  • improved flow controls, both manual and automatic
  • expanding simulations of unusual load conditions in our staging environment

None of these are particular to a specific storage engine or NoSQL database. But they all reflect the reality of operating at large scale, where even the most operationally friendly solutions—think of Dynamo-inspired NoSQL databases—cannot and should not be left unmonitored, unsupervised, or without clear recovery strategies and processes in place.

In the NoSQL world, one of the most widely covered outages was the MongoDB outage at Foursquare. In case you don’t remember the details, most of the circumstances that led to that incident could have been prevented by having:

  1. better monitoring
  2. early warnings
  3. better operational procedures

Don’t these two lists look remarkably alike?
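
To make point 1 concrete: in the Foursquare case the trigger was a shard whose working set outgrew the machine’s RAM. Below is a minimal monitoring sketch of how an early warning for that condition might look, assuming pymongo and a reachable mongod; the URI, RAM size, threshold, and alert hook are placeholders of my own, not anything taken from the Heroku or Foursquare postmortems.

```python
#!/usr/bin/env python3
"""Minimal sketch: warn when a mongod's resident memory approaches
physical RAM. Connection string, RAM size, threshold, and the alert
hook are illustrative assumptions."""

import time
from pymongo import MongoClient

MONGO_URI = "mongodb://localhost:27017"  # placeholder deployment
RAM_MB = 64 * 1024                       # assumed physical RAM of the host
WARN_RATIO = 0.80                        # assumed early-warning threshold
POLL_SECONDS = 60


def alert(message: str) -> None:
    # Stand-in for a real pager / e-mail / chat notification hook.
    print(f"ALERT: {message}")


def main() -> None:
    client = MongoClient(MONGO_URI)
    while True:
        # serverStatus reports resident set size in MB under mem.resident.
        status = client.admin.command("serverStatus")
        resident_mb = status["mem"]["resident"]
        ratio = resident_mb / RAM_MB
        if ratio >= WARN_RATIO:
            alert(f"mongod resident memory at {ratio:.0%} of RAM "
                  f"({resident_mb} MB / {RAM_MB} MB)")
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

Nothing here is sophisticated, and that is the point: a periodic check against a sane threshold, wired to a pager, is the kind of “higher sensitivity” monitoring both postmortems call for.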

Original title and link: What Can Be Learned From Heroku Outage Postmortem (NoSQL database©myNoSQL)

via: https://status.heroku.com/incident/308