Distributed System Reliability: It's About Operations, Not Architecture or Design
Jay Kreps:
I have come around to the view that the real core difficulty of these systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. […] I really think there is really only one thing to talk about with respect to reliability: continuous hours of successful production operations.
Original title and link: Distributed System Reliability: It’s About Operations, Not Architecture or Design (©myNoSQL)
via: http://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability