Jared Wray enumerates the following four rules for high availability:
- No single point of failure
- Self-healing is required
- It will go down, so plan on it
- It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.
I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
This got me thinking: is self-healing actually a requirement? Could I translate this into asking: is it possible to control the MTTF? Control in the sense of planning operations that would push MTTF into a range that is not considered to break the SLA.
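In SLA terms, availability is the fraction of time the system is up, and it depends on both how often the system fails (MTTF) and how quickly it recovers (MTTR). A toy calculation with illustrative numbers shows why self-healing, which mostly shrinks MTTR, moves the needle at least as much as pushing out MTTF:

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Illustrative numbers: one failure every 30 days, manual repair in 4 hours
manual = availability(30 * 24, 4)        # ~0.9945: misses a 99.9% SLA
# Same MTTF, but a self-healing system restores service in ~2 minutes
healed = availability(30 * 24, 2 / 60)   # ~0.99995: meets 99.9%
```

The same three-nines target can be reached either by making failures rarer or by making recovery faster; self-healing is simply the automated version of the latter.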
Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:
The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.
Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?
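As an aside, the redundancy and independent-failure-modes items in Gray and Siewiorek's list reduce to one line of arithmetic: if replicas fail independently, the redundant group is unavailable only when every replica is down at once. A minimal sketch, assuming a per-module availability of 99% (the numbers are illustrative):

```python
def redundant_availability(per_module, replicas):
    """With independent failure modes, the group is down only when
    all replicas are down simultaneously."""
    return 1 - (1 - per_module) ** replicas

# per-module 0.99: 1 replica -> 0.99, 2 -> 0.9999, 3 -> 0.999999
triplex = redundant_availability(0.99, 3)
```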
Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? ( ©myNoSQL)
Besides the many practical lessons emphasized in Jack Clark’s interview with Adrian Cockcroft on ZDNet—luckily I’ve had the chance to see some of Cockcroft’s presentations about the Netflix architecture and also to talk to him directly—one thing that stuck with me was the closing paragraph:
The thing I’ve been publicly asking for has been better IO in the cloud. Obviously I want SSDs in there. We’ve been asking cloud vendors to do that for a while. With Cassandra, we’ve had to go onto horizontal scale and use the internal disks and triple replicate across availability zones, so you end up with a triple-redundant data store that is careful not to overload the disks.
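The triple-replicated setup Cockcroft describes maps onto standard quorum arithmetic: with N replicas, a read and a write are guaranteed to overlap on at least one replica whenever R + W > N. A minimal sketch for the replication-factor-3 case (the function names are mine):

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def overlaps(n, r, w):
    # A read sees the latest write when read and write sets must intersect
    return r + w > n

N = 3                  # triple replication across availability zones
R = W = quorum(N)      # QUORUM = 2 of 3 for both reads and writes
```

With R = W = 2 out of N = 3, any one zone can be lost while reads still intersect the most recent write; dropping to R = W = 1 trades that guarantee away for latency.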
That reminded me of this old ACM article authored by Brendan Gregg:
When I/O latency is presented as a visual heat map, some intriguing and beautiful patterns can emerge. These patterns provide insight into how a system is actually performing and what kinds of latency end-user applications experience. Many characteristics seen in these patterns are still not understood, but so far their analysis is revealing systemic behaviors that were previously unknown.
I was wondering whether, in the NoSQL database space (and the data storage space in general), there are any monitoring tools that provide such advanced visualization of latency data. Do you know of any?
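The core of Gregg's visualization is essentially a 2D histogram: bucket each I/O sample by time on one axis and by latency on the other, and let the per-cell count drive the color. A minimal sketch of that binning step (the sample data and function name are illustrative):

```python
from collections import Counter

def latency_heatmap(samples, time_bin, latency_bin):
    """Bin (timestamp, latency) samples into a sparse 2D histogram:
    x = time bucket, y = latency bucket, value = sample count
    (the 'color' in a latency heat map)."""
    counts = Counter()
    for t, lat in samples:
        counts[(int(t // time_bin), int(lat // latency_bin))] += 1
    return counts

# (timestamp_seconds, latency_ms) samples
samples = [(0.1, 2.0), (0.2, 2.3), (0.4, 9.0), (1.1, 2.1)]
grid = latency_heatmap(samples, time_bin=1.0, latency_bin=1.0)
# grid[(0, 2)] == 2: two ~2ms I/Os in the first second
```

Patterns like the ones Gregg describes (bimodal latency, rising "ramps") only become visible with this kind of binning, because averages collapse the whole latency distribution into a single line.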
Original title and link: Visualizing System Latency ( ©myNoSQL)
Eventual and Strong Consistency, Sloppy and Strict Quorums, and Other Lessons and Thoughts on Distributed Systems
Anything I’d write would just steal from your time to read and think about the email Joseph Blomstedt posted to the Riak list.
Original title and link: Eventual and Strong Consistency, Sloppy and Strict Quorums, and Other Lessons and Thoughts on Distributed Systems ( ©myNoSQL)
Apache ZooKeeper, the high-performance coordination service exposing services such as naming, configuration management, and synchronization for distributed applications, has reached version 3.4.0.
The most important changes are summarized by Patrick Hunt in this Cloudera blog post:
- ZooKeeper 3.3.3 clients are compatible with 3.4.0 servers
- Native Windows version of C client
- Support Kerberos authentication of clients
- Improved REST Interface
- Existing monitoring support has been extended through the introduction of a new ‘mntr’ four-letter word
- Add tools and recipes for monitoring as a contrib
- Web-based Administrative Interface
- Automating log and snapshot cleaning
- Add logging/stats to identify production deployment issues
- Support for building RPM and DEB packages
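The new ‘mntr’ command works like the other ZooKeeper four-letter words: send the four bytes over the client port and read back tab-separated key/value pairs. A hedged sketch of a client (the host, port, and helper names are my assumptions, and the exact set of keys returned depends on the server):

```python
import socket

def parse_mntr(text):
    """Parse the tab-separated key/value lines returned by 'mntr'."""
    stats = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("\t")
        stats[key] = value
    return stats

def zk_mntr(host="localhost", port=2181):
    # Four-letter words are plain text on the client port; the server
    # writes its response and closes the connection.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode())
```

The same request can be issued from a shell with `echo mntr | nc localhost 2181`, which makes it easy to wire into existing monitoring scripts.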
Something to keep in mind, though: ZooKeeper 3.4.0 is not production-ready yet. After extensive testing, it will be followed soon by a minor release that will be production-ready.
Original title and link: Apache ZooKeeper 3.4.0 Released to Be Followed Soon by Production-Ready Version ( ©myNoSQL)
Cadir Lee (CTO Zynga) quoted in a VentureBeat post:
It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud.
Couple of thoughts:
- Zynga is going in the opposite direction from Netflix. While Netflix is focusing (by using Amazon for most of its infrastructure), Zynga is diversifying (building its own data centers).
- Zynga’s applications are great examples of where fully distributed NoSQL databases fit. Availability is key.
- My answer to the question “how many Zyngas are out there?” would be: “enough to ensure some good business for the most reliable and scalable distributed databases.”
- Zynga has contributed to and is an investor in Membase, the company that merged with CouchOne to form Couchbase. But Zynga was using a custom version of Membase.
- Zynga also operates a large MySQL cluster.
- Zynga processes over 15 terabytes of game data every day (according to their SEC filing). That’s Hadoop’s sweet spot.
PS: I’d love to talk to someone from Zynga about their data storage approach. If you have any connections I’d really appreciate an introduction.
Original title and link: Zynga, Data Centers, Polyglot Persistence, and Big Data ( ©myNoSQL)