ha: All content tagged as ha in NoSQL databases and polyglot persistence
A bit old and most probably not statistically significant, but I’d say it looks correct in general.
Do you agree? Any other sources containing statistics about hardware failure rates?
Original title and link: Hardware Components Relative Failure Rates Chart ( ©myNoSQL)
The next version of Neo4j will remove the dependency on ZooKeeper for high availability setups. In a post on the Neo4j blog, the team has announced the availability of the first milestone of Neo4j 1.9, which already contains the new implementation of the Neo4j High Availability Cluster:
With Neo4j 1.9 M01, cluster members communicate directly with each other, based on an implementation of the Paxos consensus protocol for master election.
According to the updated documentation annotated with my own comments:
- Write transactions can be performed on any database instance in a cluster. (nb: writes are performed on the master first, but the cluster does the routing automatically)
- If the master fails, a new master will be elected automatically. The new master is elected and started within just a few seconds, and during this time no writes can take place (the writes will block or, in rare cases, throw an exception)
- If the master goes down, any running write transaction will be rolled back, and new transactions will block or fail until a new master has become available (see the retry sketch after this list)
- The cluster automatically handles instances becoming unavailable (for example due to network issues), and also makes sure to accept them as members in the cluster when they are available again.
- Transactions are atomic, consistent and durable but eventually propagated out to other slaves. (nb: a transaction includes only the write to the master)
- Updates to slaves are eventually consistent by nature but can be configured to be pushed optimistically from the master during commit. (nb: writes to slaves will still not be part of the transaction)
- In case there were changes on the master that didn’t get replicated before it failed, there is a chance of ending up with two diverging versions of the data if the failed master recovers. This situation is resolved by having the old master dismiss its copy of the data. (nb: the documentation says move away)
- Reads are highly available and the ability to handle read load scales with more database instances in the cluster.
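To make the “writes block or fail during re-election” behavior concrete, here is a minimal client-side retry sketch in Python. This is not a Neo4j API: `do_write` stands in for whatever driver call performs the write transaction, and the retry count and delay are arbitrary illustration values.

```python
import time

def write_with_retry(do_write, retries=5, delay=2.0):
    """Retry a write while the cluster elects a new master.

    `do_write` is any callable performing a single write transaction
    against the cluster (hypothetical; substitute your driver call).
    During a master re-election the write may block or raise, so we
    catch the failure, wait, and try again.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return do_write()
        except Exception as error:   # driver-specific exceptions in real code
            last_error = error
            time.sleep(delay)        # give the cluster a few seconds to elect a new master
    raise last_error
```

The same pattern applies whether the driver blocks and then throws or fails fast: the client simply retries once a new master has been elected.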
Original title and link: Next Neo4j Version Implementing HA Without ZooKeeper ( ©myNoSQL)
After posting about Spreecast’s Redis High Availability/Failover solution based on ZooKeeper, where I referred to Redis Sentinel, I realized I hadn’t linked to Salvatore Sanfilippo’s post about the design of Redis Sentinel:
It is a distributed monitoring system for Redis. On top of the monitoring layer it also implements a notification system with a simple to use API, and an automatic failover solution.
Well, this is a pretty cold description of what Redis Sentinel is. Actually it is a system that also tries to make monitoring fun! In short you have this monitoring unit, the Sentinel. The idea is that this monitoring unit is extremely chatty, it speaks the Redis protocol, and you can ask it many things about how it is seeing the Redis instances it is monitoring, what are the attached slaves, what the other Sentinels monitoring the same system and so forth. Sentinel is designed to interact with other programs a lot.
The official Redis Sentinel documentation is also available. Salvatore Sanfilippo is actively working on Redis Sentinel and, while it is not complete yet, there are already users trying it out. Redis Sentinel will be stable in a few weeks and will be released as part of Redis 2.8. If you want to start using it before 2.8 becomes available, use the git unstable branch.
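To give an idea of how “chatty” Sentinel is, here is a small sketch using redis-py against a Sentinel listening on the conventional 26379 port. The master name `mymaster` is just the usual example name from sentinel.conf, so adjust it to your configuration.

```python
import redis

# Connect to the Sentinel process itself; it speaks the Redis protocol.
sentinel = redis.StrictRedis(host="127.0.0.1", port=26379)

# Ask the Sentinel what it knows about the masters it is monitoring.
masters = sentinel.execute_command("SENTINEL", "masters")

# Ask where the master named "mymaster" currently lives.
addr = sentinel.execute_command("SENTINEL", "get-master-addr-by-name", "mymaster")

# And which slaves are attached to it, as seen by this Sentinel.
slaves = sentinel.execute_command("SENTINEL", "slaves", "mymaster")

print(addr, len(slaves), len(masters))
```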
Original title and link: Redis High Availability and Automatic Failover: Redis Sentinel ( ©myNoSQL)
Jared Wray enumerates the following four rules for High Availability:
- No Single Point of failure
- Self-healing is Required
- It will go down, so plan on it
- It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.
I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
This got me thinking: is self-healing actually a requirement? Could I translate this into asking: is it possible to control the MTTF? Control in the sense of planning operations that would keep the MTTF in a range that is not considered to break the SLA.
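For reference, the standard steady-state availability formula ties the two numbers together: availability = MTTF / (MTTF + MTTR). A back-of-the-envelope sketch, with purely illustrative numbers, shows why self-healing, which mostly shrinks MTTR, matters for a tight SLA:

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

sla = 0.999                                      # 99.9% over a 30-day month
print((1 - sla) * 720, "hours of downtime allowed per month")   # ~0.72h, about 43 minutes

# Illustrative numbers only: a system that fails roughly once a month (MTTF ~ 720h).
manual_repair = availability(720, 4)             # a human takes 4 hours -> ~0.9945, breaks the SLA
self_healing = availability(720, 30 / 3600.0)    # automatic recovery in 30s -> ~0.99999, meets it
print(manual_repair >= sla, self_healing >= sla)
```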
Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:
The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.
Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?
Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? ( ©myNoSQL)
- The cost of strong consistency to Amazon is low, if not zero. To you? 2x.
- If you were to run your own distributed database, you wouldn’t incur this cost (although you’d have to factor in hardware and ops costs).
- Offering a “consistent write” option instead would save you money and latency.
- If Amazon provided SLAs so users knew how well eventual consistency worked, users could make more informed decisions about their app requirements and DynamoDB. However, Amazon probably wouldn’t be able to charge so much for strong consistency.
It is not the first time I’ve heard this discussion, but it is the first time I’ve found it laid out in such a detailed form. I have no reason to defend Amazon’s DynamoDB pricing strategy, but:
- Comparing the cost of operating a self-hosted highly available distributed database with that of a managed one seems out of place to me and cannot lead to a real conclusion.
- While consistent writes could be a solution for always having consistent reads, it would require Amazon to reposition DynamoDB from a highly available database to something else. Considering Amazon has always explained its rationale for building highly available systems, I find it difficult to believe this would happen.
Getting back to consistent vs. eventually consistent reads, what one needs to account for is a combination of:
- costs for cross data center access
- costs for maintaining the request capacity SLA
- costs for maintaining the request latency promise
- penalty costs for not meeting the service commitment
I agree, though, that it’s almost impossible to estimate each of these and decide whether they justify the increased price of consistent reads.
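That said, the 2x on the read side itself is simple arithmetic: DynamoDB’s documented throughput model lets one unit of read capacity serve twice as many eventually consistent reads as strongly consistent ones. A back-of-the-envelope sketch, with a placeholder per-unit price (not Amazon’s actual rate) and item-size multipliers ignored:

```python
def read_capacity_units(reads_per_second, consistent=True):
    """Provisioned read capacity needed for a given read rate.

    An eventually consistent read consumes half the read throughput of a
    strongly consistent one; item-size multipliers are ignored to keep
    the arithmetic bare.
    """
    return reads_per_second if consistent else reads_per_second / 2.0

reads_per_second = 1000
price_per_unit_hour = 0.0001   # placeholder price, NOT Amazon's actual rate

for consistent in (True, False):
    units = read_capacity_units(reads_per_second, consistent)
    monthly = units * price_per_unit_hour * 24 * 30
    label = "strongly consistent" if consistent else "eventually consistent"
    print(label, units, "units,", round(monthly, 2), "USD/month")
# The strongly consistent bill comes out at exactly twice the eventually consistent one.
```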
Original title and link: Why DynamoDB Consistent Reads Cost Twice or What’s Wrong With Amazon’s DynamoDB Pricing? ( ©myNoSQL)
As I’m slowly recovering from a severe poisoning that I initially ignored but that finally put me in bed for almost a week, I’m going to post some of the most interesting articles I read while resting.
Hadoop Namenode’s single point of failure has always been mentioned as one of the weaknesses of Hadoop, and also as a differentiator for other Hadoop-based commercial offerings. But now the Namenode HA branch has been merged into trunk and, while it will take a couple of cycles to complete the tests, this will soon become part of the Hadoop distribution.
Significant enhancements were completed to make HOT Failover work:
- Configuration changes for HA
- Notion of active and standby states were added to the Namenode
- Client-side redirection (see the failover sketch after this list)
- Standby processing journal from Active
- Dual block reports to Active and Standby
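To illustrate the client-side redirection item above, here is a toy Python sketch of the idea. It is not how the HDFS client actually implements it (that happens through a failover proxy inside the DFS client, not application code), and the addresses and the `rpc` callable are hypothetical:

```python
# Hypothetical Namenode addresses; 8020 is only used as the customary RPC port.
NAMENODES = ["nn-active.example.com:8020", "nn-standby.example.com:8020"]

def call_namenode(rpc, *args):
    """Try each configured Namenode in turn.

    The standby rejects (or cannot serve) the call, so the first address
    that answers is the current active. `rpc` is a placeholder for the
    real RPC invocation.
    """
    last_error = None
    for address in NAMENODES:
        try:
            return rpc(address, *args)
        except Exception as error:   # standby or unreachable node: try the next one
            last_error = error
    raise last_error
```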
As discussed in a follow-up post to Gartner’s article Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?, the advantage of using custom distributions will slowly vanish and the open source version will be the one you’ll want to have in production.
Original title and link: Hadoop Namenode High Availability Merged to HDFS Trunk ( ©myNoSQL)