NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



ha: All content tagged as ha in NoSQL databases and polyglot persistence

Making Hadoop Safe for Mission Critical Applications: CIOs Afraid of the NameNode

We all know there’s lots of excitement and buzz surrounding Hadoop, but talk to some CIOs in “non-web” industries about moving mission critical apps to the open source Big Data framework and you’re bound to hear a little fear in their voices. They’re worried that Hadoop is not ready for primetime because it has a single point of failure.

Seriously? Can you actually visualize a CIO being afraid of the NameNode? Does he look like this?


Original title and link: Making Hadoop Safe for Mission Critical Applications: CIOs Afraid of the NameNode (NoSQL database©myNoSQL)


Quorum-Based Journaling For Hadoop NameNode High Availability

Given two NameNodes, which both think they are active, each with their own unique epoch numbers, how can we avoid Split Brain Syndrome? The answer is suprisingly simple and elegant: when a NameNode sends any message (or remote procedure call) to a JournalNode, it includes its epoch number as part of the request. Whenever the JournalNode receives such a message, it compares the epoch number against a locally stored value called the promised epoch. If the request is coming from a newer epoch, then it records that new epoch as its promised epoch. If instead the request is coming from an older epoch, then it rejects the request.

This sounds a lot like versioned writes as in MVCC or implementations of optimistic locking. To see the complete details of the implementation check Todd Lipcon’s Qurom-Journal Design paper (PDF).

Original title and link: Quorum-Based Journaling For Hadoop NameNode High Availability (NoSQL database©myNoSQL)


Hardware Components Relative Failure Rates Chart

A bit old and most probably not statistically significant, but I’d say it looks correct in general.


Credit Alex Gorbachev, Pythian

Do you agree? Any other sources containing statistics about hardward failure rates?

Original title and link: Hardware Components Relative Failure Rates Chart (NoSQL database©myNoSQL)

Next Neo4j Version Implementing HA Without ZooKeeper

The next version of Neo4j will remove the dependency on ZooKeeper for high availability setups. In a post on Neo4j blog, the team has announced the availability of the 1st milestone of Neo4j 1.9 which already contains the new implementation of Neo4j High Availability Cluster:

With Neo4j 1.9 M01, cluster members communicate directly with each other, based on an implementation of the Paxos consensus protocol for master election.

According to the updated documentation annotated with my own comments:

  • Write transactions can be performed on any database instance in a cluster. (nb: writes are performed on the master first, but the cluster does the routing automatically)
  • If the master fails a new master will be elected automatically. A new master is elected and started within just a few seconds and during this time no writes can take place (the writes will block or in rare cases throw an exception)
  • If the master goes down any running write transaction will be rolled back and new transactions will block or fail until a new master has become available.
  • The cluster automatically handles instances becoming unavailable (for example due to network issues), and also makes sure to accept them as members in the cluster when they are available again.
  • Transactions are atomic, consistent and durable but eventually propagated out to other slaves. (nb: a transaction includes only the write to the master)
  • Updates to slaves are eventual consistent by nature but can be configured to be pushed optimistically from master during commit. (nb: writes to slave will still not be part of the transaction)
  • In case there were changes on the master that didn’t get replicated before it failed, there are chances to reach a situation where two different versions exists—if the failed master recovers. This situation is resolved by having the old master dismiss its copy of the data (nb the documentation says move away)
  • Reads are highly available and the ability to handle read load scales with more database instances in the cluster.

Original title and link: Next Neo4j Version Implementing HA Without ZooKeeper (NoSQL database©myNoSQL)

Rolling With Eventual Consistency or the Pros and Cons of a Dynamo Style Key-Value Store

Great educational post by Casey Rosenthal on Basho’s blog about the radically different approach of data modelling when using non-relational storage engines or non-queryable data models.

In a previous post I wrote about the different mindset that a software engineer should have when building for a key-value database as opposed to a relational database. When working with a relational database, you describe the model first and then query the data later. With a key-value database, you focus first on what you want the result of the query to look like, and then work backward toward a model.

A different way to look at it is that the advantage of the Dynamo’s style high availability key-value store doesn’t come for free. In the world of distributed systems there’s always a trade-off and you need to carefully choose each component of the architecture to match the requirements, but also be aware of the concenssions or complexity you’ll have to accept in other parts of the system.

Original title and link: Rolling With Eventual Consistency or the Pros and Cons of a Dynamo Style Key-Value Store (NoSQL database©myNoSQL)


The Best Defense Against Major Unexpected Failures Is to Fail Often: Netflix Open Sources Chaos Monkey

Why would anyone use Netflix’s Chaos Monkey?

Failures happen and they inevitably happen when least desired or expected. If your application can’t tolerate an instance failure would you rather find out by being paged at 3am or when you’re in the office and have had your morning coffee? Even if you are confident that your architecture can tolerate an instance failure, are you sure it will still be able to next week? How about next month? Software is complex and dynamic and that “simple fix” you put in place last week could have undesired consequences. Do your traffic load balancers correctly detect and route requests around instances that go offline? Can you reliably rebuild your instances? Perhaps an engineer “quick patched” an instance last week and forgot to commit the changes to your source repository?

GitHub repository is here

Original title and link: The Best Defense Against Major Unexpected Failures Is to Fail Often: Netflix Open Sources Chaos Monkey (NoSQL database©myNoSQL)


Redis High Availability and Automatic Failover: Redis Sentinel

After posting about Spreecast’s Redis High Available/Failover solution based on ZooKeeper where I referred to Redis Sentinel, I realized I haven’t linked to Salvatore Sanfilippo’s post about the design of Redis Sentinel:

It is a distributed monitoring system for Redis. On top of the monitoring layer it also implements a notification system with a simple to use API, and an automatic failover solution.

Well, this is a pretty cold description of what Redis Sentinel is. Actually it is a system that also tries to make monitoring fun! In short you have this monitoring unit, the Sentinel. The idea is that this monitoring unit is extremely chatty, it speaks the Redis protocol, and you can ask it many things about how it is seeing the Redis instances it is monitoring, what are the attached slaves, what the other Sentinels monitoring the same system and so forth. Sentinel is designed to interact with other programs a lot.

The official Redis Sentinel documentation is available too here. Salvatore Sanfilippo is actively working on Redis Sentinel and while it is not complete yet, there are already users trying it out. Redis Sentinel will be stable in a few weeks and will be released as part of the Redis 2.8. In case you’ll want to start using it before 2.8 becomes available, use the git unstable branch

Original title and link: Redis High Availability and Automatic Failover: Redis Sentinel (NoSQL database©myNoSQL)

Redis Failover at Spreecast Based on Apache ZooKeeper

Until Redis Sentil becomes generally available, Spreecast’s Ruby library using Apache ZooKeeper might be the solution for high availability Redis clusters:

We decided to address these concerns by creating the redis_failover gem, which was recently released as open-source for Ruby environments employing Redis. The redis_failover gem aims to be a drop-in replacement for the existing Ruby client for Redis. redis_failover is equipped with the capability to recognize and handle an automatic (or manual) failover gracefully. The client knows to automatically direct write operations to the current master and read operations to one of N slaves. redis_failover is built on top of ZooKeeper, a proven distributed configuration and notification system that handles pushing changes to nodes across the network. We decided to use ZooKeeper since it automatically handles network partitions, quorum management, client discovery, and other difficult distributed computing problems.

redis_failover architecture

Original title and link: Redis Failover at Spreecast Based on Apache ZooKeeper (NoSQL database©myNoSQL)


The Behavior of EC2/EBS Metadata Replicated Datastore

The Amazon post about the service disruption that happened late last month provides an interesting description of the behavior of the Amazon EC2 and EBS metadata datastores:

The EC2 and EBS APIs are implemented on multi-Availability Zone replicated datastores. These datastores are used to store metadata for resources such as instances, volumes, and snapshots. To protect against datastore corruption, currently when the primary copy loses power, the system automatically flips to a read-only mode in the other Availability Zones until power is restored to the affected Availability Zone or until we determine it is safe to promote another copy to primary.

Original title and link: The Behavior of EC2/EBS Metadata Replicated Datastore (NoSQL database©myNoSQL)


Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems?

Jared Wray enumerates the following 4 rules for High Availability :

  • No Single Point of failure
  • Self-healing is Required
  • It will go down so plan on it
  • It is going to cost more: […] The discussion instead should be what downtime is acceptable for the business.

I’m not sure there’s a very specific definition of high availability, but the always correct Wikipedia says:

High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

This got me thinking if self-healing is actually a requirement? Could I translated this into asking: is it possible to control the MTTF? Control in the sense of planning operations that would push MTTF into a range that is not consider to break the SLA.

Jim Gray and Daniel P. Siewiorek wrote in their “High Availability Computer Systems”:

The key concepts and techniques used to build high availability computer systems are (1) modularity, (2) fail-fast modules, (3) independent failure modes, (4) redundancy, and (5) repair. These ideas apply to hardware, to design, and to software. They also apply to tolerating operations faults and environmental faults.

Notice the lack of the “self” part. So is self-healing a requirement of highly available systems?

Original title and link: Four Golden Rules of High Availability. Is Self-Healing a Requirement of Highly Available Systems? (NoSQL database©myNoSQL)

Why DynamoDB Consistent Reads Cost Twice or What’s Wrong With Amazon’s DynamoDB Pricing?

Peter Bailis has posted an interesting article about the cost structure for Amazon DynamoDB reads— consistent reads are double the price of eventually consistent reads:

  1. The cost of strong consistency to Amazon is low, if not zero. To you? 2x.
  2. If you were to run your own distributed database, you wouldn’t incur this cost (although you’d have to factor in hardware and ops costs).
  3. Offering a “consistent write” option instead would save you money and latency.
  4. If Amazon provided SLAs so users knew how well eventual consistency worked, users could make more informed decisions about their app requirements and DynamoDB. However, Amazon probably wouldn’t be able to charge so much for strong consistency.

It is not the first time I’ve heard this discussion, but it is the first time I’ve found it in a detailed form. I have no reasons to defend Amazon’s DynamoDB pricing strategy, but:

  1. Comparing the costs of operating self hosted with managed highly available distributed databases seems to me to be out of place and cannot lead to a real conclusion.
  2. While consistent writes could be a solution for always having consistent reads, it would require Amazon to reposition the DynamoDB offer from a highly available database to something else. Considering Amazon has always explained their rationale for building highly available systems I find this difficult to believe it would happen.
  3. Getting back to the consistent vs eventually consistent reads, what one needs to account for is a combination of:

    • costs for cross data center access
    • costs for maintaining the request capacity SLA
    • costs for maintaining the request latency promise
    • penalty costs for not meeting the service commitment

    I agree thought it’s almost impossible to estimate each of these and decide if they lead or not to the increased consistent read price.

Original title and link: Why DynamoDB Consistent Reads Cost Twice or What’s Wrong With Amazon’s DynamoDB Pricing? (NoSQL database©myNoSQL)

Hadoop Namenode High Availability Merged to HDFS Trunk

As I’m slowly recovering after a severe poisoning that I initially ignored but finally put me to bed for almost a week, I’m going to post some of the most interesting articles I’ve read while resting.

Hadoop Namenode’s single point of failure has always been mentioned as one of the weaknesses of Hadoop and also as a differentiator of other Hadoop-based commercial offerings. But now the Namenode HA branch was merged into trunk and while it will take a couple of cicles to complete the tests, this will become soon part of the Hadoop distribution.

Here’s Jitendra Pandey announcement on Hortonworks’s blog:

Significant enhancements were completed to make HOT Failover work:

  • Configuration changes for HA
  • Notion of active and standby states were added to the Namenode
  • Client-side redirection
  • Standby processing journal from Active
  • Dual block reports to Active and Standby

In a follow up post to Gartner’s article Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions?, the advantage of using custom distributions will slowly vanish and the open source version will be the one you’ll want to have in production.

Original title and link: Hadoop Namenode High Availability Merged to HDFS Trunk (NoSQL database©myNoSQL)