Given two NameNodes, which both think they are active, each with their own unique epoch numbers, how can we avoid Split Brain Syndrome? The answer is suprisingly simple and elegant: when a NameNode sends any message (or remote procedure call) to a JournalNode, it includes its epoch number as part of the request. Whenever the JournalNode receives such a message, it compares the epoch number against a locally stored value called the promised epoch. If the request is coming from a newer epoch, then it records that new epoch as its promised epoch. If instead the request is coming from an older epoch, then it rejects the request.
This sounds a lot like versioned writes as in MVCC or implementations of optimistic locking. To see the complete details of the implementation check Todd Lipcon’s Qurom-Journal Design paper (PDF).
Original title and link: Quorum-Based Journaling For Hadoop NameNode High Availability ( ©myNoSQL)