This is a sequel to Notes on Paxos post.Similarly, the primarily goal here is for me to understand why the BFT consensus algorithm works in detail.This might, or might not be useful for other people!The Paxos article is a prerequisite, best to read that now, and return to this article tomorrow :)
Note also that while Paxos was more or less a direct translation of Lamport’s lecture, this post is a mish-mash oft the original BFT paper by Liskov and Castro, my own thinking, and a cursory glance as this formalization.As such, the probability that there are no mistakes here is quite low.
BFT stands for Byzantine Fault Tolerant consensus.Similarly to Paxos, we imagine a distributed system of computers communicating over a faulty network which can arbitrary reorder, delay, and drop messages.And we want computers to agree on some specific choice of value among the set of possibilities, such that any two computers pick the same value.Unlike Paxos though, we also assume that computers themselves might be faulty or malicious.So, we add a new condition to our list of bad things.Besides reordering, duplication, delaying and dropping, a fake message can be manufactured out of thin air.
Of course, if absolutely arbitrary messages can be forged, then no consensus is possible — each machine lives in its own solipsistic world which might be completely unlike the world of every other machine.So there’s one restriction — messages are cryptographically signed by the senders, and it is assumed that it is impossible for a faulty node to impersonate non-faulty one.
Can we still achieve consensus?As long as for each f faulty, malicious nodes, we have at least 2f + 1 honest ones.
Similarly to the Paxos post, we will capture this intuition into a precise mathematical statement about trajectories of state machines.
Our plan is to start with vanilla Paxos, and then patch it to allow byzantine behavior.Here’s what we’ve arrived at last time:
Our general idea is to add some “evil” acceptors πΌ to the mix and allow them sending arbitrary messages, while at the same time making sure that the subset of “good” acceptors continues to run Paxos.What makes this complex is that we don’t know which acceptor are good and which are bad.So this is our setup
If previously the quorum condition was “any two quorums have an acceptor in common”, it is now “any two quorums have a good acceptor in common”.An alternative way to say that is “a byzantine quorum is a super-set of normal quorum”, which corresponds to the intuition where we are running normal Paxos, and there are just some extra evil guys whom we try to ignore.For Paxos, we allowed f faulty out of 2f + 1 total nodes with f+1 quorums.For Byzantine Paxos, we’ll have f byzantine out 3f + 1 nodes with 2f+1 quorums.As I’ve said, if we forget about byzantine folks, we get exactly f + 1 out of 2f + 1 picture of normal Paxos.
The next step is to determine behavior for byzantine nodes.They can send any message, as long as they are the author:
That is, a byzantine acceptor can send any 1a or 2a message at any time, while for 1b and 2b the author should match.
What breaks?The most obvious thing is Phase2b, that is, voting.In Paxos, as soon as an acceptor receives a 2a message, it votes for it.The correctness of Paxos hinges on the Safe check before we send 2a message, but a Byzantine node can send an arbitrary 2a.
The solution here is natural: rather than blindly trust 2a messages, acceptors would themselves double-check the safety condition, and reject the message if it doesn’t hold:
Implementation wise, this means that, when a coordinator sends a 2a, it also wants to include 1b messages proving the safety of 2a.But in the spec we can just assume that all messages are broadcasted, for simplicity.Ideally, for correct modeling you also want to model how each acceptor learns new messages, to make sure that negative reasoning about a certain message not being sent doesn’t creep in, but we’ll avoid that here.
However, just re-checking safety doesn’t fully solve the problem.It might be the case that several values are safe at a particular ballot (indeed, in the first ballot any value is safe), and it is exactly the job of a coordinator / 2a message to pick one value to break the tie.And in our case a byzantine coordinator can send two 2a for different valid values.
And here we’ll make the single non-trivial modification to the algorithm.Like the Safe condition is at the heart of Paxos, the Confirmed condition is the heart here.
So basically we expect a good coordinator to send just one 2a message, but a bad one can send many.And we want to somehow distinguish the two cases.One way to do that is to broadcast ACKs for 2a among acceptors.If I received a 2a message, checked that the value therein is safe, and also know that everyone else received this same 2a message, I can safely vote for the value.
So we introduce a new message type, 2ac, which confirms a valid 2a message:
Naturally, evil acceptors can confirm whatever:
But, if we get a quorum of confirmations, we can be sure that no other value will be confirmed in a given ballot (each good acceptors confirms at most a single message in a ballot (and we need a bit of state for that as well))
Putting everything so far together, we get
In the above, I’ve also removed phases 1a and 2a, as byzantine acceptors are allowed to send arbitrary messages as well (we’ll need explicit 1a/2a for liveness, but we won’t discuss that here).
The most important conceptual addition is Phase2ac— if an acceptor receives a new 2a message for some ballot with a safe value, it sends out the confirmation provided that it hadn’t done that already.In Phase2b then we can vote for confirmed values: confirmation by a quorum guarantees both that the value is safe at this ballot, and that this is a single value that can be voted for in this ballot (two different values can’d be confirmed in the same ballot, because quorums have an honest acceptor in common).This almost works, but there’s still a problem.Can you spot it?
The problem is in the Safe condition.Recall that the goal of the Safe condition is to pick a value v for ballot b, such that, if any earlier ballot b1 concludes, the value chosen in b1 would necessary be v.The way Safe works for ballot b in normal Paxos is that the coordinator asks a certain quorum to abstain from further voting in ballots earlier than b, collects existing votes, and uses those votes to pick a safe value.Specifically, it looks at the vote for the highest-numbered ballot in the set, and declares a value from it as safe (it is safe: it was safe at that ballot, and for all future ballots there’s a quorum which abstained from voting).
This procedure puts a lot of trust in that highest vote, which makes it vulnerable.An evil acceptor can just say that it voted in some high ballot, and force a choice of arbitrary value.So, we need some independent confirmation that the vote was cast for a safe value.And we can re-use 2ac messages for this:
And … that’s it, really.Now we can sketch a proof that this thing indeed achieves BFT consensus, because it actually models normal Paxos among non-byzantine acceptors.
Phase1a messages of Paxos are modeled by Phase1a messages of BFT Paxos, as they don’t have any preconditions, the same goes for Phase1b.Phase2a message of Paxos is emitted when a value becomes confirmed in BFT Paxos.This is correct modeling, because BFT’s Safe condition models normal Paxos Safe condition (this … is a bit inexact I think, to make this exact, we want to separate “this value is safe” from “we are voting for this value” in original Paxos as well).Finally, Phase2b also displays direct correspondence.
As a final pop-quiz, I claim that the Confirmed(m.vote.bal, v) condition in Safe above can be relaxed.As stated, Confirmed needs a byzantine quorum of confirmations, which guarantees both that the value is safe and that it is the single confirmed value, which is a bit more than we need here.Do you see what would be enough?