From Paxos to BFT
This is a sequel to Notes on Paxos post. Similarly, the primarily goal here is for me to understand why the BFT consensus algorithm works in detail. This might, or might not be useful for other people! The Paxos article is a prerequisite, best to read that now, and return to this article tomorrow :)
Note also that while Paxos was more or less a direct translation of Lamport’s lecture, this post is a mish-mash oft the original BFT paper by Liskov and Castro, my own thinking, and a cursory glance as this formalization. As such, the probability that there are no mistakes here is quite low.
What is BFT?
BFT stands for Byzantine Fault Tolerant consensus. Similarly to Paxos, we imagine a distributed system of computers communicating over a faulty network which can arbitrary reorder, delay, and drop messages. And we want computers to agree on some specific choice of value among the set of possibilities, such that any two computers pick the same value. Unlike Paxos though, we also assume that computers themselves might be faulty or malicious. So, we add a new condition to our list of bad things. Besides reordering, duplication, delaying and dropping, a fake message can be manufactured out of thin air.
Of course, if absolutely arbitrary messages can be forged, then no consensus is possible — each machine lives in its own solipsistic world which might be completely unlike the world of every other machine. So there’s one restriction — messages are cryptographically signed by the senders, and it is assumed that it is impossible for a faulty node to impersonate non-faulty one.
Can we still achieve consensus?
As long as for each
f faulty, malicious nodes, we have at least
2f + 1 honest ones.
Similarly to the Paxos post, we will capture this intuition into a precise mathematical statement about trajectories of state machines.
Our plan is to start with vanilla Paxos, and then patch it to allow byzantine behavior. Here’s what we’ve arrived at last time:
Our general idea is to add some “evil” acceptors 𝔼 to the mix and allow them sending arbitrary messages, while at the same time making sure that the subset of “good” acceptors continues to run Paxos. What makes this complex is that we don’t know which acceptor are good and which are bad. So this is our setup
If previously the quorum condition was “any two quorums have an acceptor in common”, it is now “any two quorums have a good acceptor in common”.
An alternative way to say that is “a byzantine quorum is a super-set of normal quorum”, which corresponds to the intuition where we are running normal Paxos, and there are just some extra evil guys whom we try to ignore.
For Paxos, we allowed
f faulty out of
2f + 1 total nodes with
For Byzantine Paxos, we’ll have
f byzantine out
3f + 1 nodes with
As I’ve said, if we forget about byzantine folks, we get exactly
f + 1 out of
2f + 1 picture of normal Paxos.
The next step is to determine behavior for byzantine nodes. They can send any message, as long as they are the author:
That is, a byzantine acceptor can send any
2a message at any time, while for
2b the author should match.
The most obvious thing is
Phase2b, that is, voting.
In Paxos, as soon as an acceptor receives a
2a message, it votes for it.
The correctness of Paxos hinges on the
Safe check before we send
2a message, but a Byzantine node can send an arbitrary
The solution here is natural: rather than blindly trust
2a messages, acceptors would themselves double-check the safety condition, and reject the message if it doesn’t hold:
Implementation wise, this means that, when a coordinator sends a
2a, it also wants to include
1b messages proving the safety of
But in the spec we can just assume that all messages are broadcasted, for simplicity.
Ideally, for correct modeling you also want to model how each acceptor learns new messages, to make sure that negative reasoning about a certain message not being sent doesn’t creep in, but we’ll avoid that here.
However, just re-checking safety doesn’t fully solve the problem.
It might be the case that several values are safe at a particular ballot (indeed, in the first ballot any value is safe), and it is exactly the job of a coordinator /
2a message to pick one value to break the tie.
And in our case a byzantine coordinator can send two
2a for different valid values.
And here we’ll make the single non-trivial modification to the algorithm.
Safe condition is at the heart of Paxos, the
Confirmed condition is the heart here.
So basically we expect a good coordinator to send just one
2a message, but a bad one can send many.
And we want to somehow distinguish the two cases.
One way to do that is to broadcast ACKs for
2a among acceptors.
If I received a
2a message, checked that the value therein is safe, and also know that everyone else received this same
2a message, I can safely vote for the value.
So we introduce a new message type,
2ac, which confirms a valid
Naturally, evil acceptors can confirm whatever:
But, if we get a quorum of confirmations, we can be sure that no other value will be confirmed in a given ballot (each good acceptors confirms at most a single message in a ballot (and we need a bit of state for that as well))
Putting everything so far together, we get
In the above, I’ve also removed phases
2a, as byzantine acceptors are allowed to send arbitrary messages as well (we’ll need explicit
2a for liveness, but we won’t discuss that here).
The most important conceptual addition is
Phase2ac — if an acceptor receives a new
2a message for some ballot with a safe value, it sends out the confirmation provided that it hadn’t done that already.
Phase2b then we can vote for confirmed values: confirmation by a quorum guarantees both that the value is safe at this ballot, and that this is a single value that can be voted for in this ballot (two different values can’d be confirmed in the same ballot, because quorums have an honest acceptor in common).
This almost works, but there’s still a problem.
Can you spot it?
The problem is in the
Recall that the goal of the
Safe condition is to pick a value
v for ballot
b, such that, if any earlier ballot
b1 concludes, the value chosen in
b1 would necessary be
Safe works for ballot
b in normal Paxos is that the coordinator asks a certain quorum to abstain from further voting in ballots earlier than
b, collects existing votes, and uses those votes to pick a safe value.
Specifically, it looks at the vote for the highest-numbered ballot in the set, and declares a value from it as safe (it is safe: it was safe at that ballot, and for all future ballots there’s a quorum which abstained from voting).
This procedure puts a lot of trust in that highest vote, which makes it vulnerable.
An evil acceptor can just say that it voted in some high ballot, and force a choice of arbitrary value.
So, we need some independent confirmation that the vote was cast for a safe value.
And we can re-use
2ac messages for this:
And … that’s it, really. Now we can sketch a proof that this thing indeed achieves BFT consensus, because it actually models normal Paxos among non-byzantine acceptors.
Phase1a messages of Paxos are modeled by Phase1a messages of BFT Paxos, as they don’t have any preconditions, the same goes for Phase1b. Phase2a message of Paxos is emitted when a value becomes confirmed in BFT Paxos. This is correct modeling, because BFT’s Safe condition models normal Paxos Safe condition (this … is a bit inexact I think, to make this exact, we want to separate “this value is safe” from “we are voting for this value” in original Paxos as well). Finally, Phase2b also displays direct correspondence.
As a final pop-quiz, I claim that the
Confirmed(m.vote.bal, v) condition in
Safe above can be relaxed.
Confirmed needs a byzantine quorum of confirmations, which guarantees both that the value is safe and that it is the single confirmed value, which is a bit more than we need here.
Do you see what would be enough?
The final specification contains this relaxation:
TLA+ specs for this post are available here: https://github.com/matklad/paxosnotes.