Reaching Agreement in the Presence of Faults

Journal of the Association for Computing Machinery 27 | , Vol 2

2005 Edsger W. Dijkstra Prize in Distributed Computing

Before this paper, it was generally assumed that a three-processor system could tolerate one faulty processor. This paper shows that “Byzantine” faults, in which a faulty processor sends inconsistent information to the other processors, can defeat any traditional three-processor algorithm. (The term Byzantine didn’t appear until [46].) In general, 3n+1 processors are needed to tolerate n faults. However, if digital signatures are used, 2n+1 processors are enough. This paper introduced the problem of handling Byzantine faults. I think it also contains the first precise statement of the consensus problem.

I am often unfairly credited with inventing the Byzantine agreement problem. The problem was formulated by people working on SIFT (see [30]) before I arrived at SRI. I had already discovered the problem of Byzantine faults and written [29]. (I don’t know if that was earlier than or concurrent with its discovery at SRI.) However, the people at SRI had a much simpler and more elegant statement of the problem than was present in [29].

The 4-processor solution presented in this paper and the general impossibility result were obtained by Shostak; Pease invented the 3n+1-processor solution. My contribution to the work in this paper was the solution using digital signatures, which is based on the algorithm in [29]. It was my work on digital signatures (see [36]) that led me to think in that direction. However, digital signatures, as used here, are a metaphor. Since the signatures need be secure only against random failure, not against an intelligent adversary, they are much easier to implement than true digital signatures. However, this point seems to have escaped most people, so they rule out the algorithms that use digital signatures because true digital signature algorithms are expensive. Thus, 3n+1-processor solutions are used even though there are satisfactory 2n+1-processor solutions,

My other contribution to this paper was getting it written. Writing is hard work, and without the threat of perishing, researchers outside academia generally do less publishing than their colleagues at universities. I wrote an initial draft, which displeased Shostak so much that he completely rewrote it to produce the final version.

Over the years, I often wondered whether the people who actually build airplanes know about the problem of Byzantine failures. In 1997, I received email from John Morgan who used to work at Boeing. He told me that he came across our work in 1986 and that, as a result, the people who build the passenger planes at Boeing are aware of the problem and design their systems accordingly. But, in the late 80s and early 90s, the people at Boeing working on military aircraft and on the space station, and the people at McDonnell-Douglas, did not understand the problem. I have no idea what Airbus knows or when they knew it.

This paper was awarded the 2005 Edsger W. Dijkstra Prize in Distributed Computing.