G.M. Kuper, Ken McAloon, et al.
Journal of Automated Reasoning
In this paper we combine two previously disparate aspects of reliable distributed computing: self-stabilization, i.e., tolerance of systemic failures, and fault-tolerance, i.e., tolerance of process failures. We define what it means for a protocol to solve a problem while tolerating both types of failures and demonstrate a "compiler" that transforms a process failure-tolerant protocol for a synchronous system into a process and systemic failure-tolerant protocol. For asynchronous systems, we present a protocol that solves a crucial problem (Consensus) while tolerating both process and systemic failures.
Nayeem Islam, Andreas L. Prodromidis, et al.
ICDCS 1997
Kenneth J. Perry, Sam Toueg
IEEE Transactions on Software Engineering
Marc Snir
PODC 1993