Ruixiong Tian, Zhe Xiang, et al.
Qinghua Daxue Xuebao/Journal of Tsinghua University
A total failure occurs whenever all processes cooperatively executing a distributed task fail before the task completes. A frequent prerequisite for recovery from a total failure is identification of the last set (LAST) of processes to fail. Necessary and sufficient conditions are derived here for computing LAST from the local failure data of recovered processes. These conditions are then translated into procedures for deciding LAST membership, using either complete or incomplete failure data. The choice of failure data is itself dictated by two requirements: (1) it can be cheaply maintained, and (2) it must afford maximum fault-tolerance in the sense that the expected number of recoveries required for identifying LAST is minimized. © 1985, ACM. All rights reserved.
Ruixiong Tian, Zhe Xiang, et al.
Qinghua Daxue Xuebao/Journal of Tsinghua University
Renu Tewari, Richard P. King, et al.
IS&T/SPIE Electronic Imaging 1996
Zohar Feldman, Avishai Mandelbaum
WSC 2010
Kaoutar El Maghraoui, Gokul Kandiraju, et al.
WOSP/SIPEW 2010