Joshua S. Auerbach, Ajei S. Gopal, et al.
ICDCS 1994
Optimistic failure recovery mechanisms have been proposed as a way to provide transparent fault tolerance to distributed applications and systems. The authors identify problems that may arise when these mechanisms are applied to very large networks that include many processors and span large geographical areas and many administrative domains. They present a technique, recovery unit gateways, that can be used with existing failure recovery algorithms to address many of these problems. The technique can be applied to existing transparent recovery systems with minimal disruption, and it can also be used to build large optimistic recovery systems while minimizing dependency-tracking overhead.
Michael S. Meier, Kevan Miller, et al.
SIGMETRICS Parallel and Distributed Tools 1996
Flaviu Cristian, Farnam Jahanian
SRDS 1991