ICS 269, Winter 1999: Theory Seminar
12 February 1999:
A Survey of Rollback-Recovery Protocols in Message-Passing Systems
by: E.N. Elnozahy, D.B. Johnson, Y.M. Wang
Thuan Do, ICS, UC Irvine
Abstract: The problem of rollback-recovery in message-passing
systems has undergone extensive study. In this survey, we review
rollback-recovery techniques that do not require special language
constructs, and classify them into two primary categories.
Checkpoint-based rollback-recovery relies solely on checkpointed
states for system state restoration. Depending on when checkpoints
are taken, existing approaches can be divided into uncoordinated
checkpointing, coordinated checkpointing, and communication-induced
checkpointing. Log-based rollback-recovery uses checkpointing and
message logging. There are three different log-based approaches,
namely, pessimistic logging, optimistic logging and causal logging.
We identify a set of desirable properties of rollback-recovery
protocols, and compare different approaches with respect to these
properties. Log-based rollback-recovery protocols generally rely on
the assumption of piecewise determinism and pay additional overhead
to allow faster output commits and more localized recovery. We
present research issues under each approach, and review existing
solutions to address them. We also present implementation issues of
checkpointing and message logging.
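
As a rough illustration of the log-based idea described in the abstract, the following is a minimal single-process sketch, not taken from the survey. It assumes piecewise determinism in its simplest form: the process state changes only when a message is delivered, so replaying the logged messages on top of the last checkpoint reproduces the pre-failure state. The class name Process, its methods, and the in-memory lists standing in for stable storage are all hypothetical.

```python
import copy


class Process:
    """Toy process whose state changes only through delivered messages
    (a simplified stand-in for the piecewise-determinism assumption)."""

    def __init__(self):
        self.state = {"total": 0, "delivered": 0}
        # Assumption: these two fields model stable storage that survives a crash.
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # messages delivered since the last checkpoint

    def deliver(self, msg):
        # Pessimistic flavour: log the message before it can affect the state.
        self.log.append(msg)
        self._apply(msg)

    def _apply(self, msg):
        # Deterministic state transition driven only by the message.
        self.state["total"] += msg
        self.state["delivered"] += 1

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        # Messages already reflected in the checkpoint need not be replayed.
        self.log.clear()

    def crash(self):
        self.state = None  # volatile state is lost

    def recover(self):
        # Restore the checkpoint, then replay the log in delivery order.
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.log:
            self._apply(msg)


def demo():
    p = Process()
    p.deliver(5)
    p.take_checkpoint()
    p.deliver(7)
    p.deliver(-2)
    before = copy.deepcopy(p.state)
    p.crash()
    p.recover()
    return before, p.state
```

Calling demo() returns the state just before the crash and the state after recovery; under the assumptions above the two are equal. A purely checkpoint-based scheme would instead restart from the saved checkpoint and lose the effect of the two later messages, which is the trade-off the abstract alludes to when it mentions extra overhead in exchange for more localized recovery.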