Workshop on Recent Advances on Intrusion-Tolerant Systems

WRAITS 2007

March 23, 2007

 

Keynote speech

 

Automatic Recovery from Failures and Attacks Using
Bounded Partially Observable Markov Decision Processes

William H. Sanders

Donald Biggar Willett Professor of Engineering 
Director, Information Trust Institute 
University of Illinois at Urbana-Champaign

Abstract:

Providing high availability with acceptable cost is becoming increasingly critical in a wide variety of application domains. Automatic system monitoring, followed by diagnosis and recovery, has the potential to provide the required low-cost solution, high-availability solution that is needed in those domains. However, automating recovery is difficult in practical settings because of low coverage, poor localization abilities, false positives, and/or false negatives exhibited by many commonly used monitoring techniques. Furthermore, multiple recovery actions are often applicable even when the fault or attack can be correctly diagnosed, and the correct choice is not always obvious.  We present a holistic model-based approach that overcomes these challenges and enables automatic recovery in distributed systems. To do so, it uses theoretically sound techniques to provide controllers that choose good, if not optimal, recovery actions according to a user defined optimization criteria. The automatic recovery problem is framed as an undiscounted accumulated reward maximization problem on a structurally restricted class of partially observable Markov decision processes (POMDPs) that we call R-POMDPs. The resulting recovery controller can be proven to always terminate in a finite amount of time, to probabilistically not terminate before recovery is finished, and to generate recovery actions that are on the average no more expensive than promised. Simulation-based experimental results on a realistic e-commerce system demonstrate that the proposed bounds can be rapidly improved iteratively, and the resulting controller convincingly outperforms a controller that uses heuristics instead of bounds.