Static Typing for a Faulty Lambda Calculus [abstract] (ACM DL, PDF)
David Walker, Lester Mackey, Jay Ligatti, George A. Reis, and David I. August
Proceedings of the 11th ACM SIGPLAN International Conference on Functional Programming (ICFP), September 2006.
A transient hardware fault occurs when an energetic particle strikes a
transistor, causing it to change state. These faults do not cause
permanent damage, but may result in incorrect program execution by
altering signal transfers or stored values. While the likelihood that
such transient faults will cause any significant damage may seem
remote, over the last several years transient faults have caused
costly failures in high-end machines at America Online, eBay, and the
Los Alamos Neutron Science Center, among others. Because
susceptibility to transient faults is proportional to the size and
density of transistors, the problem of transient faults will become
increasingly important in the coming decades. This paper defines the first formal, type-theoretic framework for
studying reliable computation in the presence of transient faults.
More specifically, it defines lzap, a lambda calculus that exhibits
intermittent data faults. In order to detect and recover from these
faults, lzap programs replicate intermediate computations and use
majority voting, thereby modeling software-based fault tolerance
techniques studied extensively, but informally. To ensure that programs maintain the proper invariants and use lzap
primitives correctly, the paper defines a type system for the
language. This type system guarantees that well-typed programs can
tolerate any single data fault. To demonstrate that lzap can serve as
an idealized typed intermediate language, we define a type-preserving
translation from a standard simply-typed lambda calculus into lzap.