Non-Uniform Fault Tolerance
Jonathan Chang, George A. Reis, and David I. August
Proceedings of the 2nd Workshop on Architectural Reliability (WAR), December 2006.
As devices become more susceptible to transient faults that can affect
program correctness, processor designers will increasingly compensate
by adding hardware or software redundancy. Proposed redundancy
techniques and those currently in use are generally applied uniformly
to a structure despite non-uniformity in the way errors within the
structure manifest themselves in programs. This uniform protection
leads to inefficiency in terms of performance, power, and area. Using
case studies involving the register file, this paper motivates an
alternative Non-Uniform Fault Tolerance approach that improves
reliability over uniform approaches by spending the redundancy budget
on the areas most susceptible to errors.