Software Fault Detection Using Dynamic Instrumentation [abstract] (CiteSeerX, PDF)
George A. Reis, David I. August, Robert Cohn, and Shubhendu S. Mukherjee
Proceedings of the Fourth Annual Boston Area Architecture Workshop (BARC), February 2006.

Software-only approaches to increase hardware reliability have been proposed and evaluated as alternatives to hardware modification. These techniques have shown that they can significantly improve reliability with reasonable performance overhead. Software-only techniques do not require any hardware support and thus are far cheaper and easier to deploy. These techniques can be used for systems that have already been manufactured and now require higher reliability than the hardware can offer.

All previous proposals have been static compilation techniques that rely on source code transformations or alterations to the compilation process. Our proposal is the first application of software fault detection for transient errors that increases reliability dynamically. The application of our technique is trivial since the only requirement is the program binary, which makes it applicable for legacy programs that no longer have readily available or easily re-compilable source code. Our dynamic reliability technique can seamlessly handle variable-length instructions, mixed code and data, statically unknown indirect jump targets, dynamically generated code, and dynamically loaded libraries. Our technique is also able attach to an already running application to increase its reliability, and detach when appropriate, thus returning to faster (although unreliable) execution.