The Compiler Foundations Project[Prakash Prabhu is responsible for this page]
Compiler Research to address emerging issues by eliminating conventional limitations
New challenges in computer architecture demand new roles for compilation systems. We have demonstrated that these systems can effectively address problems of reliability, security, and scalability, problems they were not originally designed to handle. Doing so required overcoming limitations fundamental to their design by leveraging our research in the understanding and manipulation of program dataflow [MICRO 2000], control flow [ISCA 1998, ISCA 1999, IEEE 2001, IEEE 1997, IJPP 1999, Book 2005, ESA 2004, ISSS 2001], and memory behavior [CGO 2005, CGO 2004, MSP 2005, PLDI 2007]. Modern compilers still use a decades old conventional compiler organization, even though it limits the effectiveness of most compiler optimizations. The research described in this section provides compiler optimizations with greater freedom and increased effectiveness in the presence of other, potentially interfering, optimizations. This research not only make compilers more effective, but also provides compiler researchers with a means to more thoroughly understand the strength of their work.
Smart Memory Analysis, Profiling, and Intelligent Speculation
One of the most important factors that hinder advanced compiler optimizations is spurious and infrequently occuring memory dependences. Traditionally, pointer analysis has been applied to remove memory dependences that inhibit optimizations. However, conventional pointer analysis works once on a high level intermediate representation (IR), and its results are conservatively maintained by all subsequent code transformations. This imposes a phase-ordering constraint, causing the precision of alias information to be diluted by code transformations. To overcome these limitations, we proposed Low-Level Pointer Analysis (VLLPA) [CGO 2005] that operates at the assembly level, yet is context sensitive and partially flow sensitive, and handles all features of all features of C, such as unions. A comparison with a state-of-the-art high-level-IR based pointer analysis  shows that VLLPA is less precise on only 0.3% of memory operations and more precise on 26.8% of memory operations. Pointer analysis, while disambiguating pointers to scalar variables, cannot distinguish between elements of linked data structures. In order to do this, we designed and implemented shape analysis [PLDI 2007], a more advanced memory analysis technique that determines the "shape" properties of these data structures, e.g. whether the data structure is a tree or a list, whether it contains cycle, etc. Our shape analysis encodes the shape properties using recursive predicates written in separation logic, and leverages a machine learning technique to automatically synthesize these predicates.
Both pointer and shape analysis are static analysis solutions, which are necessarily conservative to produce sound results across all inputs to a program. However, emerging applications written in languages like C/C++ frequently contain pointer-based code marked by irregular accesses to memory and irregular control flow, whose runtime behavior varies widely based on input. In such cases, static analysis fails and we use memory profiling to provide useful information on the dynamic memory behavior of programs that static memory analysis can not provide. We have developed efficient profiling techniques that have minimal runtime overheads [CC 2006, MSP 2005, CGO 2004]. In particular these techniques rely on heuristics strategically select key memory dependences to profile, and, in the common case where aliasing does not occur, enable aggressive optimizations. In addition to profiling based on loads and stores, we have developed profiling techniques that identifies the recursive data structures created when a program is executed and provides useful information including the shape and size of the data structure, and the traversal pattern [MSP 2005, CGO 2004]. This information can be used to perform optimizations that manipulate entire data structures. In our work, we leverage this profiling information to speculatively remove memory dependences [Book 2005]. Speculation is performed intelligently, by removing only those memory dependences that inhibit particular optimizations of interest to a client.
Procedure Boundary Elimination
A particularly limiting feature of the conventional compilation framework is its procedure-based nature. This limits code transformations to within a single procedure's boundaries, which may not present an ideal scope for optimization. Although aggressive inlining and certain interprocedural optimization routines can alleviate this problem, these can be applied only sparingly, the former because it causes excessive code growth, and the latter because their compile-time requirements scale poorly with program size. A solution to this problem, is a new compiler approach called Procedure Boundary Elimination(PBE) [PLDI 2006].
PBE expands the scope of optimization by unifying an entire application's code into an optimizable program-wide representation, called the whole-program control-flow graph(WCFG). By employing an expanded version of Hank's region formation, the compiler can then divide the WCFG into more manageable compilation units. These new compilation units are chosen according to the needs of optimization, regardless of the original procedure boundaries. Since code duplication, such as that caused by inlining, can often increase optimization opportunities, PBE also employs a targeted code specialization phase, which causes code growth only where it is likely to produce actual benefits. Of course, the central challenge that this scheme faces is that subsequent optimization and analysis routines must be able to operate on arbitrary portions of the WCFG. This problem was solved by extending and generalizing interprocedural analysis algorithms and by properly encapsulating regions. Results confirm that PBE delivers, with reasonable compile times, the performance improvement of aggressive inlining and interprocedural optimization (10-50%), without its code growth (30-200%). Since PBE changes the way transformations interact with the code, we expect it to teach us quite a bit about existing compiler techniques and to inspire new ones.
With performance, reliability, security, power, and other demands placed on them, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may lead to performance pitfalls or interfere with subsequent transformations, compilers must employ predictive heuristics for optimization guidance. The unpredictability of optimization interaction and the irregularity of modern machines severely limits the accuracy of these heuristics. As a result, during heuristic and optimization tuning, compiler writers temper high variance optimizations, limit their use, or may even exclude them entirely. While this results in compilers with good average code quality, opportunities are missed in individual code segments. We developed an alternative to the traditionally tuned compilation process, called optimization-space exploration (OSE). An OSE compiler considers many promising transformation alternatives for each hot code segment and selects the best one according to a performance model. By evaluating transformation sequences a posteriori, OSE reduces the dependence on predictive heuristics. Unlike previous iterative compilation attempts, which suffer from unrealistic compile times (days or years) and focus on limited portions of the optimization process, OSE is the first iterative compilation technique fast enough and general enough for widespread use in general-purpose compilation. To achieve this performance, OSE narrows down the space of optimization options explored through judicious use of heuristics. With help from Intel, OSE was implemented and proven in Intel's Electron production compiler. OSE achieves significant performance improvement on Intel's Itanium, more than 20% in some cases over best prior results, with only a 50\% increase compile time [JILP 2005, CGO 2003].
Project Ph.D. Graduates
Selected Project Publications
All Project Publications
Optimizations for the Memory Hierarchy
Shape Analysis with Inductive Recursion Synthesis [abstract] (ACM DL, PDF)
Automatic Instruction Scheduler Retargeting by Reverse-Engineering [abstract] (ACM DL, PDF)
A Framework for Unrestricted Whole-Program Optimization [abstract] (ACM DL, PDF)
Selective Runtime Memory Disambiguation in a Dynamic Binary
Translator [abstract] (PDF)
Recursive Data Structure Profiling [abstract] (ACM DL, PDF)
Practical and Accurate Low-Level Pointer Analysis [abstract] (IEEE Xplore, PDF)
Compiler Optimization-Space Exploration [abstract] (PDF)
Finding Dominators in Practice
Exposing Memory Access Regularities Using Object-Relative Memory Profiling [abstract] (ACM DL, PDF)
Compiler Optimization-Space Exploration [abstract] (IEEE Xplore, PDF)
Procedure Boundary Elimination for EPIC Compilers [abstract] (CiteSeerX, PDF)
Program Decision Logic Optimization using Predication and
Control Speculation [abstract] (IEEE Xplore, PDF, PostScript)
Accurate and Efficient Predicate Analysis with Binary Decision
Diagrams [abstract] (ACM DL, PDF, PostScript)
Systematic Compilation for Predicated Execution
The Partial Reverse If-Conversion Framework for Balancing
Control Flow and Predication [abstract]
The Program Decision Logic Approach to Predicated Execution [abstract] (ACM DL, PDF, PostScript)
Integrated Predicated and Speculative Execution in the IMPACT
EPIC Archtecture [abstract] (ACM DL, PDF, PostScript)
The IMPACT EPIC 1.0 Architecture and Instruction Set Reference Manual
A Framework for Balancing Control Flow and Predication [abstract] (IEEE Xplore, PDF, PostScript)
Architectural Support Compiler-Synthesized Dynamic Branch
Prediction Strategies: Rationale and Initial Results [abstract] (IEEE Xplore, PDF, PostScript)
Compiler Technology for Future Microprocessors [abstract] (IEEE Xplore, PDF, PostScript)
A Comparison of Full and Partial Predicated Execution Support
for ILP Processors [abstract] (ACM DL, PDF, PostScript)
Sentinel Scheduling with Recovery Blocks