Procedure Boundary Elimination

Procedure Boundary Elimination (PBE) is a framework for unrestricted whole-program optimization. The rest of this page gives a very brief overview of the PBE project and its current status. A complete description of PBE, including a detailed discussion of related work, will be available soon as a technical report (currently in progress). An extremely preliminary version of PBE has also been published in the EPIC'03 workshop.

Why do we need whole-program optimization?

Current compilation frameworks are procedure-based, in the sense that they apply most or all optimization routines on one procedure at a time. However, a single procedure may not offer the ideal scope for optimization. This hampers the optimizing compiler's effectiveness.

What about existing solutions?

Aggressive inlining is by far the most common way to extend optimization scope. Although inlining has become an invaluable tool in today's compilers, it can be applied only sparingly, because it causes excessive code growth. This limits its applicability.

Additionally, interprocedural or whole-program analysis and optimization routines can vastly increase the information available to the compiler, as well as the optimization opportunities that the compiler can exploit. However, even when such routines are incorporated in the compiler, the biggest part of the optimization process remains procedure-based. This is because the compile-time cost of interprocedural routines grows very fast with program size, thus making the generalization of most optimizations to the interprocedural level unrealistic. In turn, having to leave the code in a procedure-based form for later phases hinders the transformations that interprocedural routines can perform.

Unrestricted Whole-Program Optimization

PBE allows all optimization routines in a compiler to operate across procedure boundaries. At the same time, PBE avoids the twin problems of excessive code growth and excessive compile time. Therefore, PBE is the first general and practical solution to the optimization scope problem.

The WIR

The PBE framework's first task is to obtain an intermediate representation (IR) of the whole program that is both analyzable and optimizable. Much like interprocedural analysis does, this is achieved by merging the separate control-flow graphs (CFGs) of individual procedures into a single whole-program IR (WIR), by inserting appropriately annotated arcs from call and return sites to the corresponding procedure entries and exits. Additionally, PBE takes several extra steps in order to make this representation freely optimizable. Among other things, procedure stack frames are merged, the stack-based behavior of locals in recursive procedures becomes explicit, etc.

Transforming the WIR

Any transformation on the WIR can operate without regard to procedure boundaries. Optimizations that involve duplicating or rearranging code (e.g. loop unrolling, superblock formation, code layout) can duplicate or excise any randomly shaped subpart of a (former) procedure. Optimizations that involve extending liveranges (e.g. redundancy elimination, loop-invariant code motion) can cause a variable to be live across call or return arcs, without worrying about parameter-passing or return value mechanisms. The only obligation that an optimization routine has is to correctly maintain the annotations on the call and return arcs. Even this constraint can be removed, at the cost of making analysis less precise.

Originally, the WIR will look just like a bunch of procedures joined together. However, after a few optimization passes, the WIR's subparts will lose their procedural form. One can still follow the call and return arcs in order to divide the WIR into parts roughly corresponding to the original procedures. (For reasons explained in the technical report, I call these parts same-level subgraphs). However, these subgraphs may have multiple entries and exits, with each exit pointing to different "return sites". Thus the usual assumptions about the behavior of calls and returns are violated. Also, a random subset of the program's variables may be live across any such entry or exit. Such data exchange cannot be expressed through any meaningful calling convention. By giving more freedom to a program's form, PBE exposes optimization opportunities that were formerly impossible to realize.

Analyzing the WIR

Although interprocedural analysis has been extensively studied in the past, existing interprocedural analysis algorithms cannot be directly applied to the WIR. Such algorithms have to be generalized in order to handle the WIR's unrestricted form. Additionally, since PBE is intended for real-world compilers, PBE analysis also has to go to extra lengths in order to be efficient enough. Unfortunately, PBE analysis algorithms are way too complicated for this brief overview. Interested readers are referred to the technical report.

Partitioning the WIR

Of course, optimization and analysis routines cannot operate on the whole program without incurring prohibitive compile-time costs. For this reason, the PBE compiler partitions the WIR into more manageable parts through region formation. Region-based compilation has originally been proposed in order to avoid the fast-increasing compile-time costs of aggressive inlining, but it is far more relevant in the PBE domain. If done right, region formation can limit compile time while causing only negligible performance losses.

Generally, a region is a compiler-defined compilation unit, comprising a set of basic blocks that are deemed to be strongly correlated. How this correlation is measured is up to the region formation algorithm used. The most obvious solution would be to form regions according to profile-guided heuristics. We have already implemented this, and we are currently looking into more sophisticated structural and dataflow heuristics for region formation. After the WIR is broken up into regions, each region is properly encapsulated so that it can be analyzed and optimized separately. To make this encapsulation work in PBE, we had to significantly extend the existing region encapsulation methods, both to make it handle the WIR's call and return constraints, and to make it more effective (since PBE applies region-based compilation in a scale not tried before).

Targeted Code Specialization

The benefits of aggressive inlining do not result solely from extending the scope of optimization. By giving a separate copy of a callee's code to each call site, inlining allows that code to be specialized according to each callee's needs. PBE recovers these specialization benefits while limiting code growth to where it is likely to be beneficial, using a targeted code specialization phase.

Many existing optimization routines perform code specialization as a side effect. There are several such techniques originally intended for scheduling (superblock formation, hyperblock formation, treegions). More sophisticated techniques include complete PRE and "on-demand" predication-based specialization. All these techniques can be given new freedom within the PBE framework. Furthermore, these methods can now be applied much more aggressively, since inlining, usually the biggest user of a compiler's code-growth budget, is no longer happening. We are also investigating PBE-specific specialization methods.