Static Dependence Analysis in an Infrastructure for Automatic Parallelization [abstract] (PDF)
Nick P. Johnson
Ph.D. Thesis, Department of Computer Science, Princeton University, September 2015.

Now that parallel architectures are common, software must exploit multiple cores to fully utilize hardware resources and achieve efficient execution. Restructuring applications for explicit parallelism requires developers to reason about low-level details to avoid concurrency bugs and achieve parallel performance. Automatic thread extraction relieves developers of this arduous task. This dissertation presents a compiler {\em middle-ware} for automatic thread extraction---analyses and transformations that allow the compiler to deliver parallel performance for sequentially-specified input programs. This middle-ware reconsiders the compilation infrastructure to present alternative technologies better suited to the needs of automatic thread extraction. Specifically, {\bf Collaborative Dependence Analysis:} Since no analysis algorithm can precisely decide all analysis facts, this dissertation combines simple analysis algorithms into a {\em collaborative ensemble} algorithm: the ensemble algorithm can disprove dependence queries which no member disproves alone. Collaboration enables {\em factored} development, which prescribes the development of small, orthogonal analysis algorithms tailored to the needs of analysis clients. Results demonstrate independently-developed analysis algorithms collaborating to solve complex, multiple-logic queries. Further, results demonstrate a large impact on performance: for some benchmarks, analysis strength is the difference between 11$\times$ slowdown and 28$\times$ speedup on 50 cores. {\bf Scaling Analysis to Larger Scopes:} The infrastructure builds around Parallel-Stage Decoupled Software Pipelining (PS-DSWP) thread extraction. PS-DSWP targets large, hot program scopes to overcome the non-recurring overheads of parallel execution. However, as scopes grow, the burden of analysis becomes prohibitive. This dissertation contributes a faster algorithm to compute a dependence graph to drive PS-DSWP. This algorithm identifies dependence edges which {\em cannot} affect PS-DSWP. It skips dependence analysis queries pertaining to unimportant edges, reducing analysis time---or allowing more expensive analysis algorithms---for the remaining, important queries. Evaluation demonstrates that the algorithm computes the ${\rm DAG}_{\rm SCC}$ twice as fast using half as many dependence analysis queries without sacrificing analysis precision. {\bf Incorporating Speculation into the Compilation Pipeline:} A parallelization system may speculate various properties to enable thread extraction. This dissertation presents design patterns to simplify development of novel speculation types. The dissertation integrates these contributions into a robust and flexible middle-ware upon which many works are built.