Transpilation Utilizing Language-Agnostic IR and Interactivity for Parallelization
Zujun Tan
Ph.D. Thesis, Department of Computer Science,
Princeton University, 2024.
Migrating code between architectures is difficult because different execution models require different types of parallelism for optimal performance. Previous approaches, such as libraries or source-level tools, generate correct and natural-looking syntax for the new parallel model but apply limited optimization and largely leave performance engineering to the programmer. Recent approaches, such as transpilation at the compiler intermediate representation (IR) level, can automate performance engineering, but their profitability can be limited because they lack facts known only to the programmer. Decompiling the optimized program could leverage the strength of existing compilers to provide programmers with a natural, compiler-parallelized starting point for further parallelization or refinement. Despite this potential, existing decompilers fail to do this because they do not generate portable parallel source code compatible with arbitrary compilers of the source language.
This thesis provides a method for migrating code such that the compiler and programmer work together to generate code with optimal performance. To achieve this, it introduces Tulip, which migrates code through IR-level transpilation. Transpilation at the IR level enables Tulip to generalize the transformations applied to retarget parallelism. Furthermore, Tulip integrates a state-of-the-art automatic parallelization framework to explore additional parallelism expressible only in the target parallel programming model. It then generates natural source code through a novel decompiler, SPLENDID, in a high-level parallel programming language (OpenMP), which can be interactively optimized and tuned with programmer intervention. For 19 Polybench benchmarks, Tulip-generated OpenMP offloading programs perform 14% faster than the original CUDA sources on NVIDIA GPUs. Moreover, transpilation to the CPU leads to a 2.9x speedup over the best state-of-the-art transpiler. Tulip-generated portable parallel code is also more natural than what existing decompilers produce, resulting in a 39x higher average BLEU score.
This thesis includes contributions from Yebin Chon, Ziyang Xu, Sophia Zhang, and David I. August from Princeton Liberty Research Group, Brian Homerding, Yian Su, and Simone Campanoni from Northwestern Arcana Lab, Michael Kruse (AMD), Johannes Doerfert (LLNL), William S. Moses (UIUC), and Ivan R. Ivanov (Tokyo Tech).