Automatically Exploiting Cross-Invocation Parallelism Using Runtime Information
Jialu Huang, Thomas B. Jablin, Stephen R. Beard, Nick P. Johnson, and David I. August
Proceedings of the 2013 International Symposium on Code Generation
and Optimization (CGO), February 2013.
Automatic parallelization is a promising approach to producing scalable
multi-threaded programs for multicore architectures. Many existing automatic
techniques only parallelize iterations within a loop invocation and synchronize
threads at the end of each loop invocation. When parallel code contains many
loop invocations, synchronization can easily become a performance bottleneck.
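The baseline pattern described above, where iterations are parallelized only within an invocation and threads synchronize at its end, can be sketched in Python. The helper `run_with_barriers` and the round-robin partition are illustrative assumptions, not taken from the paper. A `threading.Barrier` holds every thread at the end of each invocation, so each thread waits once per invocation whether or not any iteration actually depends on another thread.

```python
import threading


def run_with_barriers(invocations, num_threads, body):
    """Hypothetical baseline: parallelize the iterations inside each loop
    invocation, then hold every thread at a barrier before the next
    invocation may begin. `invocations` lists the iteration count of each
    invocation of the inner loop."""
    barrier = threading.Barrier(num_threads)
    waits = [0] * num_threads  # barrier crossings per thread

    def worker(tid):
        for inv, count in enumerate(invocations):
            # Round-robin partition of this invocation's iterations.
            for it in range(tid, count, num_threads):
                body(inv, it)
            barrier.wait()  # every thread stalls here, dependence or not
            waits[tid] += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return waits


if __name__ == "__main__":
    seen = []
    lock = threading.Lock()

    def body(inv, it):
        with lock:  # the log is shared, so guard it
            seen.append((inv, it))

    print(run_with_barriers([3, 5, 2], num_threads=2, body=body))  # [3, 3]
```

With three invocations, each of the two threads crosses three barriers. The cost grows with the number of invocations, which is the bottleneck the abstract describes.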
Some automatic techniques address this problem by exploiting cross-invocation
parallelism. These techniques use static analysis to partition iterations among
threads to avoid cross-thread dependences. However, this partitioning is not
always achievable at compile time, because program input determines dependence
patterns at runtime. By contrast, this paper proposes DOMORE, the first
automatic parallelization technique that uses runtime information to exploit
additional cross-invocation parallelism. Instead of partitioning iterations
statically, DOMORE dynamically detects cross-thread dependences and synchronizes
only when necessary. DOMORE consists of a compiler and a runtime library. At
compile time, DOMORE automatically parallelizes loops and inserts a custom
runtime engine into programs. At runtime, the engine observes dependences and
synchronizes iterations only when necessary. For six programs, DOMORE achieves
a geomean loop speedup of 2.1X over parallel execution without cross-invocation
parallelization and of 3.2X over sequential execution on eight cores.
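The idea of a runtime engine that observes dependences and synchronizes only when necessary can be illustrated with a minimal Python sketch. It is not DOMORE's implementation. The round-robin thread assignment, the encoding of each iteration as a set of addresses, and the helpers `schedule`, `run`, and `Task` are hypothetical. A shadow map records which thread and iteration last touched each address. A new iteration gets a synchronization condition only when that last access came from a different thread, and no barrier separates invocations.

```python
import threading
from collections import namedtuple

# One scheduled iteration: global iteration number, its invocation and
# position, the thread that runs it, and the (thread, gid) pairs it waits on.
Task = namedtuple("Task", ["gid", "inv", "it", "tid", "waits"])


def schedule(invocations, num_threads):
    """Hypothetical scheduler: assign iterations round-robin across all
    invocations and emit a sync condition only for cross-thread dependences.
    Each invocation is a list of address sets, one set per iteration.
    Every access is conservatively treated as a conflict (no read/write split)."""
    shadow = {}  # shadow memory: address -> (tid, gid) of last access
    tasks = []
    gid = 0  # numbering continues across invocations; no barrier resets it
    for inv, iterations in enumerate(invocations):
        for it, addrs in enumerate(iterations):
            tid = gid % num_threads
            waits = set()
            for addr in addrs:
                last = shadow.get(addr)
                # Same-thread dependences are already ordered by program order.
                if last is not None and last[0] != tid:
                    waits.add(last)
                shadow[addr] = (tid, gid)
            tasks.append(Task(gid, inv, it, tid, sorted(waits)))
            gid += 1
    return tasks


def run(tasks, num_threads, body):
    """Run each thread's tasks in gid order, blocking only on its sync conditions."""
    done = [-1] * num_threads  # highest gid each thread has completed
    cond = threading.Condition()

    def worker(tid):
        for task in tasks:
            if task.tid != tid:
                continue
            with cond:
                # Threads finish their gids in increasing order, so
                # done[t] >= g means thread t has completed iteration g.
                cond.wait_for(lambda: all(done[t] >= g for t, g in task.waits))
            body(task.inv, task.it)
            with cond:
                done[tid] = task.gid
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


if __name__ == "__main__":
    # Hypothetical input: the array slot each iteration touches.
    idx = [[0, 1, 2, 3], [3, 1, 0, 2]]
    tasks = schedule([[{a} for a in row] for row in idx], num_threads=2)
    print([t.waits for t in tasks])  # [[], [], [], [], [(1, 3)], [], [], [(0, 2)]]

    A = [0] * 4

    def body(inv, it):
        a = idx[inv][it]
        A[a] = A[a] * 10 + inv + 1  # order-sensitive: wrong order gives 21

    run(tasks, 2, body)
    print(A)  # [12, 12, 12, 12]
```

With this sample input, only two of the eight iterations carry a synchronization condition. The other six proceed as soon as their own thread reaches them, so thread 1 may start the second invocation while thread 0 is still working on the first. Because the index arrays come from input, the same loop could need entirely different conditions on another run. This is why the sketch computes them at runtime rather than at compile time.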