Decoupled Software Pipelining Creates Parallelization Opportunities
Jialu Huang, Arun Raman, Yun Zhang, Thomas B. Jablin, Tzu-Han Hung, and David I. August
Proceedings of the 2010 International Symposium on Code Generation and Optimization (CGO), April 2010.
Decoupled Software Pipelining (DSWP) is one approach to
automatically extracting threads from loops. It partitions loops into long-running
threads that communicate in a pipelined manner via inter-core queues. This work
recognizes that DSWP can also be an enabling transformation for other loop
parallelization techniques. This use of DSWP, called DSWP+, splits a loop into
new loops with dependence patterns amenable to parallelization using techniques
that were originally either inapplicable or poorly performing. By parallelizing
each stage of the DSWP+ pipeline using (potentially) different techniques, not
only is the benefit of DSWP increased, but the applicability and performance of
other parallelization techniques are enhanced. This paper evaluates DSWP+ as an
enabling framework for other transformations by applying it in conjunction with
DOALL, LOCALWRITE, and SpecDOALL to individual stages of the pipeline. This paper
demonstrates significant performance gains on a commodity 8-core multicore machine
running a variety of codes transformed with DSWP+.
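
The pipeline structure described in the abstract can be illustrated with a minimal hand-written sketch. This is not the paper's compiler transformation or its runtime. It only mimics the shape of a DSWP+ pipeline under stated assumptions: a sequential first stage that carries the loop's cross-iteration dependence, a middle stage whose iterations are independent and can therefore be replicated in DOALL fashion, and a sequential final stage that collects results. The names `pipeline`, `work`, `n_workers`, and `SENTINEL` are hypothetical, and Python's `queue.Queue` stands in for the inter-core queues.

```python
import queue
import threading

SENTINEL = object()  # illustrative end-of-stream marker, not from the paper


def pipeline(items, work, n_workers=4):
    """Sequential stage -> replicated (DOALL-like) stage -> sequential stage."""
    q12 = queue.Queue()  # stage 1 -> stage 2
    q23 = queue.Queue()  # stage 2 -> stage 3
    results = {}

    def stage1():
        # Sequential traversal: the part of the loop that cannot be parallelized.
        for i, item in enumerate(items):
            q12.put((i, item))
        # One end-of-stream marker per stage-2 worker.
        for _ in range(n_workers):
            q12.put(SENTINEL)

    def stage2():
        # Iterations here are independent, so several workers can run this stage.
        while True:
            msg = q12.get()
            if msg is SENTINEL:
                q23.put(SENTINEL)
                return
            i, item = msg
            q23.put((i, work(item)))

    def stage3():
        # Sequential collection; finishes once every worker has signalled completion.
        finished = 0
        while finished < n_workers:
            msg = q23.get()
            if msg is SENTINEL:
                finished += 1
                continue
            i, value = msg
            results[i] = value

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage3)]
    threads += [threading.Thread(target=stage2) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Restore the original iteration order.
    return [results[i] for i in range(len(results))]
```

A call such as `pipeline(range(5), lambda x: x * x)` returns the squared values in their original order. In standard CPython the global interpreter lock usually prevents CPU-bound threads from running in parallel, so the sketch shows only the structure of the stages and queues, not the speedups reported in the paper.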