Decoupled Software Pipelining Creates Parallelization Opportunities
Jialu Huang, Arun Raman, Yun Zhang, Thomas B. Jablin, Tzu-Han Hung, and David I. August
Proceedings of the 2010 International Symposium on Code Generation and Optimization (CGO), April 2010.
Accept Rate: 41% (29/70).
Decoupled Software Pipelining (DSWP) is one approach to
automatically extracting threads from loops. It partitions loops into long-running
threads that communicate in a pipelined manner via inter-core queues. This work
recognizes that DSWP can also be an enabling transformation for other loop
parallelization techniques. This use of DSWP, called DSWP+, splits a loop into
new loops with dependence patterns amenable to parallelization using techniques
that were originally either inapplicable or poorly performing. By parallelizing
each stage of the DSWP+ pipeline using (potentially) different techniques, not
only is the benefit of DSWP increased, but the applicability and performance of
other parallelization techniques are enhanced. This paper evaluates DSWP+ as an
enabling framework for other transformations by applying it in conjunction with
DOALL, LOCALWRITE, and SpecDOALL to individual stages of the pipeline. This paper
demonstrates significant performance gains on a commodity 8-core multicore machine
running a variety of codes transformed with DSWP+.
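
The pipeline structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function `dswp_plus`, the queue names, and the sentinel convention are all hypothetical, and Python threads stand in for the cores and inter-core queues. A loop is split into three stages: a sequential producer (standing in for a loop-carried traversal), a middle stage with no cross-iteration dependences that is replicated DOALL-style across several workers, and a sequential consumer that restores iteration order.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def dswp_plus(items, work, num_workers=4):
    """Run `work` over `items` as a 3-stage pipeline with a parallel middle stage."""
    q12 = queue.Queue()  # stage 1 -> stage 2
    q23 = queue.Queue()  # stage 2 -> stage 3
    results = {}

    def stage1():
        # Sequential stage: stands in for a traversal with a loop-carried
        # dependence (for example, pointer chasing through a linked list).
        for i, item in enumerate(items):
            q12.put((i, item))
        # One end-of-stream marker per worker so every worker terminates.
        for _ in range(num_workers):
            q12.put(SENTINEL)

    def stage2():
        # Parallel stage: iterations are independent, so several workers
        # can pull from the same queue (DOALL-style replication).
        while True:
            msg = q12.get()
            if msg is SENTINEL:
                q23.put(SENTINEL)
                return
            i, item = msg
            q23.put((i, work(item)))

    def stage3():
        # Sequential stage: collects results, keyed by iteration index
        # because workers may finish out of order.
        finished = 0
        while finished < num_workers:
            msg = q23.get()
            if msg is SENTINEL:
                finished += 1
                continue
            i, value = msg
            results[i] = value

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage3)]
    threads += [threading.Thread(target=stage2) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(results))]


if __name__ == "__main__":
    print(dswp_plus(range(8), lambda x: x * x))
```

Because of CPython's global interpreter lock, this sketch shows only the shape of the transformation, not the speedups reported in the paper. The point is that once the sequential parts are isolated in their own stages, the remaining stage has a dependence pattern that a simpler technique can parallelize.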