Automatic and Speculative Parallel-Stage Decoupled Software Pipelining [abstract]
Zujun Tan, Greg Chan, Ziyang Xu, Sotiris Apostolakis, and David I. August
The Second Young Architect Workshop (YArch), March 2020.
The end of Moore's law and Dennard scaling has increased the need to use
transistors more efficiently. Multicore systems remain underutilized because
too little multicore-appropriate thread-level parallelism (MATLP) is
extracted from programs. Over the
last decade and especially in the last year, compiler advancements have
dramatically increased MATLP extraction for sequential codes. In particular,
new lower-validation-cost speculation methods make it possible to bypass the
limits of static memory analysis. Applied to DOALL, these methods have been
shown to double performance relative to the best prior speculative techniques.
Unfortunately, since these methods apply only to DOALL, they fail completely
when any hard-to-predict cross-iteration dependence exists. Parallel-Stage
Decoupled Software Pipelining (PS-DSWP) is a generalization of DOALL that can
handle unpredictable cross-iteration dependences. Thus, the goal of this work
is to combine the
performance-enhancing power of these new speculation methods with the
applicability of PS-DSWP. This work will include porting these efficient
speculative enabling transformations to PS-DSWP and creating a planning phase
that finds an application of these techniques that avoids unnecessary
speculation and maximizes parallelism.
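
To make the contrast between DOALL and PS-DSWP concrete, the following is a minimal hand-written sketch, not the authors' compiler output or implementation. It assumes a linked-list traversal in which the pointer chase (node = node.next) is a cross-iteration dependence that rules out DOALL. The loop is split into a sequential stage that walks the list, a parallel stage that runs the independent per-node work, and a sequential stage that reduces the results. The names Node, work, make_list, sequential_sum, and ps_dswp_sum are all hypothetical, and no speculation is shown.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream (illustrative)


class Node:
    """Hypothetical singly linked list node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def make_list(values):
    """Build a linked list holding the given values in order."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def work(value):
    """Stand-in for an expensive computation with no cross-iteration dependence."""
    return value * value


def sequential_sum(head):
    """Original loop: the pointer chase carries a cross-iteration dependence."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


def ps_dswp_sum(head, num_workers=4):
    """Same loop, partitioned into sequential -> parallel -> sequential stages."""
    work_q = queue.Queue()
    result_q = queue.Queue()

    def traverse_stage():
        # Sequential stage: keeps the pointer-chasing dependence on one thread.
        node = head
        while node is not None:
            work_q.put(node.value)
            node = node.next
        for _ in range(num_workers):
            work_q.put(_DONE)  # one sentinel per worker

    def work_stage():
        # Parallel stage: replicated, since iterations here are independent.
        while True:
            item = work_q.get()
            if item is _DONE:
                result_q.put(_DONE)
                return
            result_q.put(work(item))

    threads = [threading.Thread(target=traverse_stage)]
    threads += [threading.Thread(target=work_stage) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Sequential stage: the reduction runs on the calling thread.
    total = 0
    finished = 0
    while finished < num_workers:
        item = result_q.get()
        if item is _DONE:
            finished += 1
        else:
            total += item

    for t in threads:
        t.join()
    return total
```

Calling ps_dswp_sum(make_list(range(10))) returns the same value as sequential_sum on the same list. In CPython the global interpreter lock limits real speedup for CPU-bound work, so the sketch illustrates only the stage structure, not the performance.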