Automatic and Speculative Parallel-Stage Decoupled Software Pipelining [abstract]
Zujun Tan, Greg Chan, Ziyang Xu, Sotiris Apostolakis, and David I. August
The Second Young Architect Workshop (YArch), March 2020.

The end of Moore's law and Dennard scaling has raised the need for work that improves the utilization of transistors. Multicore systems and their transistors are underutilized because too little multicore-appropriate thread-level parallelism (MATLP) is extracted from programs. Over the last decade and especially in the last year, compiler advancements have dramatically increased MATLP extraction for sequential codes. In particular, new lower-validation-cost speculation methods make it possible to bypass the limits of static memory analyses. Applied to DOALL, these methods have been shown to double performance relative to the best prior speculative techniques. Unfortunately, since these methods apply only to DOALL, they fail completely when any hard-to-predict cross-iteration dependence exists. PS-DSWP (Parallel-Stage Decoupled Software Pipelining) is a generalization of DOALL that can handle unpredictable cross-iteration dependences. Thus, the goal of this work is to combine the performance-enhancing power of these new speculation methods with the applicability of PS-DSWP. This work will include porting these efficient speculative enabling transformations to PS-DSWP and creating a planning phase to find an application of these techniques that avoids unnecessary speculation and maximizes parallelism.
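
To make the difference between DOALL and PS-DSWP concrete, the following is a minimal hand-written sketch, not the compiler transformation described in this work. It assumes a linked-list traversal as the loop: the pointer chase (node = node.next) is a cross-iteration dependence that rules out DOALL, so it is kept in one sequential stage, while the independent per-element work is replicated across a parallel stage. All names (Node, make_list, work, ps_dswp_sketch) are hypothetical, no speculation is modeled, and Python threads only illustrate the pipeline structure rather than deliver real speedup.

```python
import queue
import threading


class Node:
    """Singly linked list node; following `next` is the cross-iteration dependence."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def make_list(values):
    """Build a linked list from a Python list (helper for the example)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def work(x):
    """Stand-in for the independent, per-iteration loop body."""
    return x * x


def ps_dswp_sketch(head, num_workers=4):
    """Run the loop as a two-stage pipeline: one sequential stage, one parallel stage."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def sequential_stage():
        # Stage 1: carries the loop-carried dependence (the pointer chase),
        # so it runs on a single thread and forwards work items downstream.
        node, i = head, 0
        while node is not None:
            q.put((i, node.value))
            node = node.next
            i += 1
        # One end-of-stream marker per worker so every worker terminates.
        for _ in range(num_workers):
            q.put(None)

    def parallel_stage():
        # Stage 2: iterations are independent of each other,
        # so this stage is replicated across several threads.
        while True:
            item = q.get()
            if item is None:
                return
            i, value = item
            r = work(value)
            with lock:
                results[i] = r

    threads = [threading.Thread(target=sequential_stage)]
    threads += [threading.Thread(target=parallel_stage) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reassemble results in original iteration order.
    return [results[i] for i in range(len(results))]


if __name__ == "__main__":
    print(ps_dswp_sketch(make_list([1, 2, 3, 4, 5])))
```

In this sketch a DOALL loop corresponds to the special case in which the sequential stage is trivial, which is one way to read the statement that PS-DSWP generalizes DOALL.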