Decoupled Software Pipelining: A Promising Technique to Exploit Thread-Level Parallelism
Guilherme Ottoni, Ram Rangan, Neil Vachharajani, and David I. August
Proceedings of the Fourth Workshop on Explicitly Parallel Instruction Computer Architectures and Compiler Technology (EPIC), March 2005.

Processor manufacturers are moving to multi-core, multi-threaded designs because of factors such as cost, ease of design, and scalability. As most future processors will be multi-threaded, exposing thread-level parallelism (TLP) is a problem of increasing importance. Because the appropriate thread granularity depends on the target architecture, and because writing sequential applications is usually more natural, the compiler plays an important role in mapping applications to appropriate multi-threaded code. Despite this, few general-purpose compilation techniques have been proposed to assist in this task. In this paper, we propose Decoupled Software Pipelining (DSWP) to extract thread-level parallelism. DSWP can convert most application loops into a pipeline of loop threads, bringing pipeline parallelism to most application loops, including those not targeted by traditional software pipelining. Because DSWP is a non-speculative transformation, it does not rely on complex hardware speculation support. This paper describes the DSWP technique, discusses its implementation in a compiler, and presents experimental results demonstrating that it is a promising technique to extract TLP.
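
As a rough illustration of the "pipeline of loop threads" idea, the sketch below hand-splits a linked-list loop into two stages connected by a queue. It is not the paper's compiler algorithm: the partition (traversal in one thread, per-node work in the other), the names `Node`, `work`, `sequential_sum`, and `pipelined_sum`, and the queue size are all assumptions made for illustration. Data flows in one direction only, from the traversal stage to the compute stage, so no speculation or rollback is involved. In CPython the global interpreter lock generally prevents a speedup for CPU-bound work like this, so the example shows the structure of the transformation rather than its performance.

```python
import queue
import threading


class Node:
    """Singly linked list node (hypothetical example data structure)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a linked list holding `values` in order; returns the head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def work(x):
    """Stand-in for the per-iteration computation of the loop body."""
    return x * x + 1


def sequential_sum(head):
    """Original single-threaded loop: traversal and work interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


_DONE = object()  # sentinel marking the end of the stream


def pipelined_sum(head, queue_size=32):
    """Same loop split into two threads that communicate through a queue."""
    q = queue.Queue(maxsize=queue_size)
    result = []

    def traverse_stage():
        # Stage 1: carries the pointer-chasing recurrence (node = node.next).
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next
        q.put(_DONE)

    def compute_stage():
        # Stage 2: consumes values and carries the accumulation recurrence.
        total = 0
        while True:
            item = q.get()
            if item is _DONE:
                break
            total += work(item)
        result.append(total)

    threads = [
        threading.Thread(target=traverse_stage),
        threading.Thread(target=compute_stage),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Calling `pipelined_sum(build_list(range(100)))` should return the same value as `sequential_sum` on the same list. The bounded queue lets the traversal stage run ahead of the compute stage by a limited number of iterations, which is one way to picture the decoupling between stages.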