Spice: Speculative Parallel Iteration Chunk Execution
Easwaran Raman, Neil Vachharajani, Ram Rangan, and David I. August
Proceedings of the 2008 International Symposium on Code Generation and Optimization (CGO), April 2008.

The recent trend in the processor industry of packing multiple processor cores on a chip has increased the importance of automatic techniques for extracting thread-level parallelism. A promising approach for extracting thread-level parallelism in general-purpose applications is thread-level speculation (TLS), which uses memory alias or value speculation to break dependences among threads and executes them concurrently. In this work, we present a novel software-only value prediction mechanism for TLS and an associated TLS technique called speculative parallel iteration chunk execution (Spice). Our value prediction technique predicts the loop live-ins of only a few iterations of a given loop, enabling speculative threads to start from those iterations. It also increases the probability of successful speculation by predicting only that the values will be used as live-ins in some future iteration of the loop. These twin properties enable our value prediction scheme to achieve high prediction accuracy while exposing significant coarse-grained thread-level parallelism. Spice has been implemented as an automatic transformation in a research compiler. The technique results in up to 157% speedup (101% on average) with 4 threads.
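
The idea in the abstract can be illustrated with a small sketch. It is not the paper's compiler transformation: it is a hand-written Python model of a linked-list traversal whose only loop live-in is the current node pointer, and every name in it (Node, build_list, sequential_sum, run_chunk, speculative_sum) is hypothetical. One invocation of the loop memoizes a few live-in values; the next invocation starts one chunk from each memoized value, and a chunk's successor is kept only if the chunk actually reaches the successor's predicted live-in at some iteration. The chunks are run one after another here so the result is deterministic; in a real TLS system each chunk would be a separate thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison: a node stands in for a pointer value
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values: List[int]) -> Optional[Node]:
    """Build a singly linked list holding `values` in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def sequential_sum(head: Optional[Node], num_chunks: int) -> Tuple[int, List[Node]]:
    """Ordinary traversal that also memoizes a few live-ins (node pointers)."""
    nodes = []
    total = 0
    node = head
    while node is not None:
        nodes.append(node)
        total += node.value
        node = node.next
    # Assumption of this sketch: sample live-ins at evenly spaced iterations.
    step = max(1, len(nodes) // num_chunks)
    predictions = [nodes[i] for i in range(step, len(nodes), step)][: num_chunks - 1]
    return total, predictions


def run_chunk(start: Optional[Node], stop_at: Optional[Node]) -> Tuple[int, bool]:
    """Run from `start` until `stop_at` is reached or the list ends.

    Returns the partial sum and whether the successor's predicted
    live-in was actually observed (the prediction is confirmed).
    """
    total = 0
    node = start
    while node is not None and node is not stop_at:
        total += node.value
        node = node.next
    confirmed = stop_at is not None and node is stop_at
    return total, confirmed


def speculative_sum(head: Optional[Node], predictions: List[Node]) -> int:
    """Combine chunk results, discarding chunks after a failed prediction."""
    starts = [head] + predictions
    results = []
    for i, start in enumerate(starts):
        stop_at = starts[i + 1] if i + 1 < len(starts) else None
        results.append(run_chunk(start, stop_at))  # conceptually one thread each
    total = 0
    for partial, confirmed in results:
        total += partial
        if not confirmed:
            # This chunk ran to the end of the list, so it already covers
            # the remaining iterations; later chunks are squashed.
            break
    return total
```

Because a chunk only checks that the predicted pointer shows up at some later iteration, inserting nodes elsewhere in the list between invocations does not invalidate the predictions. Unlinking a predicted node does: the preceding chunk then simply runs to the end of the list and the later chunks' work is thrown away, so the result stays correct at the cost of lost parallelism.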