The Filament Project aims to automatically extract thread-level parallelism from single-threaded programs. To do so, Filament uses a novel parallelization technique, Decoupled Software Pipelining (DSWP), which achieves non-speculative parallel execution by extracting pipelined parallelism. DSWP splits a loop body into multiple threads, each of which feeds data values to the threads after it. It deliberately avoids partitions that would require communication from later threads back to earlier ones, which keeps inter-thread communication off the critical path. Queues placed between the threads decouple their execution, which helps prevent a thread stalled on a cache miss from stalling all the others. The base DSWP technique has been extended with speculation to extract more parallelism. Additionally, under the right circumstances, a single DSWP phase can be used to extract DOALL-style parallelism.
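As a rough, hand-written illustration (Filament extracts such pipelines automatically; this is not its implementation), consider a loop that walks a linked list and applies some work to each node. The pointer-chasing recurrence goes in one thread and the per-node work in another, connected by a bounded queue so that data only flows forward. `Node`, `sequential`, `dswp`, and `build_list` are hypothetical names chosen for this sketch. Python threads are used only to show the structure: CPython's global interpreter lock keeps them from executing Python code in parallel.

```python
import queue
import threading


class Node:
    """Hypothetical linked-list node used only for this illustration."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


_DONE = object()  # sentinel marking the end of the stream


def sequential(head, work):
    """Original single-threaded loop: traverse the list, apply work."""
    results = []
    node = head
    while node is not None:
        results.append(work(node.value))
        node = node.next_node
    return results


def dswp(head, work, capacity=32):
    """Two-stage pipeline sketch of the loop above.

    Stage 1 owns the loop-carried recurrence (node = node.next_node).
    Stage 2 performs the per-node work. Values flow only from stage 1
    to stage 2 through the queue; nothing is ever sent back.
    """
    q = queue.Queue(maxsize=capacity)  # bounded queue decouples the stages
    results = []

    def traverse():  # stage 1: pointer chasing
        node = head
        while node is not None:
            q.put(node.value)  # forward communication only
            node = node.next_node
        q.put(_DONE)

    def compute():  # stage 2: consumes values in FIFO order
        while True:
            item = q.get()
            if item is _DONE:
                break
            results.append(work(item))

    threads = [threading.Thread(target=traverse),
               threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def build_list(values):
    """Build a linked list holding `values` in order."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


if __name__ == "__main__":
    head = build_list(range(10))
    square = lambda x: x * x
    assert dswp(head, square) == sequential(head, square)
```

A single consumer reading a FIFO queue preserves the original iteration order, so the pipeline returns the same results as the sequential loop. Because stage 2 carries no dependence from one iteration to the next, it is the kind of stage that could, in principle, be spread across several worker threads for DOALL-style parallelism.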

Using DSWP, we have successfully parallelized many SPEC benchmarks. The graphs below show the speedups obtained with both the train and reference input sets. The red dots mark the 1.4x speedup traditionally achieved per doubling of transistors, assuming the additional transistors are used to double the number of cores on a die.

[Graphs: DSWP speedups for 176.gcc, 197.parser, and 256.bzip2]
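The red-dot baseline compounds with each doubling. A minimal sketch, assuming the graphs measure speedup against core count with a single core as the reference point (the function name `baseline_speedup` is hypothetical):

```python
import math


def baseline_speedup(cores, per_doubling=1.4):
    """Speedup relative to one core if every doubling of the core count
    yields a fixed `per_doubling` factor (1.4x in the text above)."""
    doublings = math.log2(cores)
    return per_doubling ** doublings


for cores in (1, 2, 4, 8, 16, 32):
    print(f"{cores:>2} cores: {baseline_speedup(cores):.2f}x")
```

Under this assumption, eight cores (three doublings) correspond to 1.4^3, roughly 2.7x, well below the 8x a perfectly parallel program would reach.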