Auto-Vectorization of Interleaved Data for SIMD
Dorit Nuzman, Ira Rosen, and Ayal Zaks
Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2006.
Most implementations of the Single Instruction Multiple Data
(SIMD) model available today require that data elements be packed
in vector registers. Operations on disjoint vector elements are not
supported directly and require explicit data reorganization.
Computations on non-contiguous and especially interleaved
data appear in important applications, which can greatly
benefit from SIMD instructions once the data is reorganized properly.
Vectorizing such computations efficiently is therefore an ambitious
challenge for both programmers and vectorizing compilers.
We demonstrate an automatic compilation scheme that supports
effective vectorization in the presence of interleaved data with constant
strides that are powers of 2, facilitating data reorganization.
We show how our vectorization scheme applies to dominant
SIMD architectures, and present experimental results on a wide
range of key kernels, showing speedups in execution time of up to 3.7x
for interleaving levels (strides) as high as 8.
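
To make the problem concrete, the following is a minimal sketch in C of a stride-2 (interleaved) access pattern and of the kind of data reorganization the abstract refers to. It is an illustration under stated assumptions, not the scheme from the paper: the function names, the kernel (squared magnitude of interleaved real/imaginary pairs), and the vector length VF = 4 are all hypothetical, and the "vectors" are emulated with small arrays rather than real SIMD registers.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vector length in elements (hypothetical) */

/* Scalar reference: in[] holds interleaved (re, im) pairs, so each
   field is read with stride 2. */
void norm_sq_scalar(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float re = in[2 * i];
        float im = in[2 * i + 1];
        out[i] = re * re + im * im;
    }
}

/* Same computation, restructured the way a vectorizer might do it:
   read 2 * VF contiguous elements, split them into even and odd
   elements, then do element-wise arithmetic on the reorganized data. */
void norm_sq_deinterleaved(const float *in, float *out, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        float re[VF], im[VF];              /* emulated vector registers */
        const float *chunk = in + 2 * i;   /* 2 * VF contiguous floats */
        for (int l = 0; l < VF; l++) {
            re[l] = chunk[2 * l];          /* extract even elements */
            im[l] = chunk[2 * l + 1];      /* extract odd elements */
        }
        for (int l = 0; l < VF; l++)
            out[i + l] = re[l] * re[l] + im[l] * im[l];
    }
    for (; i < n; i++) {                   /* scalar epilogue for leftovers */
        float re = in[2 * i], im = in[2 * i + 1];
        out[i] = re * re + im * im;
    }
}
```

On real hardware the two inner loops would each correspond to a handful of vector instructions (contiguous loads, even/odd extraction, multiply and add). The same idea extends to strides of 4 and 8 by applying the even/odd split repeatedly, which is one reason power-of-2 strides are convenient to reorganize.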