Liberty Queues for EPIC Architectures
Thomas B. Jablin, Yun Zhang, James A. Jablin, Jialu Huang, Hanjun Kim, and David I. August
Proceedings of the Eighth Workshop on Explicitly Parallel Instruction Computer Architectures and Compiler Technology (EPIC), April 2010.

Core-to-core communication bandwidth is critical for high-performance pipeline-parallel programs. Hardware communication queues are unlikely to be implemented and are perhaps unnecessary. This paper presents Liberty Queues, a high-performance, lock-free, software-only ring buffer, and describes the porting effort from the original x86-64 implementation to IA-64. Liberty Queues achieve a bandwidth of 500 MB/s between unrelated processors on a first-generation Itanium 2, compared with 281 MB/s on modern Opterons and 430 MB/s on modern Xeons claimed by related work. The paper presents bandwidth results for seven multicore and multiprocessor systems, along with a sensitivity analysis.
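For readers unfamiliar with the data structure named in the abstract, the following is a minimal sketch of a generic lock-free single-producer, single-consumer ring buffer in C11. It is not the Liberty Queues implementation, and it reflects none of the paper's x86-64 or IA-64 tuning. The names spsc_ring, ring_init, ring_push, and ring_pop, the capacity of 1024 slots, and the 64-byte alignment are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two lets us wrap with a bit mask. */
#define RING_CAPACITY 1024u
_Static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
               "RING_CAPACITY must be a power of two");

typedef struct {
    /* Each index sits on its own (assumed 64-byte) cache line so the
       producer and consumer do not falsely share a line. */
    _Alignas(64) _Atomic size_t head;   /* next slot to write; producer only */
    _Alignas(64) _Atomic size_t tail;   /* next slot to read; consumer only  */
    _Alignas(64) uint64_t slots[RING_CAPACITY];
} spsc_ring;

static void ring_init(spsc_ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool ring_push(spsc_ring *r, uint64_t value) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY) {
        return false;                   /* no free slot */
    }
    r->slots[head & (RING_CAPACITY - 1u)] = value;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_pop(spsc_ring *r, uint64_t *out) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false;                   /* nothing to read */
    }
    *out = r->slots[tail & (RING_CAPACITY - 1u)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In this sketch the producer writes only head and the consumer writes only tail, so neither side needs a lock or a compare-and-swap. Correctness depends on there being exactly one producer thread and one consumer thread.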