FastForward for Efficient Pipeline Parallelism: A Cache-Optimized Concurrent Lock-Free Queue
John Giacomoni, Tipp Moseley, and Manish Vachharajani
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2008.
Low-overhead core-to-core communication is critical for efficient
pipeline-parallel software applications. This paper presents
FastForward, a cache-optimized single-producer/single-consumer
concurrent lock-free queue for pipeline parallelism on
multicore architectures, with weak to strongly ordered consistency
models. Enqueue and dequeue times on a 2.66 GHz Opteron 2218-based
system are as low as 28.5 ns, up to 5x faster than the next best
solution. FastForward's effectiveness is demonstrated for real
applications by applying it to line-rate soft network processing on
Gigabit Ethernet with general-purpose commodity hardware.
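
The abstract does not spell out the mechanism, so the following is only a minimal sketch of a single-producer/single-consumer ring buffer in the spirit of the description above, not the authors' implementation. It assumes that an empty slot is marked by NULL, so the producer and consumer each keep a private index and communicate only through the slots themselves. The names `ff_queue`, `ff_init`, `ff_enqueue`, `ff_dequeue`, the capacity `FF_SLOTS`, and the 64-byte cache-line size are all hypothetical, and C11 atomics stand in for whatever ordering primitives a given architecture would need.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FF_SLOTS 1024 /* hypothetical capacity, chosen for illustration */

typedef struct {
    _Atomic(void *) slot[FF_SLOTS]; /* NULL marks an empty slot */
    _Alignas(64) size_t head;       /* touched only by the producer */
    _Alignas(64) size_t tail;       /* touched only by the consumer */
} ff_queue;

/* Mark every slot empty and reset both private indices. */
static void ff_init(ff_queue *q)
{
    for (size_t i = 0; i < FF_SLOTS; i++)
        atomic_init(&q->slot[i], NULL);
    q->head = 0;
    q->tail = 0;
}

/* Producer side. Returns false if the queue is full or item is NULL
 * (NULL is reserved as the "empty" marker and cannot be enqueued). */
static bool ff_enqueue(ff_queue *q, void *item)
{
    if (item == NULL)
        return false;
    /* A non-NULL slot at head means the consumer has not drained it yet. */
    if (atomic_load_explicit(&q->slot[q->head], memory_order_acquire) != NULL)
        return false;
    /* Release ordering publishes the item's contents along with the pointer. */
    atomic_store_explicit(&q->slot[q->head], item, memory_order_release);
    q->head = (q->head + 1) % FF_SLOTS;
    return true;
}

/* Consumer side. Returns false if the queue is empty. */
static bool ff_dequeue(ff_queue *q, void **out)
{
    void *item = atomic_load_explicit(&q->slot[q->tail], memory_order_acquire);
    if (item == NULL)
        return false;
    *out = item;
    /* Writing NULL hands the slot back to the producer. */
    atomic_store_explicit(&q->slot[q->tail], NULL, memory_order_release);
    q->tail = (q->tail + 1) % FF_SLOTS;
    return true;
}
```

In this sketch one thread would call `ff_enqueue` and another `ff_dequeue`, each retrying when the call returns false. Because neither side reads the other's index, the only cache lines shared between the two cores are the slots themselves.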