Design of 3D FFTs with FPGA clusters
Jiayi Sheng, Benjamin Humphries, Hansen Zhang, and Martin C. Herbordt
Proceedings of High Performance Extreme Computing Conference (HPEC), September 2014.

The three-dimensional Fast Fourier Transform (3D FFT) is widely used in scientific applications. Distributed 3D FFTs require global communication, which becomes a serious concern when strong scaling is required, as in long-timescale molecular dynamics simulations. In this paper, we propose a parameterized 3D FFT design that targets 3D-torus FPGA-based networks of various sizes. Characteristics include direct FPGA-FPGA communication links, support for various internal switch designs, and table-based routing, which saves chip area and routing cycles. We find that, even assuming extremely conservative parameters, we are able to run the 16^3 FFT in 3.9 µs, the 32^3 FFT in 5.46 µs, the 64^3 FFT in 9.52 µs, and the 128^3 FFT in 25.72 µs. These results indicate that clusters based on commodity FPGAs are likely to be appropriate when strong scaling is needed in applications limited by the 3D FFT.
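
To illustrate why a distributed 3D FFT needs global communication, the sketch below computes a 3D transform as three passes of 1D transforms, one per axis. It is a generic textbook decomposition, not the design from the paper: the function names (`dft_1d`, `fft_3d`, `dft_3d_direct`) are hypothetical, and a naive O(n^2) 1D DFT stands in for an optimized FFT kernel. Each pass needs every point along one axis, so if the grid is partitioned across nodes, at least one pass has to gather data held by other nodes.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) 1D DFT; a stand-in for an optimized 1D FFT kernel."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_3d(grid):
    """3D DFT of an n x n x n nested list via three passes of 1D transforms."""
    n = len(grid)
    out = [[[complex(grid[x][y][z]) for z in range(n)]
            for y in range(n)] for x in range(n)]

    # Pass 1: transform every line along the z axis.
    for x in range(n):
        for y in range(n):
            out[x][y] = dft_1d(out[x][y])

    # Pass 2: transform every line along the y axis.
    for x in range(n):
        for z in range(n):
            line = dft_1d([out[x][y][z] for y in range(n)])
            for y in range(n):
                out[x][y][z] = line[y]

    # Pass 3: transform every line along the x axis.
    for y in range(n):
        for z in range(n):
            line = dft_1d([out[x][y][z] for x in range(n)])
            for x in range(n):
                out[x][y][z] = line[x]

    return out


def dft_3d_direct(grid):
    """Reference 3D DFT evaluated straight from the definition (O(n^6))."""
    n = len(grid)
    idx = range(n)
    return [[[sum(grid[x][y][z]
                  * cmath.exp(-2j * cmath.pi * (x * kx + y * ky + z * kz) / n)
                  for x in idx for y in idx for z in idx)
              for kz in idx] for ky in idx] for kx in idx]
```

The axis-by-axis version and the direct definition agree up to floating-point error, which is what makes the pass-per-axis structure a valid way to split the work. How the data is redistributed between passes is the part that stresses the network.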