Design of 3D FFTs with FPGA clusters
Jiayi Sheng, Benjamin Humphries, Hansen Zhang, and Martin C. Herbordt
Proceedings of High Performance Extreme Computing Conference (HPEC), September 2014.
The three-dimensional Fast Fourier Transform (3D FFT) is widely
used in scientific applications. Distributed 3D FFTs require global
communication: this becomes a serious concern when strong scaling is required,
as in long-timescale molecular dynamics simulations. In this paper, we propose
a parameterized 3D FFT design that targets 3D-torus FPGA-based networks of
various sizes. Characteristics include direct FPGA-FPGA communication links,
support for various internal switch designs, and use of table-based routing,
which saves chip area and routing cycles. We find that, even assuming extremely
conservative parameters, we are able to run the 16^3 FFT in 3.9 µs, the 32^3 FFT
in 5.46 µs, the 64^3 FFT in 9.52 µs, and the 128^3 FFT in 25.72 µs. These results indicate
that clusters based on commodity FPGAs are likely to be appropriate when strong
scaling is needed in applications limited by the 3D FFT.
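
To illustrate why a distributed 3D FFT needs global communication, the following is a minimal single-node sketch, not the design from the paper. It uses the standard decomposition of a 3D FFT into three passes of 1D FFTs, one per axis. The function names `fft1d` and `fft3d` are hypothetical, and the grid is assumed to be cubic with a power-of-two edge length. If the grid is partitioned across nodes, at least one of the three passes has to read data along an axis that crosses node boundaries, which is where the all-to-all style data exchange comes from.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3D FFT of an n x n x n nested list via three passes of 1D FFTs."""
    n = len(a)
    # Work on a copy so the caller's data is left untouched.
    a = [[list(row) for row in plane] for plane in a]
    # Pass 1: transform along z (the innermost index).
    for i in range(n):
        for j in range(n):
            a[i][j] = fft1d(a[i][j])
    # Pass 2: transform along y.
    for i in range(n):
        for k in range(n):
            col = fft1d([a[i][j][k] for j in range(n)])
            for j in range(n):
                a[i][j][k] = col[j]
    # Pass 3: transform along x. In a distributed setting this pass
    # (and possibly pass 2) touches data held by other nodes.
    for j in range(n):
        for k in range(n):
            col = fft1d([a[i][j][k] for i in range(n)])
            for i in range(n):
                a[i][j][k] = col[i]
    return a
```

As a quick check, a unit impulse at the origin of a 4^3 grid should transform to a grid of all ones, and a constant grid of ones should transform to a single peak of 64 at the origin.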