3D FFTs on a Single FPGA
Benjamin Humphries, Hansen Zhang, Jiayi Sheng, Raphael Landaverde, and Martin C. Herbordt
Proceedings of the 22nd International Conference on Field-Programmable Custom Computing Machines (FCCM), May 2014.
  
  
The 3D FFT is critical in many physical simulations and image
processing applications. On FPGAs, however, the 3D FFT was thought to be
inefficient relative to other methods such as convolution-based implementations
of multigrid. We find the opposite: a simple design, operating at a
conservative frequency, takes 4 µs for 16^3, 21 µs for 32^3, and 215 µs for 64^3
single-precision data points. The first two of these compare favorably with the
25 µs and 29 µs obtained on a current Nvidia GPU. More broadly, this result is
a critical piece in implementing a large-scale FPGA-based molecular dynamics
(MD) engine: even a single FPGA can keep the FFT off the critical path for a
large fraction of possible MD simulations.
