Please note that the following content is intended for personal note-taking and is therefore rather unorganized. If you spot any issues, please email me or start a thread in the comment section below.

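For the original self-attention, we need to compute an inner product of the queries and keys, followed by a SoftMax function:

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$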

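However, if we choose to solve it in the frequency domain, we apply the Fourier transform to the queries and keys, multiply them element-wise, and apply the inverse Fourier transform to the product. Roughly following [1]:

$$A = \mathcal{F}^{-1}\left(\mathcal{F}(Q) \odot \overline{\mathcal{F}(K)}\right), \qquad V_{att} = \mathcal{L}(A) \odot V$$

Here $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the Fourier transform and its inverse, the bar denotes complex conjugation, $\odot$ is element-wise multiplication, and $\mathcal{L}$ is a normalization layer.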

In this way, we can better capture information in the frequency domain and extract blur-related features from the low-frequency components.
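
The frequency-domain attention described above can be sketched in NumPy. This is only a minimal illustration under my own assumptions, not the implementation from [1]: the function names, the single-channel `(H, W)` inputs, and the simple mean/variance normalization standing in for the normalization layer are all hypothetical. The helper `circular_correlation` is there only to check that the FFT route gives the same result as the direct spatial computation.

```python
import numpy as np


def frequency_correlation(q, k):
    """Correlate q and k via the FFT (correlation theorem).

    q, k: real arrays of shape (H, W). Multiplying F(q) by the complex
    conjugate of F(k) and inverting yields their circular cross-correlation.
    """
    return np.fft.ifft2(np.fft.fft2(q) * np.conj(np.fft.fft2(k))).real


def frequency_attention(q, k, v, eps=1e-5):
    """Hypothetical single-channel sketch of frequency-domain attention."""
    a = frequency_correlation(q, k)
    # Assumption: a plain mean/variance normalization stands in for the
    # normalization layer applied to the attention map.
    a = (a - a.mean()) / np.sqrt(a.var() + eps)
    # The normalized map weights the values element-wise.
    return a * v


def circular_correlation(q, k):
    """Direct spatial-domain circular cross-correlation, used as a check."""
    h, w = q.shape
    out = np.zeros((h, w))
    for m in range(h):
        for n in range(w):
            # Shift k by (m, n) with wrap-around, then take the inner product.
            out[m, n] = np.sum(q * np.roll(k, shift=(m, n), axis=(0, 1)))
    return out
```

The direct version needs one inner product per output position, while the FFT version needs only three transforms and one element-wise product, which is where the efficiency of the frequency-domain formulation comes from.
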
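Following a similar strategy, we can also modify the multiplication in the FFN by incorporating the fast Fourier transform. Roughly following [1], we formulate the computation as:

$$X_{out} = \mathcal{G}\left(\mathcal{F}^{-1}\left(W \odot \mathcal{F}(X)\right)\right)$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the fast Fourier transform and its inverse, $W$ is a learnable weight applied element-wise in the frequency domain, and $\mathcal{G}$ is the activation function.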

We choose to use GEGLU as the activation function in this setting.
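
As a rough sketch of the FFN idea, the snippet below filters the features with a weight in the frequency domain and then applies GEGLU. Everything here is an assumption made for illustration: the names `frequency_ffn` and `geglu`, the `(2C, H, W)` layout, the tanh approximation of GELU, and applying the transform to the whole feature map (in [1] the transform is applied patch-wise, and there are additional projection layers that are omitted here).

```python
import numpy as np


def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def geglu(x):
    """GEGLU gate: split the channel axis in half, return GELU(a) * b."""
    a, b = np.split(x, 2, axis=0)
    return gelu(a) * b


def frequency_ffn(x, w):
    """Hypothetical frequency-domain FFN core.

    x: real features of shape (2C, H, W).
    w: frequency-domain weights of shape (2C, H, W // 2 + 1), matching
       the output of the real-input 2-D FFT.
    """
    xf = np.fft.rfft2(x, axes=(-2, -1))
    # Element-wise weighting of each frequency component.
    filtered = np.fft.irfft2(xf * w, s=x.shape[-2:], axes=(-2, -1))
    return geglu(filtered)
```

With `w` set to all ones the frequency filter is the identity and the function reduces to plain GEGLU, which is a convenient sanity check.
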

References

[1] Kong, Lingshun, et al. "Efficient frequency domain-based transformers for high-quality image deblurring." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.