FFT (Fast Fourier Transform)
Key Points
- FFT converts a polynomial from coefficients to values at the n-th roots of unity in O(n log n) time, enabling fast multiplication via pointwise products.
- Polynomial multiplication becomes convolution of coefficients; the convolution theorem says DFT turns convolution into elementwise multiplication.
- We use divide-and-conquer and the special structure of complex roots of unity to compute the Discrete Fourier Transform efficiently.
- To multiply A(x) and B(x), compute FFT(A), FFT(B), multiply pointwise, then apply inverse FFT and round to nearest integers.
- Zero-pad to at least the next power of two so linear convolution is correct and no aliasing occurs.
- Precision issues arise with large coefficients; mitigate by splitting coefficients into chunks or using NTT (modular FFT) when possible.
- There are two common implementations: recursive D&C (educational) and iterative in-place with bit-reversal (competitive-programming fast).
- Overall complexity is O(n log n) time and O(n) memory; constants matter, so prefer iterative FFT and reuse plans/twiddles.
Prerequisites
- Complex numbers and Euler's formula — FFT uses complex roots of unity e^{2πi/n}; understanding rotations and Euler's identity is essential.
- Polynomials and convolution — Polynomial multiplication corresponds to the Cauchy product (discrete convolution), the core operation accelerated by FFT.
- Big-O notation and divide-and-conquer — To analyze FFT's O(n log n) time via the recurrence T(n)=2T(n/2)+O(n).
- Modular arithmetic (for NTT/splitting) — When working modulo primes or recombining split coefficients, you must reduce results correctly.
- Floating-point rounding and numerical error — FFT over doubles introduces small errors; proper rounding and error tolerance are needed.
Detailed Explanation
01 Overview
Fast Fourier Transform (FFT) is a divide-and-conquer algorithm that computes the Discrete Fourier Transform (DFT) of a sequence in O(n \log n) time instead of O(n^2). In algorithmic problem solving, we mainly use FFT to multiply polynomials (or perform discrete convolutions) much faster than the naive O(n^2) approach. The key idea is to switch between two equivalent representations of a polynomial: its coefficients, and its values at carefully chosen points (the n-th roots of unity). In point-value form, multiplying polynomials is easy: multiply corresponding values. FFT provides the fast forward transform (coefficients to values) and the inverse transform (values back to coefficients), turning polynomial multiplication into three transforms and one pointwise multiply. Imagine multiplying two polynomials of degree 100,000 within a one-second time budget: quadratic methods are hopeless, but with FFT the task becomes feasible. The underlying concept is a change of representation that reduces a global operation to simple per-index multiplications. For example, convolving two arrays of histogram counts to obtain pairwise-sum frequencies can be done via FFT in O(n \log n), a common trick in combinatorics and signal processing.
02 Intuition & Analogies
Think of a polynomial as a song. The coefficients are like notes written on a sheet, while the values at roots of unity are like listening to the song at specific time points around a circle. Playing notes together (convolution) is hard to do manually, but if you record how loud the song is at many evenly spaced times (roots of unity), then mixing two songs is as simple as multiplying the volumes at each time. FFT is the fast method to switch between the sheet music (coefficients) and the audio samples (values at evenly spaced times). Another analogy: Suppose you want to compute how many ways to pick a pair of numbers—one from each list—whose sum equals k. The naive way checks all pairs. A faster way counts how often each number appears, then performs a convolution of these counts—this convolution is precisely polynomial multiplication of generating functions. By evaluating both polynomials at the special complex points on the unit circle (roots of unity), the convolution theorem tells us that convolution becomes pointwise multiplication. The divide-and-conquer trick splits the sequence into evens and odds, reusing symmetries of the circle (w^{k + n/2} = -w^k) so that each recursion handles half the problem, and we combine results with simple "butterfly" operations. Example: If you want the distribution of sums of two dice (2..12), you can encode each die as a polynomial where the coefficient of x^i is 1 if face i appears; multiplying polynomials yields the sum distribution directly.
03 Formal Definition
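In symbols, the two transforms and the product rule can be stated as follows (consistent with the Key Formulas listed later, using \omega_n = e^{2\pi i/n}):

```latex
% DFT: evaluate the coefficient sequence a at the n-th roots of unity
\hat{a}_k = \sum_{j=0}^{n-1} a_j\,\omega_n^{jk}, \qquad k = 0,\dots,n-1.

% Inverse DFT: conjugate the angle and normalize by n
a_j = \frac{1}{n}\sum_{k=0}^{n-1} \hat{a}_k\,\omega_n^{-jk}.

% Convolution theorem (valid when n exceeds the degree of the product):
\widehat{(a * b)}_k = \hat{a}_k \cdot \hat{b}_k.
```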
04 When to Use
Use FFT when you need fast polynomial multiplication or discrete convolution and coefficient growth/precision can be controlled. Typical competitive programming use cases include: (1) multiplying large-degree integer polynomials (e.g., big-integer multiplication), (2) counting pairwise sums or sums of k elements via generating functions, (3) cross-correlation or circular convolution problems, and (4) fast subset-sum approximations with binning. If you require exact integer results modulo a prime of the form p = c \cdot 2^k + 1 with a primitive root, consider NTT (Number Theoretic Transform) for perfect accuracy. Use FFT with doubles when the output fits within a safe rounding tolerance; otherwise, split coefficients into chunks (e.g., 15 bits) and recombine to reduce numerical error. Prefer the iterative in-place FFT in performance-critical contexts, and reuse twiddle factors when multiple convolutions share the same size. For small sizes (n up to a few hundred), Karatsuba or even the naive O(n^2) method can be competitive due to lower constants.
⚠️ Common Mistakes
Common pitfalls include: (1) forgetting to zero-pad both inputs so that the transform length L is at least the next power of two ≥ deg(A)+deg(B)+1, causing circular convolution (aliasing) instead of the intended linear convolution. (2) Mixing up the sign of the angle: in the convention used here, the forward transform uses +2\pi i k / n while the inverse uses −2\pi i k / n, and you must divide by n at the end; omitting the 1/n factor yields results scaled by n. (3) Rounding errors: directly casting real parts to integers without llround() can be off by one due to floating-point noise; using float instead of double also increases error. (4) Overflow during reconstruction: even though FFT uses doubles, the final integer combinations might exceed 32 bits; always use 64-bit integers (long long). (5) Not normalizing tiny negative zeros (−0.0) after the inverse transform, which can confuse equality checks. (6) Precision loss with very large coefficients or long polynomials: mitigate by splitting coefficients into 15–16-bit chunks or switching to NTT. (7) Incorrect bit-reversal or twiddle indexing in iterative FFT produces scrambled outputs; test using identity properties (FFT followed by inverse FFT should return the original within about 1e−6). Example: multiplying [1, 2] and [3, 4] should yield [3, 10, 8]; if you see [12, 40, 32], you likely forgot the inverse normalization (division by n = 4).
Key Formulas
DFT definition
\hat{a}_k = \sum_{j=0}^{n-1} a_j \, \omega_n^{jk}, \quad \omega_n = e^{2\pi i/n}
Explanation: This defines the discrete Fourier transform: \hat{a}_k is the value of the polynomial with coefficients a_0, \dots, a_{n-1} evaluated at the k-th power of the primitive n-th root of unity. It converts from coefficient form to point-value form.
Inverse DFT
a_j = \frac{1}{n} \sum_{k=0}^{n-1} \hat{a}_k \, \omega_n^{-jk}
Explanation: This recovers coefficients from values. The minus sign conjugates the angle, and dividing by n normalizes the result so that IDFT(DFT(a)) = a.
Convolution (Cauchy product)
c_k = \sum_{i=0}^{k} a_i b_{k-i}
Explanation: This is the definition of linear convolution (polynomial multiplication), producing the coefficient of x^k in the product. FFT computes this faster than the naive O(n^2) summation.
Convolution Theorem
\widehat{(a * b)}_k = \hat{a}_k \cdot \hat{b}_k
Explanation: The DFT of the convolution equals pointwise multiplication of DFTs. This is the core reason we can multiply polynomials quickly using FFT.
FFT Recurrence
T(n) = 2T(n/2) + O(n)
Explanation: The radix-2 Cooley–Tukey algorithm splits the DFT into two half-size DFTs and combines them in linear time, leading to O(n log n) complexity.
Padding length
L = 2^{\lceil \log_2 (n + m - 1) \rceil}
Explanation: To compute a linear convolution of lengths n and m using FFT, pad both to length L, the next power of two at least n+m−1. This prevents circular wrap-around.
Half-turn symmetry
\omega_n^{k + n/2} = -\omega_n^{k}
Explanation: This symmetry underlies the even-odd decomposition in radix-2 FFT, simplifying the combine step (butterflies).
FFT Complexity
O(n \log n) time, O(n) memory
Explanation: The number of arithmetic operations grows roughly like n times the number of recursion levels (log n). Memory scales linearly with the transform size.
Complexity Analysis
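The running time follows from unrolling the radix-2 recurrence, where cn is the cost of the combine step at each level:

```latex
T(n) = 2\,T(n/2) + cn
     = 4\,T(n/4) + 2cn
     = \cdots
     = n\,T(1) + cn\log_2 n
     = O(n \log n).
```

Each of the \log_2 n levels performs n/2 butterflies. The recursive version uses O(n) extra memory for the even/odd splits; the iterative version works in place, so memory is O(n) for the transform array itself.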
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Recursive FFT using std::complex<double> (educational clarity).
// Multiplies two integer polynomials A and B and returns integer coefficients.

using cd = complex<double>;
const double PI = acos(-1.0);

void fft(vector<cd>& a, bool invert) {
    int n = (int)a.size();
    if (n == 1) return;

    // Split into even and odd indices
    vector<cd> a0(n / 2), a1(n / 2);
    for (int i = 0; i < n / 2; ++i) {
        a0[i] = a[2 * i];
        a1[i] = a[2 * i + 1];
    }

    // Recurse on halves
    fft(a0, invert);
    fft(a1, invert);

    // Principal n-th root of unity
    double ang = 2 * PI / n * (invert ? -1 : 1);
    cd w(1), wn(cos(ang), sin(ang));

    // Combine (butterfly)
    for (int k = 0; k < n / 2; ++k) {
        cd t = w * a1[k];
        a[k] = a0[k] + t;
        a[k + n / 2] = a0[k] - t;
        w *= wn;
    }

    // For inverse FFT, normalize at the top level only (caller responsibility)
}

vector<long long> multiply_poly(const vector<long long>& A, const vector<long long>& B) {
    if (A.empty() || B.empty()) return {};

    // Prepare complex vectors and pad to power of two >= A.size() + B.size() - 1
    int n1 = (int)A.size(), n2 = (int)B.size();
    int need = n1 + n2 - 1;
    int n = 1;
    while (n < need) n <<= 1;

    vector<cd> fa(n), fb(n);
    for (int i = 0; i < n1; ++i) fa[i] = cd((double)A[i], 0);
    for (int i = 0; i < n2; ++i) fb[i] = cd((double)B[i], 0);

    // Forward FFT
    fft(fa, false);
    fft(fb, false);

    // Pointwise multiply
    for (int i = 0; i < n; ++i) fa[i] *= fb[i];

    // Inverse FFT
    fft(fa, true);
    for (int i = 0; i < n; ++i) fa[i] /= (double)n;  // normalize here

    // Round to nearest integer and trim to needed size
    vector<long long> C(need);
    for (int i = 0; i < need; ++i) {
        C[i] = llround(fa[i].real());
    }
    return C;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: (1 + 2x + 3x^2) * (4 + 5x)
    vector<long long> A = {1, 2, 3};
    vector<long long> B = {4, 5};

    vector<long long> C = multiply_poly(A, B);

    // Expected: [4, 13, 22, 15]
    for (size_t i = 0; i < C.size(); ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}
This code implements a clear recursive radix-2 FFT over complex doubles. We convert coefficient vectors to complex form, pad to the next power of two, take forward FFTs, multiply spectra elementwise, and apply the inverse FFT by calling the same routine with invert=true and dividing by n. Rounding produces integer coefficients.
#include <bits/stdc++.h>
using namespace std;

// Fast iterative radix-2 FFT with bit-reversal; recommended for performance.
// Provides convolution for real integer coefficients with rounding.

using cd = complex<double>;
const double PI = acos(-1.0);

void fft(vector<cd>& a, bool invert) {
    int n = (int)a.size();
    if (n == 1) return;

    // Bit-reversal permutation (tables cached between calls of the same size)
    static vector<int> rev;
    static vector<cd> roots{cd(0, 0), cd(-1, 0)};  // roots[k] = e^{2πi / 2^k}
    if ((int)rev.size() != n) {
        int k = __builtin_ctz(n);
        rev.assign(n, 0);
        for (int i = 0; i < n; ++i) {
            rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
        }
        // Precompute roots up to needed power
        if ((int)roots.size() <= k) {
            int old = (int)roots.size();
            roots.resize(k + 1);
            for (int p = old; p <= k; ++p) {
                double ang = 2 * PI / (1 << p);
                roots[p] = cd(cos(ang), sin(ang));  // principal 2^p-th root
            }
        }
    }

    for (int i = 0; i < n; ++i)
        if (i < rev[i]) swap(a[i], a[rev[i]]);

    for (int len = 1, p = 1; (len << 1) <= n; len <<= 1, ++p) {
        cd wlen = roots[p];
        if (invert) wlen = conj(wlen);
        for (int i = 0; i < n; i += (len << 1)) {
            cd w(1, 0);
            for (int j = 0; j < len; ++j) {
                cd u = a[i + j];
                cd v = a[i + j + len] * w;
                a[i + j] = u + v;
                a[i + j + len] = u - v;
                w *= wlen;
            }
        }
    }

    if (invert) {
        for (cd& x : a) x /= (double)n;
    }
}

vector<long long> convolution(const vector<long long>& A, const vector<long long>& B) {
    int n1 = (int)A.size(), n2 = (int)B.size();
    if (!n1 || !n2) return {};
    int need = n1 + n2 - 1;
    int n = 1;
    while (n < need) n <<= 1;

    vector<cd> fa(n), fb(n);
    for (int i = 0; i < n1; ++i) fa[i] = cd((double)A[i], 0);
    for (int i = 0; i < n2; ++i) fb[i] = cd((double)B[i], 0);

    fft(fa, false);
    fft(fb, false);
    for (int i = 0; i < n; ++i) fa[i] *= fb[i];
    fft(fa, true);

    vector<long long> C(need);
    for (int i = 0; i < need; ++i) C[i] = llround(fa[i].real());
    return C;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: convolution of two sequences (models polynomial multiplication)
    vector<long long> A = {3, 1, 4, 1, 5};
    vector<long long> B = {2, 7, 1, 8};

    auto C = convolution(A, B);

    for (size_t i = 0; i < C.size(); ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}
This version implements an optimized iterative FFT: it performs an in-place bit-reversal permutation and iteratively applies butterflies with precomputed roots. It is faster and more memory-friendly than the recursive version and is a common competitive programming template.
#include <bits/stdc++.h>
using namespace std;

// Convolution modulo a prime (e.g., 1e9+7) using floating-point FFT with
// 15-bit coefficient splitting. This avoids precision issues when
// coefficients or results are large.

using cd = complex<double>;
const double PI = acos(-1.0);

void fft(vector<cd>& a, bool invert) {
    int n = (int)a.size();
    if (n == 1) return;
    static vector<int> rev;
    static vector<cd> roots{cd(0, 0), cd(-1, 0)};  // roots[k] = e^{2πi / 2^k}
    if ((int)rev.size() != n) {
        int k = __builtin_ctz(n);
        rev.assign(n, 0);
        for (int i = 0; i < n; ++i)
            rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
        if ((int)roots.size() <= k) {
            int old = (int)roots.size();
            roots.resize(k + 1);
            for (int p = old; p <= k; ++p) {
                double ang = 2 * PI / (1 << p);
                roots[p] = cd(cos(ang), sin(ang));
            }
        }
    }
    for (int i = 0; i < n; ++i)
        if (i < rev[i]) swap(a[i], a[rev[i]]);
    for (int len = 1, p = 1; (len << 1) <= n; len <<= 1, ++p) {
        cd wlen = roots[p];
        if (invert) wlen = conj(wlen);
        for (int i = 0; i < n; i += (len << 1)) {
            cd w(1, 0);
            for (int j = 0; j < len; ++j) {
                cd u = a[i + j];
                cd v = a[i + j + len] * w;
                a[i + j] = u + v;
                a[i + j + len] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert)
        for (cd& x : a) x /= (double)n;
}

// Generic convolution over integers using FFT doubles, with rounding
static vector<long long> convolution_ll(const vector<long long>& A, const vector<long long>& B) {
    if (A.empty() || B.empty()) return {};
    int need = (int)A.size() + (int)B.size() - 1;
    int n = 1;
    while (n < need) n <<= 1;
    vector<cd> fa(n), fb(n);
    for (size_t i = 0; i < A.size(); ++i) fa[i] = cd((double)A[i], 0);
    for (size_t i = 0; i < B.size(); ++i) fb[i] = cd((double)B[i], 0);
    fft(fa, false);
    fft(fb, false);
    for (int i = 0; i < n; ++i) fa[i] *= fb[i];
    fft(fa, true);
    vector<long long> C(need);
    for (int i = 0; i < need; ++i) C[i] = llround(fa[i].real());
    return C;
}

// Convolution modulo MOD using 15-bit splitting: a = a0 + a1*BASE, BASE = 1 << 15
vector<int> convolution_mod(const vector<int>& A, const vector<int>& B, int MOD) {
    const long long BASE = 1LL << 15;  // 32768

    // Split into low/high parts
    vector<long long> a0(A.size()), a1(A.size()), b0(B.size()), b1(B.size());
    for (size_t i = 0; i < A.size(); ++i) {
        int x = A[i] % MOD; if (x < 0) x += MOD;
        a0[i] = x & (BASE - 1);
        a1[i] = x >> 15;
    }
    for (size_t i = 0; i < B.size(); ++i) {
        int x = B[i] % MOD; if (x < 0) x += MOD;
        b0[i] = x & (BASE - 1);
        b1[i] = x >> 15;
    }

    // Perform four convolutions: a0*b0, a0*b1, a1*b0, a1*b1
    vector<long long> c0 = convolution_ll(a0, b0);
    vector<long long> c01 = convolution_ll(a0, b1);
    vector<long long> c10 = convolution_ll(a1, b0);
    vector<long long> c2 = convolution_ll(a1, b1);

    size_t need = c0.size();
    vector<int> res(need);
    for (size_t i = 0; i < need; ++i) {
        long long t0 = (i < c0.size() ? c0[i] : 0);
        long long t1 = (i < c01.size() ? c01[i] : 0) + (i < c10.size() ? c10[i] : 0);
        long long t2 = (i < c2.size() ? c2[i] : 0);
        // Recombine: t0 + t1*BASE + t2*BASE^2 (mod MOD)
        __int128 val = (__int128)t0 + (__int128)t1 * BASE + (__int128)t2 * BASE * BASE;
        long long v = (long long)(val % MOD);
        if (v < 0) v += MOD;
        res[i] = (int)v;
    }
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const int MOD = 1'000'000'007;
    // Example: multiply polynomials modulo 1e9+7 with potentially large coefficients
    vector<int> A = { (int)1e9, (int)1e9, (int)1e9 };
    vector<int> B = { (int)1e9, (int)1e9 };

    auto C = convolution_mod(A, B, MOD);

    for (size_t i = 0; i < C.size(); ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}
This implementation computes convolution modulo a large prime using floating-point FFT safely. It splits each coefficient into two 15-bit chunks (low and high), performs four real convolutions using a standard FFT with doubles, and recombines the results using BASE = 2^15. This mitigates rounding error because each convolution handles smaller magnitudes, allowing robust recovery modulo MOD.