
FFT (Fast Fourier Transform)

Key Points

  • FFT converts a polynomial from coefficients to values at the n-th roots of unity in O(n log n) time, enabling fast multiplication via pointwise products.
  • Polynomial multiplication becomes convolution of coefficients; the convolution theorem says DFT turns convolution into elementwise multiplication.
  • We use divide-and-conquer and the special structure of complex roots of unity to do the Discrete Fourier Transform efficiently.
  • To multiply A(x) and B(x), compute FFT(A), FFT(B), multiply pointwise, then apply inverse FFT and round to nearest integers.
  • Zero-pad both inputs to a power-of-two length of at least n+m−1 so the linear convolution is correct and no aliasing (circular wrap-around) occurs.
  • Precision issues arise with large coefficients; mitigate by splitting coefficients into chunks or using NTT (modular FFT) when possible.
  • There are two common implementations: recursive divide-and-conquer (educational) and iterative in-place with bit-reversal (faster; the standard choice in competitive programming).
  • Overall complexity is O(n log n) time and O(n) memory; constants matter, so prefer iterative FFT and reuse plans/twiddles.

Prerequisites

  • Complex numbers and Euler's formula: FFT uses complex roots of unity e^{2πi/n}; understanding rotations and Euler's identity is essential.
  • Polynomials and convolution: polynomial multiplication corresponds to the Cauchy product (discrete convolution), the core operation accelerated by FFT.
  • Big-O notation and divide-and-conquer: needed to analyze FFT's O(n log n) time via the recurrence T(n) = 2T(n/2) + O(n).
  • Modular arithmetic (for NTT/splitting): when working modulo primes or recombining split coefficients, you must reduce results correctly.
  • Floating-point rounding and numerical error: FFT over doubles introduces small errors; proper rounding and error tolerance are needed.

Detailed Explanation


01 Overview

Fast Fourier Transform (FFT) is a divide-and-conquer algorithm that computes the Discrete Fourier Transform (DFT) of a sequence in O(n log n) time instead of O(n^2). In algorithmic problem solving, we mainly use FFT to multiply polynomials (or perform discrete convolutions) much faster than the naive O(n^2) approach. The key idea is to switch between two equivalent ways to represent a polynomial: by its coefficients and by its values at carefully chosen points (the n-th roots of unity). In the point-value form, multiplying polynomials is easy: multiply corresponding values. FFT provides the fast forward transform (coefficients to values) and inverse transform (values back to coefficients). This trick turns polynomial multiplication into three transforms and one pointwise multiply. Imagine multiplying two polynomials of degree 100,000 in a time budget of one second; quadratic methods are hopeless, but with FFT the same task becomes feasible. The underlying concept is a change of representation that reduces a hard global operation to simple per-index multiplications. For example, convolving two arrays of histogram counts to get pairwise-sum frequencies can also be done via FFT in O(n log n), a common trick in combinatorics and signal processing. The sketch below shows the representation switch using plain real sample points, before any FFT machinery.
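A minimal, self-contained sketch of point-value multiplication (all names and the sample points 0..m−1 are illustrative, not part of the original text): it evaluates both polynomials at ordinary real points, multiplies the samples, and recovers the product with O(n^2) Lagrange interpolation. FFT performs both transformations at roots of unity in O(n log n) instead.

#include <bits/stdc++.h>
using namespace std;

// Evaluate polynomial p at x (Horner's rule)
double evalPoly(const vector<double>& p, double x) {
    double r = 0;
    for (int i = (int)p.size() - 1; i >= 0; --i) r = r * x + p[i];
    return r;
}

int main() {
    // A(x) = 1 + 2x + 3x^2, B(x) = 4 + 5x; product has 4 coefficients -> 4 points
    vector<double> A = {1, 2, 3}, B = {4, 5};
    int m = (int)A.size() + (int)B.size() - 1;
    vector<double> xs(m), ys(m);
    for (int i = 0; i < m; ++i) {                        // sample at x = 0..m-1
        xs[i] = i;
        ys[i] = evalPoly(A, xs[i]) * evalPoly(B, xs[i]); // pointwise multiply
    }
    // Lagrange interpolation back to coefficient form
    vector<double> C(m, 0.0);
    for (int i = 0; i < m; ++i) {
        vector<double> basis = {1.0};                    // build prod_{j != i} (x - xs[j])
        double denom = 1.0;
        for (int j = 0; j < m; ++j) {
            if (j == i) continue;
            denom *= xs[i] - xs[j];
            vector<double> nb(basis.size() + 1, 0.0);
            for (size_t t = 0; t < basis.size(); ++t) {
                nb[t]     += basis[t] * (-xs[j]);
                nb[t + 1] += basis[t];
            }
            basis = nb;
        }
        for (int t = 0; t < m; ++t) C[t] += ys[i] * basis[t] / denom;
    }
    for (int t = 0; t < m; ++t) {                        // prints 4 13 22 15
        if (t) cout << ' ';
        cout << llround(C[t]);
    }
    cout << '\n';
    return 0;
}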

02 Intuition & Analogies

Think of a polynomial as a song. The coefficients are like notes written on a sheet, while the values at roots of unity are like listening to the song at specific time points around a circle. Playing notes together (convolution) is hard to do manually, but if you record how loud the song is at many evenly spaced times (roots of unity), then mixing two songs is as simple as multiplying the volumes at each time. FFT is the fast method to switch between the sheet music (coefficients) and the audio samples (values at evenly spaced times). Another analogy: suppose you want to compute how many ways there are to pick a pair of numbers, one from each list, whose sum equals k. The naive way checks all pairs. A faster way counts how often each number appears, then performs a convolution of these counts; this convolution is precisely polynomial multiplication of generating functions. By evaluating both polynomials at the special complex points on the unit circle (roots of unity), the convolution theorem tells us that convolution becomes pointwise multiplication. The divide-and-conquer trick splits the sequence into evens and odds, reusing symmetries of the circle (ω^{k+n/2} = −ω^k) so that each recursion handles half the problem, and we combine results with simple "butterfly" operations. For example, to get the distribution of sums of two dice (2..12), encode each die as a polynomial where the coefficient of x^i is 1 if face i appears; multiplying the polynomials yields the sum distribution directly, as in the sketch below.
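Here is that dice example as runnable code, using the naive O(n^2) convolution for clarity; this is exactly the operation FFT accelerates, and the names are illustrative.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Generating polynomial of one die: coefficient of x^f is 1 for faces f = 1..6
    vector<long long> die(7, 0);
    for (int f = 1; f <= 6; ++f) die[f] = 1;

    // Naive convolution die * die: coefficient of x^s counts ordered pairs summing to s
    vector<long long> sums(13, 0);
    for (int i = 1; i <= 6; ++i)
        for (int j = 1; j <= 6; ++j)
            sums[i + j] += die[i] * die[j];

    // Prints 1 2 3 4 5 6 5 4 3 2 1 for sums 2..12
    for (int s = 2; s <= 12; ++s) {
        if (s > 2) cout << ' ';
        cout << sums[s];
    }
    cout << '\n';
    return 0;
}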

03 Formal Definition

Given a sequence a = (a_0, a_1, …, a_{n−1}) and a primitive n-th root of unity ω_n = e^{2πi/n}, the Discrete Fourier Transform (DFT) is the vector A = (A_0, A_1, …, A_{n−1}) defined by A_k = Σ_{j=0}^{n−1} a_j ω_n^{jk}, for k = 0, 1, …, n−1. The inverse DFT is given by a_j = (1/n) Σ_{k=0}^{n−1} A_k ω_n^{−jk}. In polynomial terms, if P(x) = Σ_{j=0}^{n−1} a_j x^j, then A_k = P(ω_n^k), the evaluation of P at the n-th roots of unity. The convolution (Cauchy product) of two sequences is defined by c_k = Σ_{i=0}^{k} a_i b_{k−i}. The convolution theorem states that DFT(a ∗ b) = DFT(a) ⊙ DFT(b), where ⊙ denotes pointwise multiplication. Hence, we can compute c by c = IDFT(DFT(a) ⊙ DFT(b)). The Cooley–Tukey FFT algorithm factors the DFT matrix into sparse structured matrices using the identity P(x) = P_even(x^2) + x·P_odd(x^2), enabling a recursive O(n log n) procedure. The classical radix-2 FFT assumes n is a power of two; for general n, we pad with zeros to the next power of two.
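A quick worked instance (the numbers are chosen here for illustration): take a = (1, 2, 3, 4), so n = 4 and ω_4 = e^{2πi/4} = i. Then:

A_0 = 1 + 2 + 3 + 4 = 10
A_1 = 1 + 2i + 3i^2 + 4i^3 = 1 + 2i − 3 − 4i = −2 − 2i
A_2 = 1 + 2i^2 + 3i^4 + 4i^6 = 1 − 2 + 3 − 4 = −2
A_3 = 1 + 2i^3 + 3i^6 + 4i^9 = 1 − 2i − 3 + 4i = −2 + 2i

Applying the inverse DFT to (10, −2−2i, −2, −2+2i) returns (1, 2, 3, 4), as the 1/n normalization guarantees.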

04 When to Use

Use FFT when you need fast polynomial multiplication or discrete convolution and your coefficient growth/precision can be controlled. Typical competitive programming use cases include: (1) multiplying large-degree integer polynomials (e.g., big-integer multiplication), (2) counting pairwise sums or sums of k elements via generating functions, (3) cross-correlation or circular convolution problems, and (4) fast subset-sum approximations with binning. If you require exact integer results modulo a prime of the form p = c·2^k + 1 with a primitive root, consider using NTT (Number Theoretic Transform) for perfect accuracy. Use FFT with doubles when the output fits within safe rounding tolerance; otherwise, split coefficients into chunks (e.g., 15 bits) and recombine to reduce numerical error. Prefer the iterative in-place FFT for performance-critical contexts and when multiple convolutions share the same size so you can reuse twiddle factors. For small sizes (n up to a few hundred), Karatsuba or even naive O(n^2) multiplication can be competitive due to lower constants; a simple fallback is sketched below.
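A minimal sketch of such a fallback. The cutoff of 256 terms is an illustrative assumption to be tuned per machine; in practice you would dispatch to an FFT routine (like the ones later in this section) above the cutoff.

#include <bits/stdc++.h>
using namespace std;

// Naive O(n*m) multiplication: below a few hundred terms this often beats
// FFT because of FFT's larger constant factors.
vector<long long> multiply_naive(const vector<long long>& A, const vector<long long>& B) {
    if (A.empty() || B.empty()) return {};
    vector<long long> C(A.size() + B.size() - 1, 0);
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < B.size(); ++j)
            C[i + j] += A[i] * B[j];
    return C;
}

int main() {
    vector<long long> A = {1, 2, 3}, B = {4, 5};
    const size_t CUTOFF = 256; // assumed threshold; benchmark to pick a real one
    if (A.size() + B.size() <= CUTOFF) {
        auto C = multiply_naive(A, B);
        for (size_t i = 0; i < C.size(); ++i) {
            if (i) cout << ' ';
            cout << C[i]; // prints 4 13 22 15
        }
        cout << '\n';
    }
    return 0;
}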

⚠️Common Mistakes

Common pitfalls include: (1) forgetting to zero-pad both inputs so that the transform length L is at least the next power of two ≥ deg(A)+deg(B)+1, causing circular convolution (aliasing) instead of the intended linear convolution. (2) Using the wrong sign in the angle: with the convention above, the forward transform uses +2πik/n while the inverse uses −2πik/n, and you must divide by n at the end; omitting the 1/n factor yields results scaled by n. (3) Rounding errors: directly casting real parts to integers without round() can be off by one due to floating-point noise (see the snippet below); also, using float instead of double increases error. (4) Overflow during reconstruction: even though FFT uses doubles, final integer combinations might exceed 32 bits; always use 64-bit integers (long long). (5) Not normalizing tiny negative zeros (−0.0) after the inverse transform, which can confuse equality checks. (6) Precision loss with very large coefficients or long polynomials: mitigate by splitting coefficients into 15–16-bit chunks or switching to NTT. (7) Incorrect bit-reversal or twiddle indexing in iterative FFT leads to scrambled outputs; test using identity properties (FFT followed by inverse FFT should return the original within 1e−6). Example: multiplying [1, 2] and [3, 4] should yield [3, 10, 8]; if you instead see [12, 40, 32], you likely forgot the 1/n normalization (here n = 4 after padding).
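A tiny illustration of pitfall (3); the value 2.9999999995 stands in for a typical inverse-FFT output whose exact answer is 3.

#include <bits/stdc++.h>
using namespace std;

int main() {
    double v = 2.9999999995;      // inverse-FFT output; the true value is 3
    cout << (long long)v << '\n'; // prints 2: truncation is off by one
    cout << llround(v) << '\n';   // prints 3: rounding to nearest is correct
    return 0;
}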

Key Formulas

DFT definition

A_k = Σ_{j=0}^{n−1} a_j ω_n^{jk},  where ω_n = e^{2πi/n}

Explanation: A_k is the value of the polynomial with coefficients a_0, …, a_{n−1} evaluated at ω_n^k, the k-th power of the primitive n-th root of unity. It converts from coefficient form to point-value form.

Inverse DFT

a_j = (1/n) Σ_{k=0}^{n−1} A_k ω_n^{−jk}

Explanation: This recovers coefficients from values. The minus sign conjugates the angle, and dividing by n normalizes the result so that IDFT(DFT(a)) = a.

Convolution (Cauchy product)

c_k = Σ_{i=0}^{k} a_i b_{k−i}

Explanation: This is the definition of linear convolution (polynomial multiplication), producing the coefficient of x^k in the product. FFT computes this faster than the naive O(n^2) summation.

Convolution Theorem

DFT(a ∗ b) = DFT(a) ⊙ DFT(b)

Explanation: The DFT of the convolution equals the pointwise multiplication of the DFTs. This is the core reason we can multiply polynomials quickly using FFT.

FFT Recurrence

T(n) = 2T(n/2) + O(n)  ⇒  T(n) = O(n log n)

Explanation: The radix-2 Cooley–Tukey algorithm splits the DFT into two half-size DFTs and combines them in linear time, leading to O(n log n) complexity.

Padding length

L = 2^{⌈log₂(n+m−1)⌉}

Explanation: To compute a linear convolution of lengths n and m using FFT, pad both to length L, the smallest power of two at least n+m−1. This prevents circular wrap-around.

Half-turn symmetry

ω_n^{k+n/2} = −ω_n^k

Explanation: This symmetry underlies the even-odd decomposition in radix-2 FFT, simplifying the combine step (butterflies).

FFT Complexity

Time: O(n log n), Space: O(n)

Explanation: The number of arithmetic operations grows roughly like n times the number of recursion levels (log n). Memory scales linearly with the transform size.

Complexity Analysis

Let n and m be the input polynomial lengths and let L = 2^{⌈log₂(n+m−1)⌉}. An FFT-based convolution performs: two forward FFTs of size L, one pointwise multiplication of L complex numbers, and one inverse FFT of size L. Each FFT costs O(L log L) complex operations due to the recurrence T(L) = 2T(L/2) + O(L), which solves to O(L log L). The pointwise multiplication is O(L). Therefore, the total time is O(L log L). In terms of n and m, since L = Θ(n+m), the complexity is O((n+m) log(n+m)). For many problems where n ≈ m, this is written as O(n log n). Space complexity is O(L) to store arrays of complex numbers for the forward transforms, the spectrum, and temporary buffers. Iterative in-place FFT uses O(1) auxiliary space beyond the arrays (plus storage for precomputed roots, also O(L)), while simple recursive implementations may incur O(log L) stack depth. When coefficient splitting is used (e.g., into 15-bit chunks), you run several independent convolutions; asymptotically, this remains O(L log L) but with a constant-factor multiplier (roughly fourfold if done naively, as in the splitting example below). With NTT, the asymptotics are identical, but constant factors differ (often favorable on integer-heavy CPUs). In practice, performance depends critically on constant factors: precomputing twiddles, using iterative butterflies, minimizing allocations, and reusing buffers can cut runtime significantly. Floating-point precision can necessitate a slightly larger L to improve numerical stability, but this does not change the big-O complexity. A short unrolling of the recurrence follows below.
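Unrolling the recurrence makes the O(L log L) bound explicit (c is the constant in the combine step):

T(L) = 2T(L/2) + cL
     = 4T(L/4) + 2cL
     = 2^k · T(L/2^k) + k·cL
     = L·T(1) + cL·log₂(L)   (taking k = log₂ L)
     = O(L log L)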

Code Examples

Recursive FFT for Polynomial Multiplication (Educational)
#include <bits/stdc++.h>
using namespace std;

// Recursive FFT using std::complex<double> (educational clarity)
// Multiplies two integer polynomials a and b and returns integer coefficients.

using cd = complex<double>;
const double PI = acos(-1.0);

void fft(vector<cd>& a, bool invert) {
    int n = (int)a.size();
    if (n == 1) return;

    // Split into even and odd indices
    vector<cd> a0(n/2), a1(n/2);
    for (int i = 0; i < n/2; ++i) {
        a0[i] = a[2*i];
        a1[i] = a[2*i + 1];
    }

    // Recurse on halves
    fft(a0, invert);
    fft(a1, invert);

    // Principal n-th root of unity (conjugated angle for the inverse transform)
    double ang = 2 * PI / n * (invert ? -1 : 1);
    cd w(1), wn(cos(ang), sin(ang));

    // Combine (butterfly)
    for (int k = 0; k < n/2; ++k) {
        cd t = w * a1[k];
        a[k] = a0[k] + t;
        a[k + n/2] = a0[k] - t;
        w *= wn;
    }

    // For the inverse FFT, normalization by n happens once at the top level
    // (caller responsibility), not inside the recursion.
}

vector<long long> multiply_poly(const vector<long long>& A, const vector<long long>& B) {
    if (A.empty() || B.empty()) return {};

    // Prepare complex vectors and pad to a power of two >= A.size() + B.size() - 1
    int n1 = (int)A.size(), n2 = (int)B.size();
    int need = n1 + n2 - 1;
    int n = 1; while (n < need) n <<= 1;

    vector<cd> fa(n), fb(n);
    for (int i = 0; i < n1; ++i) fa[i] = cd((double)A[i], 0);
    for (int i = 0; i < n2; ++i) fb[i] = cd((double)B[i], 0);

    // Forward FFT
    fft(fa, false);
    fft(fb, false);

    // Pointwise multiply
    for (int i = 0; i < n; ++i) fa[i] *= fb[i];

    // Inverse FFT
    fft(fa, true);
    for (int i = 0; i < n; ++i) fa[i] /= (double)n; // normalize here

    // Round to nearest integer and trim to needed size
    vector<long long> C(need);
    for (int i = 0; i < need; ++i) {
        C[i] = llround(fa[i].real());
    }
    return C;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: (1 + 2x + 3x^2) * (4 + 5x)
    vector<long long> A = {1, 2, 3};
    vector<long long> B = {4, 5};

    vector<long long> C = multiply_poly(A, B);

    // Expected: [4, 13, 22, 15]
    for (size_t i = 0; i < C.size(); ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}

This code implements a clear recursive radix-2 FFT over complex doubles. We convert coefficient vectors to complex form, pad to the next power of two, take forward FFTs, multiply spectra elementwise, and apply the inverse FFT by calling the same routine with invert=true and dividing by n. Rounding produces integer coefficients.

Time: O(n log n) · Space: O(n)
Iterative In-Place FFT (Bit-Reversal) for Fast Convolution
#include <bits/stdc++.h>
using namespace std;

// Fast iterative radix-2 FFT with bit-reversal; recommended for performance.
// Provides convolution for real integer coefficients with rounding.

using cd = complex<double>;
const double PI = acos(-1.0);

void fft(vector<cd>& a, bool invert) {
    int n = (int)a.size();
    if (n == 1) return; // size-1 transform is the identity

    // Bit-reversal permutation table and cached twiddle roots
    static vector<int> rev;
    static vector<cd> roots{cd(0, 0), cd(-1, 0)}; // roots[p] = e^{2πi / 2^p}; roots[1] = -1
    if ((int)rev.size() != n) {
        int k = __builtin_ctz(n);
        rev.assign(n, 0);
        for (int i = 0; i < n; ++i) {
            rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
        }
        // Precompute roots up to the needed power
        if ((int)roots.size() <= k) {
            int old = (int)roots.size();
            roots.resize(k + 1);
            for (int p = old; p <= k; ++p) {
                double ang = 2 * PI / (1 << p);
                roots[p] = cd(cos(ang), sin(ang)); // principal 2^p-th root
            }
        }
    }

    for (int i = 0; i < n; ++i) if (i < rev[i]) swap(a[i], a[rev[i]]);

    for (int len = 1, p = 1; (len << 1) <= n; len <<= 1, ++p) {
        cd wlen = roots[p];
        if (invert) wlen = conj(wlen);
        for (int i = 0; i < n; i += (len << 1)) {
            cd w(1, 0);
            for (int j = 0; j < len; ++j) {
                cd u = a[i + j];
                cd v = a[i + j + len] * w;
                a[i + j] = u + v;
                a[i + j + len] = u - v;
                w *= wlen;
            }
        }
    }

    if (invert) {
        for (cd &x : a) x /= (double)n;
    }
}

vector<long long> convolution(const vector<long long>& A, const vector<long long>& B) {
    int n1 = (int)A.size(), n2 = (int)B.size();
    if (!n1 || !n2) return {};
    int need = n1 + n2 - 1;
    int n = 1; while (n < need) n <<= 1;

    vector<cd> fa(n), fb(n);
    for (int i = 0; i < n1; ++i) fa[i] = cd((double)A[i], 0);
    for (int i = 0; i < n2; ++i) fb[i] = cd((double)B[i], 0);

    fft(fa, false);
    fft(fb, false);
    for (int i = 0; i < n; ++i) fa[i] *= fb[i];
    fft(fa, true);

    vector<long long> C(need);
    for (int i = 0; i < need; ++i) C[i] = llround(fa[i].real());
    return C;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: convolution of two sequences (models polynomial multiplication)
    vector<long long> A = {3, 1, 4, 1, 5};
    vector<long long> B = {2, 7, 1, 8};

    auto C = convolution(A, B);

    for (size_t i = 0; i < C.size(); ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}

This version implements an optimized iterative FFT: it performs an in-place bit-reversal permutation and iteratively applies butterflies with precomputed roots. It is faster and more memory-friendly than the recursive version and is a common competitive programming template.

Time: O(n log n) · Space: O(n)
Modulo Convolution with Coefficient Splitting (Robust for Large Integers)
#include <bits/stdc++.h>
using namespace std;

// Convolution modulo a prime (e.g., 1e9+7) using floating-point FFT with 15-bit coefficient splitting.
// This avoids precision issues when coefficients or results are large.

using cd = complex<double>;
const double PI = acos(-1.0);

void fft(vector<cd>& a, bool invert) {
    int n = (int)a.size();
    if (n == 1) return;
    static vector<int> rev;
    static vector<cd> roots{cd(0, 0), cd(-1, 0)}; // roots[p] = e^{2πi / 2^p}
    if ((int)rev.size() != n) {
        int k = __builtin_ctz(n);
        rev.assign(n, 0);
        for (int i = 0; i < n; ++i) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
        if ((int)roots.size() <= k) {
            int old = (int)roots.size();
            roots.resize(k + 1);
            for (int p = old; p <= k; ++p) {
                double ang = 2 * PI / (1 << p);
                roots[p] = cd(cos(ang), sin(ang));
            }
        }
    }
    for (int i = 0; i < n; ++i) if (i < rev[i]) swap(a[i], a[rev[i]]);
    for (int len = 1, p = 1; (len << 1) <= n; len <<= 1, ++p) {
        cd wlen = roots[p];
        if (invert) wlen = conj(wlen);
        for (int i = 0; i < n; i += (len << 1)) {
            cd w(1, 0);
            for (int j = 0; j < len; ++j) {
                cd u = a[i + j];
                cd v = a[i + j + len] * w;
                a[i + j] = u + v;
                a[i + j + len] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert) for (cd& x : a) x /= (double)n;
}

// Generic convolution over integers using FFT on doubles, with rounding
static vector<long long> convolution_ll(const vector<long long>& A, const vector<long long>& B) {
    if (A.empty() || B.empty()) return {};
    int need = (int)A.size() + (int)B.size() - 1;
    int n = 1; while (n < need) n <<= 1;
    vector<cd> fa(n), fb(n);
    for (size_t i = 0; i < A.size(); ++i) fa[i] = cd((double)A[i], 0);
    for (size_t i = 0; i < B.size(); ++i) fb[i] = cd((double)B[i], 0);
    fft(fa, false); fft(fb, false);
    for (int i = 0; i < n; ++i) fa[i] *= fb[i];
    fft(fa, true);
    vector<long long> C(need);
    for (int i = 0; i < need; ++i) C[i] = llround(fa[i].real());
    return C;
}

// Convolution modulo MOD using 15-bit splitting: x = x0 + x1*BASE, BASE = 1<<15
vector<int> convolution_mod(const vector<int>& A, const vector<int>& B, int MOD) {
    const long long BASE = 1LL << 15; // 32768

    // Split into low/high parts
    vector<long long> a0(A.size()), a1(A.size()), b0(B.size()), b1(B.size());
    for (size_t i = 0; i < A.size(); ++i) {
        int x = A[i] % MOD; if (x < 0) x += MOD;
        a0[i] = x & (BASE - 1);
        a1[i] = x >> 15;
    }
    for (size_t i = 0; i < B.size(); ++i) {
        int x = B[i] % MOD; if (x < 0) x += MOD;
        b0[i] = x & (BASE - 1);
        b1[i] = x >> 15;
    }

    // Perform four convolutions: a0*b0, a0*b1, a1*b0, a1*b1
    // (the two cross terms could be merged into one transform to save time)
    vector<long long> c0 = convolution_ll(a0, b0);
    vector<long long> c01 = convolution_ll(a0, b1);
    vector<long long> c10 = convolution_ll(a1, b0);
    vector<long long> c2 = convolution_ll(a1, b1);

    size_t need = c0.size();
    vector<int> res(need);
    for (size_t i = 0; i < need; ++i) {
        long long t0 = (i < c0.size() ? c0[i] : 0);
        long long t1 = (i < c01.size() ? c01[i] : 0) + (i < c10.size() ? c10[i] : 0);
        long long t2 = (i < c2.size() ? c2[i] : 0);
        // Recombine: t0 + t1*BASE + t2*BASE^2 (mod MOD)
        __int128 val = (__int128)t0 + (__int128)t1 * BASE + (__int128)t2 * BASE * BASE;
        long long v = (long long)(val % MOD);
        if (v < 0) v += MOD;
        res[i] = (int)v;
    }
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const int MOD = 1'000'000'007;
    // Example: multiply polynomials modulo 1e9+7 with potentially large coefficients
    vector<int> A = { (int)1e9, (int)1e9, (int)1e9 };
    vector<int> B = { (int)1e9, (int)1e9 };

    auto C = convolution_mod(A, B, MOD);

    for (size_t i = 0; i < C.size(); ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}

This implementation computes convolution modulo a large prime using floating-point FFT safely. It splits each coefficient into two 15-bit chunks (low and high), performs four real convolutions (a0∗b0, a0∗b1, a1∗b0, a1∗b1) using a standard FFT over doubles, and recombines the results with BASE = 2^15. This mitigates rounding error because each convolution handles smaller magnitudes, allowing robust recovery modulo MOD.
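The recombination step is just the expansion of the split products:

a = a0 + a1·2^15,  b = b0 + b1·2^15
a·b = a0·b0 + (a0·b1 + a1·b0)·2^15 + a1·b1·2^30   (then reduce mod MOD)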

Time: O(n log n) · Space: O(n)