Convolution Applications
Key Points
- •Convolution turns local pairwise combinations (like matching characters or adding two dice) into a single fast transform–multiply–inverse pipeline.
- •Polynomial multiplication is exactly a convolution of coefficient arrays, which FFT/NTT can do in O(n n) time.
- •String matching can be solved by convolving indicator arrays of letters after reversing the pattern, then summing matches across letters.
- •Frequency arrays convolved give the distribution of sums, which is the core of many counting problems.
- •Bitwise AND/OR/XOR convolutions use different transforms (Walsh–Hadamard/Zeta) instead of the classic DFT/NTT.
- •Subset convolution aggregates over all A S via h[S] = f[A] g[S A], and is solved with subset transforms.
- •Careful zero-padding, correct reversing, and using modular NTT avoids precision issues in integer problems.
- •Convolutions are a powerful Swiss army knife for CF 2100–2500 problems: know which transform matches your algebraic combination.
Prerequisites
- →Polynomial multiplication — Convolution of coefficient arrays is exactly polynomial multiplication.
- →Modular arithmetic — NTT and FWT operate under a modulus; you must handle inverses and reductions correctly.
- →Discrete Fourier Transform — FFT/NTT rely on DFT properties to convert convolution into pointwise multiplication.
- →Bit operations and masks — XOR/OR/AND and subset convolutions index arrays by bitmasks and manipulate bits.
- →Asymptotic analysis — Choosing between naive O(nm), O(n log n), and O(n 2^n) requires understanding time/space tradeoffs.
- →Combinatorics on sets — Subset convolution and zeta/Möbius transforms are defined on the subset lattice.
Detailed Explanation
Tap terms for definitions01Overview
Convolution is a general technique to combine two sequences by sliding one over the other and summing products of aligned entries. If you think of sequences as polynomials, convolution of their coefficients is exactly polynomial multiplication. This unifying view lets us solve very different problems with the same computational skeleton: transform both inputs to a domain where convolution becomes pointwise multiplication, multiply componentwise, and transform back. The most famous example is the Fast Fourier Transform (FFT), which accelerates polynomial multiplication to O(n \log n). For integer-safe computations, the Number Theoretic Transform (NTT) does the same over finite fields modulo special primes. Beyond numeric polynomials, there are bitwise convolutions (XOR/AND/OR) handled by variants of the Walsh–Hadamard or zeta transforms, and there is subset convolution over bitmasks in combinatorial DP. Applications include fast string matching by aligning indicator arrays per character; counting sum distributions of random variables by convolving histograms; computing set-function convolutions in subset DP; and speeding up solutions that naturally express as sums over factorizations, partitions, or unions. Mastering these variants and when to use which transform is crucial for solving high-difficulty competitive programming problems efficiently.
02Intuition & Analogies
Imagine pushing a stencil across a long strip of tiles. At each position, you multiply the stencil numbers by the underlying tiles and sum to get a score. That sliding score is a convolution: local pairwise products summed as you slide. In string matching, the stencil is the pattern: for each letter, place 1 where that letter appears, 0 elsewhere. Sliding the reversed pattern indicator over the text and summing products counts how many letters match at that shift; if all m letters match, you have an occurrence. In dice-sum problems, each die has a histogram of face counts. The number of ways to get total t is the sum over i of (ways die A shows i) times (ways die B shows t−i), which is exactly the convolution of histograms. For bitwise problems, think of each index as a bitmask universe of items. XOR convolution mixes subsets by symmetric difference; OR convolution mixes so that each bit in the output is present if it is present in at least one input subset in the combination. Specialized transforms change the basis to make these operations become simple pointwise multiplications, just like FFTs do for numeric convolution. Subset convolution generalizes the idea: for every set S, you want to split it into two disjoint parts A and S\A, and sum f[A]g[S\A] over all choices. This is the same sliding-and-summing spirit, but across the lattice of subsets instead of along a line. The magic is always the same: pick the right transform so that a complicated combination becomes multiply-and-invert.
03Formal Definition
04When to Use
Use classic numeric convolution (FFT/NTT) whenever your problem reduces to sliding sums of products across indices, or to multiplying polynomials/series: string matching with equality or wildcard letters, counting sums of independent integer-valued variables, computing coefficients in combinatorial generating functions, or finding autocorrelations. Prefer NTT over floating-point FFT when your data are integers and exactness matters (e.g., counting modulo 998244353). Use XOR/OR/AND convolutions when combining set-indexed functions via symmetric difference, union, or intersection respectively—typical in problems about subsets, masks, or DP over bitmasks where the combination rule follows a bitwise operator. Reach for subset convolution when for each mask S you must split S into two disjoint parts A and S\A and aggregate f[A]g[S\A]; examples include counting ordered partitions, computing ways to build a set from components, or speeding up DP transitions that add disjoint substructures. In many CF tasks you will mix techniques: e.g., do a zeta transform by supersets, pointwise multiply arrays bucketed by popcount, then invert; or convolve multiple indicator arrays and combine results. The rule of thumb: identify the algebraic structure of your combination (sum over i+j=k? over u\oplus v=s? over A\subseteq S?), then choose the transform that diagonalizes it.
⚠️Common Mistakes
• Forgetting to zero-pad to at least n+m−1 when doing FFT/NTT leads to circular convolution (wrap-around), producing incorrect coefficients. • In string matching by convolution, not reversing the pattern indicators causes alignment to be off by m−1; the correct match index is at k = i + m − 1. • Using floating-point FFT for large integer coefficients without care results in rounding errors; prefer NTT or use coefficient splitting with rounding safeguards. • For NTT, choosing an arbitrary modulus breaks primitive roots: you must use a prime of the form p = c\cdot 2^k + 1 and a known primitive root; also normalize by n^{−1} mod p in the inverse. • In FWT XOR, forgetting the final division by the vector length (or repeated div by 2 per stage) yields results off by a power of 2; in modular arithmetic, multiply by modular inverse of size. • Mixing up OR vs AND transforms: their butterfly updates differ; verify forward/inverse formulas and test on tiny arrays. • Subset convolution indices are easy to off-by-one: ensure you use S\A (set difference) and not S XOR A; also watch complexity—naive double-subset loops are O(3^n), use transforms. • Overflow in intermediate products: even under a modulus, cast to 64-bit during multiplication before reducing.
Key Formulas
Discrete Convolution
Explanation: Each output index k sums the products of f and a reversed-and-shifted g. For finite arrays, the sum is effectively over valid i where both indices are in range.
Polynomial Multiplication
Explanation: Multiplying polynomials corresponds to convolving their coefficient arrays. The coefficient of in H is the sum of f[i]g[k-i] over i.
Convolution Theorem
Explanation: The DFT turns convolution into elementwise multiplication. Compute DFTs of both inputs, multiply pointwise, then inverse DFT to get h.
NTT Modulus and Root
Explanation: An NTT of length up to 2^k exists modulo prime p if p−1 is divisible by 2^k. The primitive 2^k-th root of unity is computed from a primitive root g.
XOR Convolution
Explanation: Combines indices by bitwise XOR. The Walsh–Hadamard transform diagonalizes this operation so it becomes pointwise multiplication in the transformed basis.
OR Convolution
Explanation: Combines indices by bitwise OR. Apply OR-zeta transform to both arrays, multiply pointwise, then invert with OR-Möbius to obtain h.
Subset Convolution
Explanation: For each set S, split it into two disjoint parts A and S and sum products. This can be computed for all S using subset zeta transforms bucketed by popcount.
FFT/NTT Complexity
Explanation: Computing one convolution of length n with FFT/NTT is O(n log n). This reflects the transform (n log n), pointwise multiplication (n), and inverse transform (n log n).
FWT Complexity
Explanation: Walsh–Hadamard or zeta transforms run in O(n 2^n) for arrays indexed by n-bit masks. There are n stages and the array has 2^n elements.
Sum Distribution
Explanation: When A and B are histograms of independent nonnegative integer variables, C is the histogram of their sum. This is exactly a convolution.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // NTT implementation modulo 998244353 5 // Supports convolution of integer vectors with exact arithmetic. 6 7 const uint32_t MOD = 998244353; // 119 * 2^23 + 1 8 const uint32_t PRIMITIVE_ROOT = 3; 9 10 uint32_t modpow(uint32_t a, uint32_t e) { 11 uint64_t r = 1; 12 uint64_t x = a; 13 while (e) { 14 if (e & 1) r = (r * x) % MOD; 15 x = (x * x) % MOD; 16 e >>= 1; 17 } 18 return (uint32_t)r; 19 } 20 21 void ntt(vector<uint32_t>& a, bool invert) { 22 int n = (int)a.size(); 23 static vector<int> rev; 24 static vector<uint32_t> roots{0,1}; 25 26 if ((int)rev.size() != n) { 27 int k = __builtin_ctz(n); 28 rev.assign(n, 0); 29 for (int i = 0; i < n; i++) { 30 rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1)); 31 } 32 } 33 if ((int)roots.size() < n) { 34 int k = __builtin_ctz(roots.size()); 35 roots.resize(n); 36 while ((1 << k) < n) { 37 uint32_t z = modpow(PRIMITIVE_ROOT, (MOD - 1) >> (k + 1)); 38 for (int i = 1 << (k - 1); i < (1 << k); i++) { 39 roots[2 * i] = roots[i]; 40 roots[2 * i + 1] = (uint64_t)roots[i] * z % MOD; 41 } 42 k++; 43 } 44 } 45 46 for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]); 47 48 for (int len = 1; len < n; len <<= 1) { 49 for (int i = 0; i < n; i += 2 * len) { 50 for (int j = 0; j < len; j++) { 51 uint32_t u = a[i + j]; 52 uint32_t v = (uint64_t)a[i + j + len] * roots[len + j] % MOD; 53 a[i + j] = u + v < MOD ? u + v : u + v - MOD; 54 a[i + j + len] = u >= v ? u - v : u + MOD - v; 55 } 56 } 57 } 58 59 if (invert) { 60 // Inverse NTT using reversed roots: compute a^{-1} by reversing all except a[0] 61 reverse(a.begin() + 1, a.end()); 62 uint32_t inv_n = modpow(n, MOD - 2); 63 for (int i = 0; i < n; i++) a[i] = (uint64_t)a[i] * inv_n % MOD; 64 } 65 } 66 67 vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B) { 68 int n = (int)A.size(), m = (int)B.size(); 69 if (!n || !m) return {}; 70 int need = n + m - 1; 71 int L = 1; while (L < need) L <<= 1; 72 vector<uint32_t> fa(L), fb(L); 73 copy(A.begin(), A.end(), fa.begin()); 74 copy(B.begin(), B.end(), fb.begin()); 75 ntt(fa, false); ntt(fb, false); 76 for (int i = 0; i < L; i++) fa[i] = (uint64_t)fa[i] * fb[i] % MOD; 77 ntt(fa, true); 78 fa.resize(need); 79 return fa; 80 } 81 82 int main() { 83 ios::sync_with_stdio(false); 84 cin.tie(nullptr); 85 string T, P; 86 // Input: text T and pattern P over lowercase letters; '?' in P matches any char. 87 // Example: 88 // Input: 89 // abacaba 90 // a?a 91 if (!(cin >> T)) return 0; 92 cin >> P; 93 int n = (int)T.size(), m = (int)P.size(); 94 vector<uint32_t> total(n + m - 1, 0); 95 96 int wildcard = 0; 97 for (char ch : P) if (ch == '?') wildcard++; 98 99 // For each letter, build indicator arrays and convolve 100 for (char c = 'a'; c <= 'z'; c++) { 101 vector<uint32_t> A(n, 0), B(m, 0); 102 for (int i = 0; i < n; i++) if (T[i] == c) A[i] = 1; 103 for (int j = 0; j < m; j++) if (P[j] == c) B[m - 1 - j] = 1; // reverse pattern 104 auto C = convolution(A, B); 105 if ((int)C.size() != n + m - 1) C.resize(n + m - 1, 0); 106 for (int k = 0; k < n + m - 1; k++) { 107 uint32_t s = total[k] + C[k]; 108 total[k] = s >= MOD ? s - MOD : s; 109 } 110 } 111 112 // A full match at position i (0-based) occurs if matches + wildcard == m. 113 vector<int> ans; 114 for (int i = 0; i + m <= n; i++) { 115 uint32_t matches = total[i + m - 1]; 116 if ((int)matches + wildcard == m) ans.push_back(i); 117 } 118 119 cout << ans.size() << "\n"; 120 for (int i = 0; i < (int)ans.size(); i++) { 121 if (i) cout << ' '; 122 cout << ans[i]; 123 } 124 cout << "\n"; 125 return 0; 126 } 127
We represent the text and pattern as 26 indicator arrays (one per letter). For each letter c, convolving A_c (text) with reversed B_c (pattern) counts matches of c at every alignment. Summing these counts over letters gives, at position i+m−1, the number of matched fixed letters at shift i. A position is a match if this equals the pattern length minus the number of wildcards. NTT ensures exact integer arithmetic modulo 998244353.
1 #include <bits/stdc++.h> 2 using namespace std; 3 const uint32_t MOD = 998244353; const uint32_t G = 3; 4 5 uint32_t modpow(uint32_t a, uint32_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=r*x%MOD; x=x*x%MOD; e>>=1;} return (uint32_t)r; } 6 7 void ntt(vector<uint32_t>& a, bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1}; if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)|((i&1)<<(k-1)); } 8 if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2*i]=roots[i]; roots[2*i+1]=(uint64_t)roots[i]*z%MOD; } ++k; } } 9 for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]); 10 for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=(uint64_t)a[i+j+len]*roots[len+j]%MOD; a[i+j]=u+v<MOD?u+v:u+v-MOD; a[i+j+len]=u>=v?u-v:u+MOD-v; } } } 11 if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(int i=0;i<n;i++) a[i]=(uint64_t)a[i]*inv_n%MOD; } 12 } 13 14 vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B){ int n=A.size(), m=B.size(); if(!n||!m) return {}; int need=n+m-1; int L=1; while(L<need) L<<=1; vector<uint32_t> fa(L), fb(L); copy(A.begin(),A.end(),fa.begin()); copy(B.begin(),B.end(),fb.begin()); ntt(fa,false); ntt(fb,false); for(int i=0;i<L;i++) fa[i]=(uint64_t)fa[i]*fb[i]%MOD; ntt(fa,true); fa.resize(need); return fa; } 15 16 int main(){ ios::sync_with_stdio(false); cin.tie(nullptr); 17 int Amax, Bmax; // Maximum values (support) of two integer variables 18 if(!(cin>>Amax>>Bmax)) return 0; 19 vector<uint32_t> f(Amax+1), g(Bmax+1); 20 // Read histograms: counts of occurrences for each value 0..Amax and 0..Bmax 21 for(int i=0;i<=Amax;i++){ uint64_t x; cin>>x; f[i]=x%MOD; } 22 for(int j=0;j<=Bmax;j++){ uint64_t y; cin>>y; g[j]=y%MOD; } 23 auto h = convolution(f,g); 24 // h[t] is the count (mod MOD) of achieving sum t 25 cout<<h.size()<<"\n"; 26 for(size_t i=0;i<h.size();i++){ 27 if(i) cout<<' '; 28 cout<<h[i]; 29 } 30 cout<<"\n"; 31 return 0; 32 } 33
Given two histograms f and g (counts at each integer), the histogram of sums is their convolution. We compute it with an NTT-based convolution for exactness modulo 998244353. This is a template for dice problems, knapsack-like independent sums, or combining independent distributions.
1 #include <bits/stdc++.h> 2 using namespace std; 3 const int MOD = 998244353; 4 long long modpow(long long a,long long e){ long long r=1%MOD; while(e){ if(e&1) r=r*a%MOD; a=a*a%MOD; e>>=1;} return r; } 5 6 // In-place FWT for XOR convolution. If invert=true, performs inverse transform. 7 void fwt_xor(vector<int>& a, bool invert){ int n=a.size(); for(int len=1; 2*len<=n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ int u=a[i+j]; int v=a[i+j+len]; a[i+j]=(u+v>=MOD?u+v-MOD:u+v); a[i+j+len]=(u-v<0?u-v+MOD:u-v); } } } if(invert){ long long invN = modpow(n, MOD-2); for(int &x:a) x = (long long)x*invN%MOD; } } 8 9 // OR-zeta transform and its inverse (Mbius) for OR convolution. 10 void fwt_or(vector<int>& a, bool invert){ int n=a.size(); for(int len=1; 2*len<=n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ int u=a[i+j]; int v=a[i+j+len]; if(!invert){ // forward OR-zeta 11 a[i+j] = u; // unchanged 12 a[i+j+len] = (u+v>=MOD?u+v-MOD:u+v); 13 } else { // inverse OR-Mbius 14 a[i+j] = u; // unchanged 15 a[i+j+len] = (v-u<0?v-u+MOD:v-u); 16 } 17 } } 18 } } 19 20 int main(){ ios::sync_with_stdio(false); cin.tie(nullptr); 21 int k; if(!(cin>>k)) return 0; int N=1<<k; vector<int> f(N), g(N); 22 for(int i=0;i<N;i++){ long long x; cin>>x; f[i]=x%MOD; } 23 for(int i=0;i<N;i++){ long long x; cin>>x; g[i]=x%MOD; } 24 25 // XOR convolution 26 vector<int> fx=f, gx=g; fwt_xor(fx,false); fwt_xor(gx,false); 27 vector<int> hx(N); 28 for(int i=0;i<N;i++) hx[i] = (long long)fx[i]*gx[i]%MOD; 29 fwt_xor(hx,true); 30 31 // OR convolution 32 vector<int> fo=f, go=g; fwt_or(fo,false); fwt_or(go,false); 33 vector<int> ho(N); 34 for(int i=0;i<N;i++) ho[i] = (long long)fo[i]*go[i]%MOD; 35 fwt_or(ho,true); 36 37 // Output both results 38 for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<hx[i]; } cout<<"\n"; 39 for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<ho[i]; } cout<<"\n"; 40 return 0; 41 } 42
XOR convolution uses the Walsh–Hadamard transform: after the forward transform, convolution becomes pointwise multiplication; the inverse normalizes by 1/N. OR convolution uses the OR-zeta transform that accumulates over subsets where a bit is set; its inverse is the OR-Möbius transform. This program reads two length-2^k arrays and outputs both their XOR and OR convolutions modulo 998244353.
1 #include <bits/stdc++.h> 2 using namespace std; 3 const int MOD = 998244353; 4 int addmod(int a,int b){ a+=b; if(a>=MOD) a-=MOD; return a; } 5 int submod(int a,int b){ a-=b; if(a<0) a+=MOD; return a; } 6 7 int main(){ ios::sync_with_stdio(false); cin.tie(nullptr); 8 int n; if(!(cin>>n)) return 0; int N=1<<n; 9 vector<int> f(N), g(N); for(int i=0;i<N;i++){ long long x; cin>>x; f[i]=x%MOD; } 10 for(int i=0;i<N;i++){ long long x; cin>>x; g[i]=x%MOD; } 11 12 // Bucket by popcount 13 vector<vector<int>> F(n+1, vector<int>(N,0)), G(n+1, vector<int>(N,0)); 14 for(int mask=0; mask<N; ++mask){ int pc=__builtin_popcount((unsigned)mask); F[pc][mask]=f[mask]; G[pc][mask]=g[mask]; } 15 16 // Subset zeta transform on each bucket: for each bit b, accumulate from subset without b to subset with b. 17 for(int b=0; b<n; ++b){ int bit=1<<b; for(int k=0; k<=n; ++k){ for(int mask=0; mask<N; ++mask){ if(mask & bit){ F[k][mask] = addmod(F[k][mask], F[k][mask ^ bit]); G[k][mask] = addmod(G[k][mask], G[k][mask ^ bit]); } } } } 18 19 // Pointwise convolution over k (popcount) dimensions 20 vector<vector<int>> H(n+1, vector<int>(N,0)); 21 for(int mask=0; mask<N; ++mask){ for(int k=0; k<=n; ++k){ long long acc=0; for(int i=0; i<=k; ++i){ acc += (long long)F[i][mask] * G[k-i][mask]; if(acc >= (1LL<<61)) acc %= MOD; } H[k][mask] = (int)(acc % MOD); } } 22 23 // Inverse subset zeta transform (Mbius) on each H[k] 24 for(int b=0; b<n; ++b){ int bit=1<<b; for(int k=0; k<=n; ++k){ for(int mask=0; mask<N; ++mask){ if(mask & bit){ H[k][mask] = submod(H[k][mask], H[k][mask ^ bit]); } } } } 25 26 // The result h[mask] is H[popcount(mask)][mask] 27 vector<int> h(N,0); 28 for(int mask=0; mask<N; ++mask){ int pc=__builtin_popcount((unsigned)mask); h[mask]=H[pc][mask]; } 29 30 for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<h[i]; } cout<<"\n"; 31 return 0; 32 } 33
We compute subset convolution by bucketing f and g by popcount and performing subset zeta transforms in each bucket. After zeta, for each mask we convolve along the popcount dimension (a short polynomial product). Finally we invert the zeta transform and extract h[mask] at index equal to popcount(mask). This is the standard subset convolution technique practical up to n≈20–22.