⚙️AlgorithmAdvanced

Convolution Applications

Key Points

•
Convolution turns local pairwise combinations (like matching characters or adding two dice) into a single fast transform–multiply–inverse pipeline.
•
Polynomial multiplication is exactly a convolution of coefficient arrays, which FFT/NTT can do in O(n $lo g$ n) time.
•
String matching can be solved by convolving indicator arrays of letters after reversing the pattern, then summing matches across letters.
•
Frequency arrays convolved give the distribution of sums, which is the core of many counting problems.
•
Bitwise AND/OR/XOR convolutions use different transforms (Walsh–Hadamard/Zeta) instead of the classic DFT/NTT.
•
Subset convolution aggregates over all A $\subseteq$ S via h[S] = $\sum_{A \subseteq S}$ f[A] g[S $∖$ A], and is solved with subset transforms.
•
Careful zero-padding, correct reversing, and using modular NTT avoids precision issues in integer problems.
•
Convolutions are a powerful Swiss army knife for CF 2100–2500 problems: know which transform matches your algebraic combination.

Prerequisites

→Polynomial multiplication — Convolution of coefficient arrays is exactly polynomial multiplication.
→Modular arithmetic — NTT and FWT operate under a modulus; you must handle inverses and reductions correctly.
→Discrete Fourier Transform — FFT/NTT rely on DFT properties to convert convolution into pointwise multiplication.
→Bit operations and masks — XOR/OR/AND and subset convolutions index arrays by bitmasks and manipulate bits.
→Asymptotic analysis — Choosing between naive O(nm), O(n log n), and O(n 2^n) requires understanding time/space tradeoffs.
→Combinatorics on sets — Subset convolution and zeta/Möbius transforms are defined on the subset lattice.

Detailed Explanation

Tap terms for definitions

01Overview

Convolution is a general technique to combine two sequences by sliding one over the other and summing products of aligned entries. If you think of sequences as polynomials, convolution of their coefficients is exactly polynomial multiplication. This unifying view lets us solve very different problems with the same computational skeleton: transform both inputs to a domain where convolution becomes pointwise multiplication, multiply componentwise, and transform back. The most famous example is the Fast Fourier Transform (FFT), which accelerates polynomial multiplication to O(n \log n). For integer-safe computations, the Number Theoretic Transform (NTT) does the same over finite fields modulo special primes. Beyond numeric polynomials, there are bitwise convolutions (XOR/AND/OR) handled by variants of the Walsh–Hadamard or zeta transforms, and there is subset convolution over bitmasks in combinatorial DP. Applications include fast string matching by aligning indicator arrays per character; counting sum distributions of random variables by convolving histograms; computing set-function convolutions in subset DP; and speeding up solutions that naturally express as sums over factorizations, partitions, or unions. Mastering these variants and when to use which transform is crucial for solving high-difficulty competitive programming problems efficiently.

02Intuition & Analogies

Imagine pushing a stencil across a long strip of tiles. At each position, you multiply the stencil numbers by the underlying tiles and sum to get a score. That sliding score is a convolution: local pairwise products summed as you slide. In string matching, the stencil is the pattern: for each letter, place 1 where that letter appears, 0 elsewhere. Sliding the reversed pattern indicator over the text and summing products counts how many letters match at that shift; if all m letters match, you have an occurrence. In dice-sum problems, each die has a histogram of face counts. The number of ways to get total t is the sum over i of (ways die A shows i) times (ways die B shows t−i), which is exactly the convolution of histograms. For bitwise problems, think of each index as a bitmask universe of items. XOR convolution mixes subsets by symmetric difference; OR convolution mixes so that each bit in the output is present if it is present in at least one input subset in the combination. Specialized transforms change the basis to make these operations become simple pointwise multiplications, just like FFTs do for numeric convolution. Subset convolution generalizes the idea: for every set S, you want to split it into two disjoint parts A and S\A, and sum f[A]g[S\A] over all choices. This is the same sliding-and-summing spirit, but across the lattice of subsets instead of along a line. The magic is always the same: pick the right transform so that a complicated combination becomes multiply-and-invert.

03Formal Definition

Given sequences f and g indexed by integers, their (discrete) linear convolution h is defined by h[k] =

\sum_{i \in Z}

f[i] g[k - i]. For finite sequences of lengths n and m, the sum is effectively i from 0 to n−1 with k−i in [0, m−1]. Interpreting f and g as polynomials F(x) =

\sum_{i = 0}^{n - 1}

f[i]

x^{i}

and G(x) =

\sum_{j = 0}^{m - 1}

g[j]

x^{j}

, the Cauchy product gives coefficients of H(x) = F(x)G(x) as h[k] =

\sum_{i + j = k}

f[i]g[j]. The convolution theorem states that the Discrete Fourier Transform (DFT) converts convolution into pointwise multiplication:

DFT

(h) =

DFT

(f)

⊙

DFT

(g). Over finite fields, the Number Theoretic Transform (NTT) is a DFT analog using primitive roots of unity modulo a prime p = c

\cdot

2^k + 1. For bitwise-indexed sequences a[0..2^n−1], we define XOR convolution (a

\otimes_{\oplus}

b)[s] =

\sum_{u \oplus v = s}

a[u] b[v]; OR convolution (a

\otimes_{\lor}

b)[s] =

\sum_{u \lor v = s}

a[u] b[v]; AND convolution (a

\otimes_{\land}

b)[s] =

\sum_{u \land v = s}

a[u] b[v]. These are diagonalized by Walsh–Hadamard or zeta transforms. Subset convolution for set functions f, g: 2^{[n]}

\to

R is (f * g)[S] =

\sum_{A \subseteq S}

f[A] g[S

∖

A], central in combinatorial DP on subsets.

04When to Use

Use classic numeric convolution (FFT/NTT) whenever your problem reduces to sliding sums of products across indices, or to multiplying polynomials/series: string matching with equality or wildcard letters, counting sums of independent integer-valued variables, computing coefficients in combinatorial generating functions, or finding autocorrelations. Prefer NTT over floating-point FFT when your data are integers and exactness matters (e.g., counting modulo 998244353). Use XOR/OR/AND convolutions when combining set-indexed functions via symmetric difference, union, or intersection respectively—typical in problems about subsets, masks, or DP over bitmasks where the combination rule follows a bitwise operator. Reach for subset convolution when for each mask S you must split S into two disjoint parts A and S\A and aggregate f[A]g[S\A]; examples include counting ordered partitions, computing ways to build a set from components, or speeding up DP transitions that add disjoint substructures. In many CF tasks you will mix techniques: e.g., do a zeta transform by supersets, pointwise multiply arrays bucketed by popcount, then invert; or convolve multiple indicator arrays and combine results. The rule of thumb: identify the algebraic structure of your combination (sum over i+j=k? over u\oplus v=s? over A\subseteq S?), then choose the transform that diagonalizes it.

⚠️Common Mistakes

• Forgetting to zero-pad to at least n+m−1 when doing FFT/NTT leads to circular convolution (wrap-around), producing incorrect coefficients. • In string matching by convolution, not reversing the pattern indicators causes alignment to be off by m−1; the correct match index is at k = i + m − 1. • Using floating-point FFT for large integer coefficients without care results in rounding errors; prefer NTT or use coefficient splitting with rounding safeguards. • For NTT, choosing an arbitrary modulus breaks primitive roots: you must use a prime of the form p = c\cdot 2^k + 1 and a known primitive root; also normalize by n^{−1} mod p in the inverse. • In FWT XOR, forgetting the final division by the vector length (or repeated div by 2 per stage) yields results off by a power of 2; in modular arithmetic, multiply by modular inverse of size. • Mixing up OR vs AND transforms: their butterfly updates differ; verify forward/inverse formulas and test on tiny arrays. • Subset convolution indices are easy to off-by-one: ensure you use S\A (set difference) and not S XOR A; also watch complexity—naive double-subset loops are O(3^n), use transforms. • Overflow in intermediate products: even under a modulus, cast to 64-bit during multiplication before reducing.

Key Formulas

Discrete Convolution

h [k] = i = - \infty \sum \infty f [i] g [k - i]

Explanation: Each output index k sums the products of f and a reversed-and-shifted g. For finite arrays, the sum is effectively over valid i where both indices are in range.

Polynomial Multiplication

H (x) = F (x) G (x), F (x) = i = 0 \sum n - 1 f [i] x^{i}, G (x) = j = 0 \sum m - 1 g [j] x^{j}

Explanation: Multiplying polynomials corresponds to convolving their coefficient arrays. The coefficient of $x^{k}$ in H is the sum of f[i]g[k-i] over i.

Convolution Theorem

DFT (h) = DFT (f) ⊙ DFT (g)

Explanation: The DFT turns convolution into elementwise multiplication. Compute DFTs of both inputs, multiply pointwise, then inverse DFT to get h.

NTT Modulus and Root

p = c \cdot 2^{k} + 1, ω = g^{\frac{p - 1}{2 ^{k}}} mod p

Explanation: An NTT of length up to 2^k exists modulo prime p if p−1 is divisible by 2^k. The primitive 2^k-th root of unity is computed from a primitive root g.

XOR Convolution

h [s] = u \oplus v = s \sum f [u] g [v]

Explanation: Combines indices by bitwise XOR. The Walsh–Hadamard transform diagonalizes this operation so it becomes pointwise multiplication in the transformed basis.

OR Convolution

h [s] = u \lor v = s \sum f [u] g [v]

Explanation: Combines indices by bitwise OR. Apply OR-zeta transform to both arrays, multiply pointwise, then invert with OR-Möbius to obtain h.

Subset Convolution

(f * g) [S] = A \subseteq S \sum f [A] g [S ∖ A]

Explanation: For each set S, split it into two disjoint parts A and S $\A$ and sum products. This can be computed for all S using subset zeta transforms bucketed by popcount.

FFT/NTT Complexity

T (n) = O (n lo g n)

Explanation: Computing one convolution of length n with FFT/NTT is O(n log n). This reflects the transform (n log n), pointwise multiplication (n), and inverse transform (n log n).

FWT Complexity

T (2^{n}) = O (n \cdot 2^{n})

Explanation: Walsh–Hadamard or zeta transforms run in O(n 2^n) for arrays indexed by n-bit masks. There are n stages and the array has 2^n elements.

Sum Distribution

C [k] = i = 0 \sum k A [i] B [k - i]

Explanation: When A and B are histograms of independent nonnegative integer variables, C is the histogram of their sum. This is exactly a convolution.

Complexity Analysis

For classic numeric convolution of two arrays with lengths n and m, the naive O(nm) method is too slow when n and m are large. FFT/NTT reduces this to O(L

lo g

L) where L is the next power of two at least n+m−1. Memory is O(L) for arrays and twiddle factors. In integer problems, NTT with a suitable modulus (e.g., 998244353) avoids floating-point error; if coefficients may exceed the modulus, CRT or coefficient splitting is needed. In string matching by indicator convolutions over an alphabet of size

σ

, performing

σ

NTTs yields O(

σ

lo g

L) time and O(L) extra space; often

σ

is small (e.g., 26). For frequency-sum distributions, a single NTT suffices, O(L

lo g

L). For bitwise convolutions over arrays of size

N = 2^{n}

, the Fast Walsh–Hadamard Transform (XOR) and the subset/superset zeta transforms run in O(n N) time and O(N) memory; both forward and inverse transforms have the same complexity. Performing two transforms, one pointwise multiplication, and one inverse transform keeps overall O(n N). Subset convolution across all masks can be computed with bucket-by-popcount zeta transforms; a standard implementation runs in O(

n^{2}

2^n) time and O(n 2^n) memory, practical up to n≈20–22. Optimizations and careful constant factors (bit loops, in-place operations, precomputed popcounts) are crucial for competitive programming limits. Always account for transform size constraints (maximum power-of-two length supported by your modulus/root) and normalize correctly in inverse transforms to avoid wrong answers.

Code Examples

String Matching with Wildcards via NTT-based Convolution

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // NTT implementation modulo 998244353
5 // Supports convolution of integer vectors with exact arithmetic.
6 
7 const uint32_t MOD = 998244353; // 119 * 2^23 + 1
8 const uint32_t PRIMITIVE_ROOT = 3;
9 
10 uint32_t modpow(uint32_t a, uint32_t e) {
11     uint64_t r = 1;
12     uint64_t x = a;
13     while (e) {
14         if (e & 1) r = (r * x) % MOD;
15         x = (x * x) % MOD;
16         e >>= 1;
17     }
18     return (uint32_t)r;
19 }
20 
21 void ntt(vector<uint32_t>& a, bool invert) {
22     int n = (int)a.size();
23     static vector<int> rev;
24     static vector<uint32_t> roots{0,1};
25 
26     if ((int)rev.size() != n) {
27         int k = __builtin_ctz(n);
28         rev.assign(n, 0);
29         for (int i = 0; i < n; i++) {
30             rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
31         }
32     }
33     if ((int)roots.size() < n) {
34         int k = __builtin_ctz(roots.size());
35         roots.resize(n);
36         while ((1 << k) < n) {
37             uint32_t z = modpow(PRIMITIVE_ROOT, (MOD - 1) >> (k + 1));
38             for (int i = 1 << (k - 1); i < (1 << k); i++) {
39                 roots[2 * i] = roots[i];
40                 roots[2 * i + 1] = (uint64_t)roots[i] * z % MOD;
41             }
42             k++;
43         }
44     }
45 
46     for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]);
47 
48     for (int len = 1; len < n; len <<= 1) {
49         for (int i = 0; i < n; i += 2 * len) {
50             for (int j = 0; j < len; j++) {
51                 uint32_t u = a[i + j];
52                 uint32_t v = (uint64_t)a[i + j + len] * roots[len + j] % MOD;
53                 a[i + j] = u + v < MOD ? u + v : u + v - MOD;
54                 a[i + j + len] = u >= v ? u - v : u + MOD - v;
55             }
56         }
57     }
58 
59     if (invert) {
60         // Inverse NTT using reversed roots: compute a^{-1} by reversing all except a[0]
61         reverse(a.begin() + 1, a.end());
62         uint32_t inv_n = modpow(n, MOD - 2);
63         for (int i = 0; i < n; i++) a[i] = (uint64_t)a[i] * inv_n % MOD;
64     }
65 }
66 
67 vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B) {
68     int n = (int)A.size(), m = (int)B.size();
69     if (!n || !m) return {};
70     int need = n + m - 1;
71     int L = 1; while (L < need) L <<= 1;
72     vector<uint32_t> fa(L), fb(L);
73     copy(A.begin(), A.end(), fa.begin());
74     copy(B.begin(), B.end(), fb.begin());
75     ntt(fa, false); ntt(fb, false);
76     for (int i = 0; i < L; i++) fa[i] = (uint64_t)fa[i] * fb[i] % MOD;
77     ntt(fa, true);
78     fa.resize(need);
79     return fa;
80 }
81 
82 int main() {
83     ios::sync_with_stdio(false);
84     cin.tie(nullptr);
85     string T, P;
86     // Input: text T and pattern P over lowercase letters; '?' in P matches any char.
87     // Example:
88     // Input:
89     // abacaba
90     // a?a
91     if (!(cin >> T)) return 0;
92     cin >> P;
93     int n = (int)T.size(), m = (int)P.size();
94     vector<uint32_t> total(n + m - 1, 0);
95 
96     int wildcard = 0;
97     for (char ch : P) if (ch == '?') wildcard++;
98 
99     // For each letter, build indicator arrays and convolve
100     for (char c = 'a'; c <= 'z'; c++) {
101         vector<uint32_t> A(n, 0), B(m, 0);
102         for (int i = 0; i < n; i++) if (T[i] == c) A[i] = 1;
103         for (int j = 0; j < m; j++) if (P[j] == c) B[m - 1 - j] = 1; // reverse pattern
104         auto C = convolution(A, B);
105         if ((int)C.size() != n + m - 1) C.resize(n + m - 1, 0);
106         for (int k = 0; k < n + m - 1; k++) {
107             uint32_t s = total[k] + C[k];
108             total[k] = s >= MOD ? s - MOD : s;
109         }
110     }
111 
112     // A full match at position i (0-based) occurs if matches + wildcard == m.
113     vector<int> ans;
114     for (int i = 0; i + m <= n; i++) {
115         uint32_t matches = total[i + m - 1];
116         if ((int)matches + wildcard == m) ans.push_back(i);
117     }
118 
119     cout << ans.size() << "\n";
120     for (int i = 0; i < (int)ans.size(); i++) {
121         if (i) cout << ' ';
122         cout << ans[i];
123     }
124     cout << "\n";
125     return 0;
126 }
127

We represent the text and pattern as 26 indicator arrays (one per letter). For each letter c, convolving A_c (text) with reversed B_c (pattern) counts matches of c at every alignment. Summing these counts over letters gives, at position i+m−1, the number of matched fixed letters at shift i. A position is a match if this equals the pattern length minus the number of wildcards. NTT ensures exact integer arithmetic modulo 998244353.

Time: O(26 · L log L) where L ≥ n + m − 1 is a power of twoSpace: O(L)

Counting Sum Distributions via NTT (Histogram Convolution)

1 #include <bits/stdc++.h>
2 using namespace std;
3 const uint32_t MOD = 998244353; const uint32_t G = 3;
4 
5 uint32_t modpow(uint32_t a, uint32_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=r*x%MOD; x=x*x%MOD; e>>=1;} return (uint32_t)r; }
6 
7 void ntt(vector<uint32_t>& a, bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1}; if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)|((i&1)<<(k-1)); }
8     if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2*i]=roots[i]; roots[2*i+1]=(uint64_t)roots[i]*z%MOD; } ++k; } }
9     for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
10     for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=(uint64_t)a[i+j+len]*roots[len+j]%MOD; a[i+j]=u+v<MOD?u+v:u+v-MOD; a[i+j+len]=u>=v?u-v:u+MOD-v; } } }
11     if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(int i=0;i<n;i++) a[i]=(uint64_t)a[i]*inv_n%MOD; }
12 }
13 
14 vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B){ int n=A.size(), m=B.size(); if(!n||!m) return {}; int need=n+m-1; int L=1; while(L<need) L<<=1; vector<uint32_t> fa(L), fb(L); copy(A.begin(),A.end(),fa.begin()); copy(B.begin(),B.end(),fb.begin()); ntt(fa,false); ntt(fb,false); for(int i=0;i<L;i++) fa[i]=(uint64_t)fa[i]*fb[i]%MOD; ntt(fa,true); fa.resize(need); return fa; }
15 
16 int main(){ ios::sync_with_stdio(false); cin.tie(nullptr);
17     int Amax, Bmax; // Maximum values (support) of two integer variables
18     if(!(cin>>Amax>>Bmax)) return 0;
19     vector<uint32_t> f(Amax+1), g(Bmax+1);
20     // Read histograms: counts of occurrences for each value 0..Amax and 0..Bmax
21     for(int i=0;i<=Amax;i++){ uint64_t x; cin>>x; f[i]=x%MOD; }
22     for(int j=0;j<=Bmax;j++){ uint64_t y; cin>>y; g[j]=y%MOD; }
23     auto h = convolution(f,g);
24     // h[t] is the count (mod MOD) of achieving sum t
25     cout<<h.size()<<"\n";
26     for(size_t i=0;i<h.size();i++){
27         if(i) cout<<' ';
28         cout<<h[i];
29     }
30     cout<<"\n";
31     return 0;
32 }
33

Given two histograms f and g (counts at each integer), the histogram of sums is their convolution. We compute it with an NTT-based convolution for exactness modulo 998244353. This is a template for dice problems, knapsack-like independent sums, or combining independent distributions.

Time: O(L log L) where L ≥ (Amax + Bmax + 1) is a power of twoSpace: O(L)

XOR and OR Convolutions via Fast Walsh–Hadamard/Zeta Transforms

1 #include <bits/stdc++.h>
2 using namespace std;
3 const int MOD = 998244353;
4 long long modpow(long long a,long long e){ long long r=1%MOD; while(e){ if(e&1) r=r*a%MOD; a=a*a%MOD; e>>=1;} return r; }
5 
6 // In-place FWT for XOR convolution. If invert=true, performs inverse transform.
7 void fwt_xor(vector<int>& a, bool invert){ int n=a.size(); for(int len=1; 2*len<=n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ int u=a[i+j]; int v=a[i+j+len]; a[i+j]=(u+v>=MOD?u+v-MOD:u+v); a[i+j+len]=(u-v<0?u-v+MOD:u-v); } } } if(invert){ long long invN = modpow(n, MOD-2); for(int &x:a) x = (long long)x*invN%MOD; } }
8 
9 // OR-zeta transform and its inverse (Mbius) for OR convolution.
10 void fwt_or(vector<int>& a, bool invert){ int n=a.size(); for(int len=1; 2*len<=n; len<<=1){ for(int i=0;i<n;i+=2*len){ for(int j=0;j<len;j++){ int u=a[i+j]; int v=a[i+j+len]; if(!invert){ // forward OR-zeta
11                     a[i+j] = u; // unchanged
12                     a[i+j+len] = (u+v>=MOD?u+v-MOD:u+v);
13                 } else { // inverse OR-Mbius
14                     a[i+j] = u; // unchanged
15                     a[i+j+len] = (v-u<0?v-u+MOD:v-u);
16                 }
17             } }
18     } }
19 
20 int main(){ ios::sync_with_stdio(false); cin.tie(nullptr);
21     int k; if(!(cin>>k)) return 0; int N=1<<k; vector<int> f(N), g(N);
22     for(int i=0;i<N;i++){ long long x; cin>>x; f[i]=x%MOD; }
23     for(int i=0;i<N;i++){ long long x; cin>>x; g[i]=x%MOD; }
24 
25     // XOR convolution
26     vector<int> fx=f, gx=g; fwt_xor(fx,false); fwt_xor(gx,false);
27     vector<int> hx(N);
28     for(int i=0;i<N;i++) hx[i] = (long long)fx[i]*gx[i]%MOD;
29     fwt_xor(hx,true);
30 
31     // OR convolution
32     vector<int> fo=f, go=g; fwt_or(fo,false); fwt_or(go,false);
33     vector<int> ho(N);
34     for(int i=0;i<N;i++) ho[i] = (long long)fo[i]*go[i]%MOD;
35     fwt_or(ho,true);
36 
37     // Output both results
38     for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<hx[i]; } cout<<"\n";
39     for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<ho[i]; } cout<<"\n";
40     return 0;
41 }
42

XOR convolution uses the Walsh–Hadamard transform: after the forward transform, convolution becomes pointwise multiplication; the inverse normalizes by 1/N. OR convolution uses the OR-zeta transform that accumulates over subsets where a bit is set; its inverse is the OR-Möbius transform. This program reads two length-2^k arrays and outputs both their XOR and OR convolutions modulo 998244353.

Time: O(k · 2^k) per transform; overall O(k · 2^k) for each convolutionSpace: O(2^k)

Subset Convolution over Bitmasks (h[S] = sum_{A ⊆ S} f[A] g[S\A])

1 #include <bits/stdc++.h>
2 using namespace std;
3 const int MOD = 998244353;
4 int addmod(int a,int b){ a+=b; if(a>=MOD) a-=MOD; return a; }
5 int submod(int a,int b){ a-=b; if(a<0) a+=MOD; return a; }
6 
7 int main(){ ios::sync_with_stdio(false); cin.tie(nullptr);
8     int n; if(!(cin>>n)) return 0; int N=1<<n;
9     vector<int> f(N), g(N); for(int i=0;i<N;i++){ long long x; cin>>x; f[i]=x%MOD; }
10     for(int i=0;i<N;i++){ long long x; cin>>x; g[i]=x%MOD; }
11 
12     // Bucket by popcount
13     vector<vector<int>> F(n+1, vector<int>(N,0)), G(n+1, vector<int>(N,0));
14     for(int mask=0; mask<N; ++mask){ int pc=__builtin_popcount((unsigned)mask); F[pc][mask]=f[mask]; G[pc][mask]=g[mask]; }
15 
16     // Subset zeta transform on each bucket: for each bit b, accumulate from subset without b to subset with b.
17     for(int b=0; b<n; ++b){ int bit=1<<b; for(int k=0; k<=n; ++k){ for(int mask=0; mask<N; ++mask){ if(mask & bit){ F[k][mask] = addmod(F[k][mask], F[k][mask ^ bit]); G[k][mask] = addmod(G[k][mask], G[k][mask ^ bit]); } } } }
18 
19     // Pointwise convolution over k (popcount) dimensions
20     vector<vector<int>> H(n+1, vector<int>(N,0));
21     for(int mask=0; mask<N; ++mask){ for(int k=0; k<=n; ++k){ long long acc=0; for(int i=0; i<=k; ++i){ acc += (long long)F[i][mask] * G[k-i][mask]; if(acc >= (1LL<<61)) acc %= MOD; } H[k][mask] = (int)(acc % MOD); } }
22 
23     // Inverse subset zeta transform (Mbius) on each H[k]
24     for(int b=0; b<n; ++b){ int bit=1<<b; for(int k=0; k<=n; ++k){ for(int mask=0; mask<N; ++mask){ if(mask & bit){ H[k][mask] = submod(H[k][mask], H[k][mask ^ bit]); } } } }
25 
26     // The result h[mask] is H[popcount(mask)][mask]
27     vector<int> h(N,0);
28     for(int mask=0; mask<N; ++mask){ int pc=__builtin_popcount((unsigned)mask); h[mask]=H[pc][mask]; }
29 
30     for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<h[i]; } cout<<"\n";
31     return 0;
32 }
33

We compute subset convolution by bucketing f and g by popcount and performing subset zeta transforms in each bucket. After zeta, for each mask we convolve along the popcount dimension (a short polynomial product). Finally we invert the zeta transform and extract h[mask] at index equal to popcount(mask). This is the standard subset convolution technique practical up to n≈20–22.

Time: O(n^2 · 2^n) in this implementationSpace: O(n · 2^n)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// NTT implementation modulo 998244353
5	// Supports convolution of integer vectors with exact arithmetic.
6
7	const uint32_t MOD = 998244353; // 119 * 2^23 + 1
8	const uint32_t PRIMITIVE_ROOT = 3;
9
10	uint32_t modpow(uint32_t a, uint32_t e) {
11	uint64_t r = 1;
12	uint64_t x = a;
13	while (e) {
14	if (e & 1) r = (r * x) % MOD;
15	x = (x * x) % MOD;
16	e >>= 1;
17	}
18	return (uint32_t)r;
19	}
20
21	void ntt(vector<uint32_t>& a, bool invert) {
22	int n = (int)a.size();
23	static vector<int> rev;
24	static vector<uint32_t> roots{0,1};
25
26	if ((int)rev.size() != n) {
27	int k = __builtin_ctz(n);
28	rev.assign(n, 0);
29	for (int i = 0; i < n; i++) {
30	rev[i] = (rev[i >> 1] >> 1) \| ((i & 1) << (k - 1));
31	}
32	}
33	if ((int)roots.size() < n) {
34	int k = __builtin_ctz(roots.size());
35	roots.resize(n);
36	while ((1 << k) < n) {
37	uint32_t z = modpow(PRIMITIVE_ROOT, (MOD - 1) >> (k + 1));
38	for (int i = 1 << (k - 1); i < (1 << k); i++) {
39	roots[2 * i] = roots[i];
40	roots[2 * i + 1] = (uint64_t)roots[i] * z % MOD;
41	}
42	k++;
43	}
44	}
45
46	for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]);
47
48	for (int len = 1; len < n; len <<= 1) {
49	for (int i = 0; i < n; i += 2 * len) {
50	for (int j = 0; j < len; j++) {
51	uint32_t u = a[i + j];
52	uint32_t v = (uint64_t)a[i + j + len] * roots[len + j] % MOD;
53	a[i + j] = u + v < MOD ? u + v : u + v - MOD;
54	a[i + j + len] = u >= v ? u - v : u + MOD - v;
55	}
56	}
57	}
58
59	if (invert) {
60	// Inverse NTT using reversed roots: compute a^{-1} by reversing all except a[0]
61	reverse(a.begin() + 1, a.end());
62	uint32_t inv_n = modpow(n, MOD - 2);
63	for (int i = 0; i < n; i++) a[i] = (uint64_t)a[i] * inv_n % MOD;
64	}
65	}
66
67	vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B) {
68	int n = (int)A.size(), m = (int)B.size();
69	if (!n \|\| !m) return {};
70	int need = n + m - 1;
71	int L = 1; while (L < need) L <<= 1;
72	vector<uint32_t> fa(L), fb(L);
73	copy(A.begin(), A.end(), fa.begin());
74	copy(B.begin(), B.end(), fb.begin());
75	ntt(fa, false); ntt(fb, false);
76	for (int i = 0; i < L; i++) fa[i] = (uint64_t)fa[i] * fb[i] % MOD;
77	ntt(fa, true);
78	fa.resize(need);
79	return fa;
80	}
81
82	int main() {
83	ios::sync_with_stdio(false);
84	cin.tie(nullptr);
85	string T, P;
86	// Input: text T and pattern P over lowercase letters; '?' in P matches any char.
87	// Example:
88	// Input:
89	// abacaba
90	// a?a
91	if (!(cin >> T)) return 0;
92	cin >> P;
93	int n = (int)T.size(), m = (int)P.size();
94	vector<uint32_t> total(n + m - 1, 0);
95
96	int wildcard = 0;
97	for (char ch : P) if (ch == '?') wildcard++;
98
99	// For each letter, build indicator arrays and convolve
100	for (char c = 'a'; c <= 'z'; c++) {
101	vector<uint32_t> A(n, 0), B(m, 0);
102	for (int i = 0; i < n; i++) if (T[i] == c) A[i] = 1;
103	for (int j = 0; j < m; j++) if (P[j] == c) B[m - 1 - j] = 1; // reverse pattern
104	auto C = convolution(A, B);
105	if ((int)C.size() != n + m - 1) C.resize(n + m - 1, 0);
106	for (int k = 0; k < n + m - 1; k++) {
107	uint32_t s = total[k] + C[k];
108	total[k] = s >= MOD ? s - MOD : s;
109	}
110	}
111
112	// A full match at position i (0-based) occurs if matches + wildcard == m.
113	vector<int> ans;
114	for (int i = 0; i + m <= n; i++) {
115	uint32_t matches = total[i + m - 1];
116	if ((int)matches + wildcard == m) ans.push_back(i);
117	}
118
119	cout << ans.size() << "\n";
120	for (int i = 0; i < (int)ans.size(); i++) {
121	if (i) cout << ' ';
122	cout << ans[i];
123	}
124	cout << "\n";
125	return 0;
126	}
127

1	#include <bits/stdc++.h>
2	using namespace std;
3	const uint32_t MOD = 998244353; const uint32_t G = 3;
4
5	uint32_t modpow(uint32_t a, uint32_t e){ uint64_t r=1,x=a; while(e){ if(e&1) r=rx%MOD; x=xx%MOD; e>>=1;} return (uint32_t)r; }
6
7	void ntt(vector<uint32_t>& a, bool inv){ int n=a.size(); static vector<int> rev; static vector<uint32_t> roots{0,1}; if((int)rev.size()!=n){ int k=__builtin_ctz(n); rev.assign(n,0); for(int i=0;i<n;i++) rev[i]=(rev[i>>1]>>1)\|((i&1)<<(k-1)); }
8	if((int)roots.size()<n){ int k=__builtin_ctz(roots.size()); roots.resize(n); while((1<<k)<n){ uint32_t z=modpow(G,(MOD-1)>>(k+1)); for(int i=1<<(k-1); i<(1<<k); ++i){ roots[2i]=roots[i]; roots[2i+1]=(uint64_t)roots[i]*z%MOD; } ++k; } }
9	for(int i=0;i<n;i++) if(i<rev[i]) swap(a[i],a[rev[i]]);
10	for(int len=1; len<n; len<<=1){ for(int i=0;i<n;i+=2len){ for(int j=0;j<len;j++){ uint32_t u=a[i+j]; uint32_t v=(uint64_t)a[i+j+len]roots[len+j]%MOD; a[i+j]=u+v<MOD?u+v:u+v-MOD; a[i+j+len]=u>=v?u-v:u+MOD-v; } } }
11	if(inv){ reverse(a.begin()+1,a.end()); uint32_t inv_n=modpow(n,MOD-2); for(int i=0;i<n;i++) a[i]=(uint64_t)a[i]*inv_n%MOD; }
12	}
13
14	vector<uint32_t> convolution(const vector<uint32_t>& A, const vector<uint32_t>& B){ int n=A.size(), m=B.size(); if(!n\|\|!m) return {}; int need=n+m-1; int L=1; while(L<need) L<<=1; vector<uint32_t> fa(L), fb(L); copy(A.begin(),A.end(),fa.begin()); copy(B.begin(),B.end(),fb.begin()); ntt(fa,false); ntt(fb,false); for(int i=0;i<L;i++) fa[i]=(uint64_t)fa[i]*fb[i]%MOD; ntt(fa,true); fa.resize(need); return fa; }
15
16	int main(){ ios::sync_with_stdio(false); cin.tie(nullptr);
17	int Amax, Bmax; // Maximum values (support) of two integer variables
18	if(!(cin>>Amax>>Bmax)) return 0;
19	vector<uint32_t> f(Amax+1), g(Bmax+1);
20	// Read histograms: counts of occurrences for each value 0..Amax and 0..Bmax
21	for(int i=0;i<=Amax;i++){ uint64_t x; cin>>x; f[i]=x%MOD; }
22	for(int j=0;j<=Bmax;j++){ uint64_t y; cin>>y; g[j]=y%MOD; }
23	auto h = convolution(f,g);
24	// h[t] is the count (mod MOD) of achieving sum t
25	cout<<h.size()<<"\n";
26	for(size_t i=0;i<h.size();i++){
27	if(i) cout<<' ';
28	cout<<h[i];
29	}
30	cout<<"\n";
31	return 0;
32	}
33

1	#include <bits/stdc++.h>
2	using namespace std;
3	const int MOD = 998244353;
4	long long modpow(long long a,long long e){ long long r=1%MOD; while(e){ if(e&1) r=ra%MOD; a=aa%MOD; e>>=1;} return r; }
5
6	// In-place FWT for XOR convolution. If invert=true, performs inverse transform.
7	void fwt_xor(vector<int>& a, bool invert){ int n=a.size(); for(int len=1; 2len<=n; len<<=1){ for(int i=0;i<n;i+=2len){ for(int j=0;j<len;j++){ int u=a[i+j]; int v=a[i+j+len]; a[i+j]=(u+v>=MOD?u+v-MOD:u+v); a[i+j+len]=(u-v<0?u-v+MOD:u-v); } } } if(invert){ long long invN = modpow(n, MOD-2); for(int &x:a) x = (long long)x*invN%MOD; } }
8
9	// OR-zeta transform and its inverse (Mbius) for OR convolution.
10	void fwt_or(vector<int>& a, bool invert){ int n=a.size(); for(int len=1; 2len<=n; len<<=1){ for(int i=0;i<n;i+=2len){ for(int j=0;j<len;j++){ int u=a[i+j]; int v=a[i+j+len]; if(!invert){ // forward OR-zeta
11	a[i+j] = u; // unchanged
12	a[i+j+len] = (u+v>=MOD?u+v-MOD:u+v);
13	} else { // inverse OR-Mbius
14	a[i+j] = u; // unchanged
15	a[i+j+len] = (v-u<0?v-u+MOD:v-u);
16	}
17	} }
18	} }
19
20	int main(){ ios::sync_with_stdio(false); cin.tie(nullptr);
21	int k; if(!(cin>>k)) return 0; int N=1<<k; vector<int> f(N), g(N);
22	for(int i=0;i<N;i++){ long long x; cin>>x; f[i]=x%MOD; }
23	for(int i=0;i<N;i++){ long long x; cin>>x; g[i]=x%MOD; }
24
25	// XOR convolution
26	vector<int> fx=f, gx=g; fwt_xor(fx,false); fwt_xor(gx,false);
27	vector<int> hx(N);
28	for(int i=0;i<N;i++) hx[i] = (long long)fx[i]*gx[i]%MOD;
29	fwt_xor(hx,true);
30
31	// OR convolution
32	vector<int> fo=f, go=g; fwt_or(fo,false); fwt_or(go,false);
33	vector<int> ho(N);
34	for(int i=0;i<N;i++) ho[i] = (long long)fo[i]*go[i]%MOD;
35	fwt_or(ho,true);
36
37	// Output both results
38	for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<hx[i]; } cout<<"\n";
39	for(int i=0;i<N;i++){ if(i) cout<<' '; cout<<ho[i]; } cout<<"\n";
40	return 0;
41	}
42