Subset Sum Convolution
Key Points
- Subset Sum Convolution (often called Subset Convolution) computes C[S] by summing A[T]×B[U] over all disjoint pairs T and U whose union is S.
- It can be reduced to polynomial multiplication in the popcount (subset-size) dimension for every fixed S.
- The standard implementation uses SOS DP (subset zeta transform) per popcount layer, yielding O(n^2 2^n) time and O(n 2^n) space.
- Möbius inversion over the subset lattice undoes the zeta transform to recover the final answer.
- This technique is crucial for subset-partition DPs where transitions select a disjoint sub-subset at each step.
- Be careful to handle the empty subset correctly; setting weights for empty blocks often needs special treatment.
- Modular arithmetic is recommended to avoid overflow, and careful rank bookkeeping by popcount is essential.
- Faster Subset Convolution variants exist but are more complex; the O(n^2 2^n) approach is simpler and sufficient for most CF tasks.
Prerequisites
- Bitmask DP — Subset convolution is defined over bitmask-indexed arrays and frequently replaces naive subset-iteration transitions.
- SOS DP (Subset Zeta/Möbius Transforms) — The algorithm relies on zeta transforms over subsets and their Möbius inverses.
- Popcount and Binary Representation — Layering by subset size (popcount) is central to turning the convolution into polynomial-like multiplication.
- Modular Arithmetic — To avoid overflow and maintain correctness when summing/multiplying large numbers.
- Polynomial Convolution Basics — Understanding why combining layers is equivalent to multiplying polynomials in the rank dimension.
- Inclusion–Exclusion Principle — Möbius inversion on the subset lattice is a direct application of inclusion–exclusion.
Detailed Explanation
01 Overview
Subset Sum Convolution (commonly known as Subset Convolution) is an operation on functions defined over all subsets of an n-element ground set. Given two arrays A and B indexed by bitmasks (subsets), the goal is to compute an array C where each entry C[S] sums products A[T]×B[U] over all ways to split S into two disjoint parts T and U whose union is S. This captures 'build S by choosing a disjoint piece for A and the remaining piece for B'. A naive implementation would enumerate all T ⊆ S for every S, leading to O(3^n) work. The key insight is to reorganize the computation by subset size (popcount) and apply the SOS (Sum Over Subsets) transform. By grouping values by the number of bits set and running a zeta transform per size layer, we convert the convolution over disjoint unions into pointwise polynomial multiplications in the popcount dimension for each fixed S. Finally, a Möbius inversion undoes the zeta transform to retrieve C. The standard approach runs in O(n^2 2^n) time and O(n 2^n) space, which is practical up to n≈20–22 in competitive programming when implemented carefully with iterative bit-DP loops and modulo arithmetic.
02 Intuition & Analogies
Imagine you are assembling a team S from a pool of n people. You want to split this team into two disjoint subteams T and U (their union is the whole team S) and then score the split as A[T]×B[U]. The final score for S, C[S], is the sum of scores over all possible disjoint splits. Naively, you would try every subset T of S and take the complement U = S \ T, which is too slow. Instead, think of every fixed S as a shelf containing all its subteams T, organized by size (how many people are in T). For this shelf S, make two 'catalogs': one that, for each size r, lists the cumulative total of A over all r-sized subteams of S, and similarly for B. These are the zeta-transformed layers. Now the beautiful trick: picking a split of S into T and U with sizes p and q is the same as picking degrees p and q that add to |S| in a product of two polynomials. So for each S, you multiply the two size-indexed 'polynomials' (the catalogs) and take coefficients that correspond to valid splits. This converts an exponential search into a structured multiplication across sizes. When you do this for all S simultaneously using SOS DP, you massively reduce the cost. However, the catalogs we built were cumulative over all sub-subsets; to extract the exact split contributions, we need to 'un-cumulate' the values, exactly what Möbius inversion over subsets does. The process is just like performing a subset-version of FFT: transform (zeta), multiply per S in the degree dimension, and inverse transform (Möbius).
03 Formal Definition
Given arrays A and B indexed by the subsets of an n-element ground set, their subset convolution C = A ⊛ B is defined by
C[S] = Σ_{T⊆S} A[T]·B[S∖T],
or equivalently, in bitmask form, C[k] = Σ_{i|j=k, i&j=0} A[i]·B[j]. The sum ranges over all ordered pairs of disjoint sets whose union is exactly S.
04 When to Use
Use Subset Sum Convolution when your DP transition builds a set S by selecting a disjoint piece T and combining it with S\T in a multiplicative way. Typical scenarios include: (1) Partition DPs: You want the number or total weight of ways to partition S into blocks where each block contributes a weight w[block]; then transitions look like F = w * F (subset convolution) iterated by the number of blocks. (2) Cover DPs: Counting covers or decompositions where you repeatedly pick a sub-subset T and combine with the remainder disjointly. (3) Problems that ask for counts over all ordered/unordered ways to split S into two labeled parts with weights A and B. (4) Accelerating O(3^n) subset transitions that enumerate T ⊆ S by replacing them with O(n^2 2^n) zeta-based convolutions. If your combining rule does not require disjointness (e.g., OR convolution without the i & j = 0 constraint), then consider other transforms (OR/AND/XOR FWT). If your n is very small (n ≤ 18) and constants are tight, a direct O(3^n) may suffice; otherwise, for n ≈ 20–22, the O(n^2 2^n) subset convolution is often the sweet spot in CF 2200–2600 problems.
⚠️ Common Mistakes
- Confusing subset convolution with OR/AND convolutions: subset convolution requires i & j = 0 and i | j = k, i.e., a disjoint union forming exactly k. OR/AND FWT do not enforce disjointness. Double-check the combining rule.
- Mishandling the empty set: In many partition DPs, the block-weight for the empty set must be zero (w[∅] = 0); otherwise, you overcount infinitely many ways by repeatedly choosing ∅. Ensure base cases like F_0[∅] = 1 and F_0[S≠∅] = 0 are correct.
- Forgetting Möbius inversion or applying the wrong transform direction: The forward zeta sums over subsets; the inverse must subtract to undo accumulation. Using the superset-variant by mistake yields wrong answers.
- Off-by-one in rank bookkeeping: Layers are indexed by popcount r ∈ [0..n]. When combining layers, ensure you only sum p + q = r and finally read C[S] from layer r = |S|.
- Integer overflow and modulo bugs: Intermediate products can exceed 64-bit for large moduli or values. Use modulo arithmetic consistently, and normalize negatives after Möbius inversion (add MOD back).
- Memory layout inefficiency: Using vectors of vectors with frequent reallocation or poor cache locality can TLE. Prefer flat arrays with shape [(n+1) × 2^n] and in-place bit DP.
- Complexity underestimation: The transforms cost O(n^2 2^n). For n ≈ 22, use fast IO, tight loops, and avoid unnecessary conditionals inside the hottest loops.
Key Formulas
Subset Convolution Definition
C[S] = Σ_{T⊆S} A[T]·B[S∖T]
Explanation: For each set S, sum over all ways to split S into a sub-subset T and its complement in S. This is the natural 'disjoint union' convolution on the subset lattice.
Bitmask Form
C[k] = Σ_{i|j=k, i&j=0} A[i]·B[j]
Explanation: In bitwise terms, we only allow pairs (i, j) that are disjoint (i & j = 0) and whose union equals k (i | j = k). This is equivalent to summing over T ⊆ S.
Rank-Restricted Zeta
F̂_r[S] = Σ_{T⊆S, |T|=r} A[T],  Ĝ_r[S] = Σ_{T⊆S, |T|=r} B[T]
Explanation: We group values by subset size r and zeta-transform each layer over subsets. This prepares for polynomial-like multiplication in the size dimension.
Convolution in Popcount
Ĥ_r[S] = Σ_{p+q=r} F̂_p[S]·Ĝ_q[S]
Explanation: For each fixed S, the vectors (F̂_0[S], …, F̂_n[S]) and (Ĝ_0[S], …, Ĝ_n[S]) act as coefficients of polynomials in degree r; their product’s degree-r coefficient is Ĥ_r[S].
Subset Möbius Inversion
H_r[S] = Σ_{T⊆S} (−1)^{|S∖T|} Ĥ_r[T]
Explanation: This inverts the subset zeta transform. After accumulating over all submasks, we use this formula (implemented iteratively) to recover exact, non-accumulated values.
Final Extraction
C[S] = H_{|S|}[S]
Explanation: After inverse-transforming the H-layers, the answer for mask S lives in the layer equal to its popcount r = |S|.
Layer Sizes
Σ_{r=0}^{n} C(n, r) = 2^n
Explanation: Masks are partitioned by popcount into n+1 layers whose sizes sum to 2^n. This underlies the layered algorithm and its memory footprint.
Time and Space Complexity
O(n^2 2^n) time, O(n 2^n) space
Explanation: One n factor comes from iterating bits in the zeta/Möbius transforms, and another from convolving across ranks. Space stores n+1 layers of size 2^n.
Complexity Analysis
Each zeta/Möbius transform iterates over n bits for each of the n+1 rank layers of size 2^n, costing O(n^2 2^n). The per-mask rank convolution sums over O(n^2) pairs (p, q) with p + q = r for each of the 2^n masks, also O(n^2 2^n). In total: O(n^2 2^n) time and O(n 2^n) space for the layered buffers, which is practical up to n ≈ 20–22 with tight iterative loops.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Computes C[S] = sum_{T subset S} A[T] * B[S \ T] (subset convolution)
// Complexity: O(n^2 * 2^n) time, O(n * 2^n) space
// MOD must fit in 64-bit and be positive; operations are done modulo MOD.

static inline int popcnt_u32(unsigned x) { return __builtin_popcount(x); }

vector<long long> subset_convolution(const vector<long long>& A,
                                     const vector<long long>& B,
                                     int n,
                                     long long MOD) {
    const int N = 1 << n;
    // Layered buffers: index [r][S] flattened as r*N + S for cache efficiency
    vector<long long> F((n + 1) * N, 0), G((n + 1) * N, 0), H((n + 1) * N, 0);

    // Initialize layers by popcount
    for (int S = 0; S < N; ++S) {
        int r = popcnt_u32((unsigned)S);
        F[r * N + S] = (A[S] % MOD + MOD) % MOD;
        G[r * N + S] = (B[S] % MOD + MOD) % MOD;
    }

    // Subset zeta transform over subsets for each rank layer
    auto zeta_subsets = [&](vector<long long>& X){
        for (int b = 0; b < n; ++b) {
            for (int r = 0; r <= n; ++r) {
                long long* base = X.data() + r * N;
                for (int S = 0; S < N; ++S) {
                    if (S & (1 << b)) {
                        long long v = base[S - (1 << b)];
                        long long& cur = base[S];
                        cur += v;
                        if (cur >= MOD) cur -= MOD;
                    }
                }
            }
        }
    };

    zeta_subsets(F);
    zeta_subsets(G);

    // Convolution in the popcount dimension: H_r[S] = sum_{p=0..r} F_p[S] * G_{r-p}[S]
    for (int S = 0; S < N; ++S) {
        for (int r = 0; r <= n; ++r) {
            __int128 acc = 0; // avoid overflow before modulo
            for (int p = 0; p <= r; ++p) {
                long long fp = F[p * N + S];
                long long gq = G[(r - p) * N + S];
                acc += (__int128)fp * gq;
            }
            long long val = (long long)(acc % MOD);
            if (val < 0) val += MOD;
            H[r * N + S] = val;
        }
    }

    // Inverse subset zeta (Möbius) over subsets for each rank layer
    auto mobius_subsets = [&](vector<long long>& X){
        for (int b = 0; b < n; ++b) {
            for (int r = 0; r <= n; ++r) {
                long long* base = X.data() + r * N;
                for (int S = 0; S < N; ++S) {
                    if (S & (1 << b)) {
                        long long& cur = base[S];
                        long long v = base[S - (1 << b)];
                        cur -= v;
                        if (cur < 0) cur += MOD;
                    }
                }
            }
        }
    };

    mobius_subsets(H);

    // Extract final answer: C[S] lives in H[|S|][S]
    vector<long long> C(N, 0);
    for (int S = 0; S < N; ++S) {
        int r = popcnt_u32((unsigned)S);
        C[S] = H[r * N + S] % MOD;
    }
    return C;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example usage:
    // Input: n, arrays A and B of length 2^n
    int n; long long MOD = 1000000007LL;
    if (!(cin >> n)) return 0;
    int N = 1 << n;
    vector<long long> A(N), B(N);
    for (int i = 0; i < N; ++i) cin >> A[i];
    for (int i = 0; i < N; ++i) cin >> B[i];

    auto C = subset_convolution(A, B, n, MOD);

    for (int i = 0; i < N; ++i) {
        if (i) cout << ' ';
        cout << C[i];
    }
    cout << '\n';
    return 0;
}
This program implements the classic layered subset convolution. It groups A and B by popcount into (n+1) layers, applies the subset zeta transform per layer, multiplies the layers per mask as a polynomial in the rank dimension, and finally applies Möbius inversion per layer. The result C[S] equals Σ_{T⊆S} A[T]·B[S\T] modulo MOD.
#include <bits/stdc++.h>
using namespace std;

// We model a DP counting ordered partitions of a set S into k nonempty blocks.
// Let w[T] be the weight of choosing block T (w[0] = 0). Then
//   F_0[empty] = 1, F_0[S != empty] = 0 and for k >= 1: F_k = w (*) F_{k-1}
// where (*) is subset convolution: (w (*) F_{k-1})[S] = sum_{T subset S} w[T] * F_{k-1}[S \ T].
// Summing F_k over k=0..n gives ordered-partition counts when w[T] = 1 for |T|>0.

static inline int popcnt_u32(unsigned x) { return __builtin_popcount(x); }

vector<long long> subset_convolution(const vector<long long>& A,
                                     const vector<long long>& B,
                                     int n,
                                     long long MOD) {
    const int N = 1 << n;
    vector<long long> F((n + 1) * N, 0), G((n + 1) * N, 0), H((n + 1) * N, 0);
    for (int S = 0; S < N; ++S) {
        int r = popcnt_u32((unsigned)S);
        F[r * N + S] = (A[S] % MOD + MOD) % MOD;
        G[r * N + S] = (B[S] % MOD + MOD) % MOD;
    }
    auto zeta = [&](vector<long long>& X){
        for (int b = 0; b < n; ++b)
            for (int r = 0; r <= n; ++r) {
                long long* base = X.data() + r * N;
                for (int S = 0; S < N; ++S) if (S & (1 << b)) {
                    long long v = base[S - (1 << b)];
                    long long& cur = base[S];
                    cur += v; if (cur >= MOD) cur -= MOD;
                }
            }
    };
    zeta(F); zeta(G);

    for (int S = 0; S < N; ++S)
        for (int r = 0; r <= n; ++r) {
            __int128 acc = 0;
            for (int p = 0; p <= r; ++p) acc += (__int128)F[p * N + S] * G[(r - p) * N + S];
            long long val = (long long)(acc % MOD);
            if (val < 0) val += MOD;
            H[r * N + S] = val;
        }

    auto mobius = [&](vector<long long>& X){
        for (int b = 0; b < n; ++b)
            for (int r = 0; r <= n; ++r) {
                long long* base = X.data() + r * N;
                for (int S = 0; S < N; ++S) if (S & (1 << b)) {
                    long long& cur = base[S];
                    long long v = base[S - (1 << b)];
                    cur -= v; if (cur < 0) cur += MOD;
                }
            }
    };
    mobius(H);

    vector<long long> C(N);
    for (int S = 0; S < N; ++S) C[S] = H[popcnt_u32((unsigned)S) * N + S];
    return C;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 3; // ground set size
    long long MOD = 1'000'000'007LL;
    int N = 1 << n;

    // Define w[T] = 1 for nonempty T, 0 for empty T
    vector<long long> w(N, 0);
    for (int S = 1; S < N; ++S) w[S] = 1;

    // F_0: only the empty set has one way (choose no blocks)
    vector<long long> F_prev(N, 0); F_prev[0] = 1;

    // Compute F_k for k=1..n via repeated subset convolution.
    // Start total from F_0 so the k = 0 term (empty partition of the empty set) is included.
    vector<long long> total = F_prev; // sum over k of F_k (ordered-partition counts when w = 1)
    for (int k = 1; k <= n; ++k) {
        vector<long long> Fk = subset_convolution(w, F_prev, n, MOD);
        // accumulate
        for (int S = 0; S < N; ++S) {
            total[S] += Fk[S];
            if (total[S] >= MOD) total[S] -= MOD;
        }
        F_prev.swap(Fk);
    }

    // For S with |S| = m, total[S] equals the number of ordered partitions of an m-set.
    // For m = 3, this is 13.

    for (int S = 0; S < N; ++S) {
        cout << "S=" << S << " popcount=" << __builtin_popcount((unsigned)S)
             << " ordered_partitions=" << total[S] % MOD << '\n';
    }

    return 0;
}
This demonstrates how subset convolution naturally powers a partition DP. Setting w[T] = 1 for nonempty T, F_k = w (*) F_{k-1} counts ordered partitions into exactly k blocks; summing over k yields the ordered Bell numbers per subset size. The same pattern extends to weighted blocks (use any w[T]) and to bounded k.
#include <bits/stdc++.h>
using namespace std;

// Flat, cache-friendly buffers and precomputed popcount to reduce overhead.
// Same asymptotics, better constants in practice.

struct SubsetConvolver {
    int n, N;
    long long MOD;
    vector<int> pc;            // popcount for 0..N-1
    vector<long long> F, G, H; // (n+1) * N each

    SubsetConvolver(int n_, long long MOD_) : n(n_), N(1 << n_), MOD(MOD_) {
        pc.resize(N);
        for (int i = 0; i < N; ++i) pc[i] = __builtin_popcount((unsigned)i);
        F.assign((n + 1) * N, 0);
        G.assign((n + 1) * N, 0);
        H.assign((n + 1) * N, 0);
    }

    inline long long& at(vector<long long>& X, int r, int S) { return X[r * N + S]; }

    void zeta(vector<long long>& X) {
        for (int b = 0; b < n; ++b) {
            const int step = 1 << b;
            for (int r = 0; r <= n; ++r) {
                long long* base = X.data() + r * N;
                for (int S = step; S < N; ++S) {
                    if (S & step) {
                        long long add = base[S - step];
                        long long& cur = base[S];
                        cur += add; if (cur >= MOD) cur -= MOD;
                    }
                }
            }
        }
    }

    void mobius(vector<long long>& X) {
        for (int b = 0; b < n; ++b) {
            const int step = 1 << b;
            for (int r = 0; r <= n; ++r) {
                long long* base = X.data() + r * N;
                for (int S = step; S < N; ++S) {
                    if (S & step) {
                        long long& cur = base[S];
                        long long sub = base[S - step];
                        cur -= sub; if (cur < 0) cur += MOD;
                    }
                }
            }
        }
    }

    vector<long long> convolve(const vector<long long>& A, const vector<long long>& B) {
        fill(F.begin(), F.end(), 0);
        fill(G.begin(), G.end(), 0);
        fill(H.begin(), H.end(), 0);

        for (int S = 0; S < N; ++S) {
            int r = pc[S];
            at(F, r, S) = (A[S] % MOD + MOD) % MOD;
            at(G, r, S) = (B[S] % MOD + MOD) % MOD;
        }

        zeta(F); zeta(G);

        // fused nested loops improving memory locality
        for (int r = 0; r <= n; ++r) {
            for (int p = 0; p <= r; ++p) {
                long long* Hr = H.data() + r * N;
                long long* Fp = F.data() + p * N;
                long long* Gq = G.data() + (r - p) * N;
                for (int S = 0; S < N; ++S) {
                    Hr[S] = (long long)((Hr[S] + (__int128)Fp[S] * Gq[S]) % MOD);
                }
            }
        }

        mobius(H);

        vector<long long> C(N);
        for (int S = 0; S < N; ++S) C[S] = at(H, pc[S], S);
        return C;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 4; long long MOD = 998244353LL; // N = 16 for a quick demo
    int N = 1 << n;
    vector<long long> A(N), B(N);

    // Example: random small values
    mt19937_64 rng(712367);
    uniform_int_distribution<long long> dist(0, MOD - 1);
    for (int i = 0; i < N; ++i) { A[i] = dist(rng); B[i] = dist(rng); }

    SubsetConvolver conv(n, MOD);
    vector<long long> C = conv.convolve(A, B);

    // Print the first few values
    for (int i = 0; i < min(N, 16); ++i) {
        cout << i << ": " << C[i] << '\n';
    }

    return 0;
}
This is the same O(n^2 2^n) algorithm but written with a flat-memory layout and fused loops to improve cache locality and reduce loop overhead. This version is more likely to pass tighter time limits at n≈22.