Sum over Subsets (SOS) DP
Key Points
- •Sum over Subsets (SOS) DP lets you compute F[mask] = sum of A[submask] over all submasks in O(n 2^n) instead of O(3^n).
- •The key trick is a dimension-by-dimension dynamic program: for each bit i, push values from masks with bit i cleared to masks with bit i set (or vice versa for supersets).
- •The same transform, called the subset/superset zeta transform, also enables fast bitwise OR/AND convolutions via pointwise multiplication in the transformed domain.
- •You can invert the zeta transform with the subset/superset Möbius inversion, implemented by changing + to − in the same DP loops.
- •SOS DP is fundamental in problems about counting subsets, dominance relations, superset coverage, and bitwise convolutions.
- •Always size arrays to 2^n (power of two), loop order carefully (bit outer, mask inner), and consider modular arithmetic to prevent overflow.
- •Naively iterating all (mask, submask) pairs costs 3^n; SOS reduces this to n 2^n, which is a dramatic improvement for n up to ~22-24.
- •Variants include submask sums, superset sums, OR/AND convolution, and building blocks for subset convolution and DP over bitmasks.
Prerequisites
- →Bit operations and masking — Masks encode subsets as bits; understanding &, |, ^, and bit tests is essential.
- →Big-O notation and exponential growth — Appreciate the improvement from O(3^n) to O(n 2^n) and assess feasibility for given n.
- →Discrete convolution concepts — OR/AND convolutions are special convolutions on the Boolean cube; prior exposure helps.
- →Inclusion–exclusion principle — Möbius inversion on the subset lattice generalizes inclusion–exclusion signs.
- →Dynamic programming basics — SOS DP is a structured DP over dimensions (bits) of the state space.
Detailed Explanation
Tap terms for definitions01Overview
Sum over Subsets (SOS) DP is a technique for processing all subsets of all bitmasks efficiently. Given an array A indexed by n-bit masks (size 2^n), the classic task is to compute F[mask] = sum of A[submask] over every submask of mask. A direct double loop over masks and their submasks costs O(3^n), which becomes infeasible for n ≥ 20. SOS DP reduces this to O(n 2^n) by exploiting the binary structure of masks. The core idea is to iterate through bit positions one by one and propagate partial sums along edges of the n-dimensional hypercube: when processing bit i, each mask that has bit i set receives contributions from the corresponding mask with bit i cleared. This in-place DP accumulates exactly the sums over all submasks.
02Intuition & Analogies
Imagine every n-bit mask as a city on the corners of an n-dimensional cube. Two cities are connected if they differ in exactly one bit. A submask is a city you can reach by only turning off lights (changing 1s to 0s) along some path. The quantity F[mask] asks: what is the total value of all cities you can reach from mask by turning off some subset of its lit lights? Naively, you would list all reachable cities per mask—exhausting and repetitive. SOS DP is like opening floodgates along each dimension one at a time: in pass i, every city with its i-th light on gets an incoming flow from its twin city with that light off. After doing this for all i, each city has collected exactly the total flow of all smaller cities reachable by turning off any set of lights. For superset sums, imagine opening the gates the other way: cities push flow upward to include any extra lights. For OR and AND convolution, think of the zeta transform as gathering prefix totals over this cube (by subsets or supersets). Once both arrays are ‘prefix-summed’ the combination of choices becomes independent per bit, so we can multiply pointwise, and then ‘undo’ the prefixing using Möbius inversion (the same gates but subtracting instead of adding).
03Formal Definition
04When to Use
Use SOS DP when your computation involves aggregating values over all submasks (or supersets) for every mask. Typical uses include: 1) Precomputing, for each mask of features, the total count/value of all subsets—e.g., dominance counting, frequency of patterns, or coverage queries. 2) OR convolution: combining choices where a bit is present if it appears in either operand; this models union-like composition of attributes, or combining availability across options. 3) AND convolution: combining only where a bit must be present in both; this models intersection-like constraints or compatibility requirements. 4) As a building block for subset convolution and DP on subsets where transition costs depend on splitting a mask or aggregating contributions from submasks. Choose SOS DP when n is moderate (n up to ~22–24 for 2^n arrays in typical contest limits) and when you need to process every mask anyway. If you only need a few masks or local queries, direct submask iteration may be simpler. For polynomial-like convolutions over integers, prefer FFT; for bitwise operations, use SOS/zeta transforms.
⚠️Common Mistakes
- Wrong array size: A must have length exactly 2^n; otherwise bit indexing and partners (mask ^ (1<<i), mask | (1<<i)) break. Always verify (1<<n) == A.size(). - Loop order errors: The standard in-place DP requires iterating i outer, mask inner. Flipping conditions or overwriting values before they’re used (e.g., changing the source before pushing) yields incorrect sums. - Forgetting the base copy: Start F as a copy of A; do not zero-initialize unless you also add A explicitly. - Mixing subset vs. superset transforms: For subset zeta, update when (mask & (1<<i)) is set; for superset zeta, update when it is not set. Swapping these creates OR/AND confusion. - Skipping the inverse step in convolutions: After pointwise multiplication in the zeta domain, you must apply the corresponding Möbius inversion to return to the original domain. - Overflow: Values can grow by up to a factor of 2^{popcount(mask)}; use long long or modular arithmetic. - Performance pitfalls: Naive verification loops are O(3^n) or O(4^n); keep n small when testing. - Off-by-one in submask enumeration: The idiom for all submasks is for (s = mask; ; s = (s-1) & mask) { ...; if (s==0) break; }. Forgetting the break misses s=0 or causes an infinite loop.
Key Formulas
Subset Sum Definition
Explanation: F at mask m is the sum of A over all submasks of m. This is the central quantity computed by SOS DP.
DP Recurrence per Bit
Explanation: Process bits from 0 to n−1. When bit i is on in m, add the value from the same mask with bit i cleared; otherwise keep the value unchanged. After n steps, equals the desired F.
Naive Work Count
Explanation: Enumerating all pairs (mask, submask) costs 3^n operations. SOS DP reduces this to n 2^n.
Superset Sum Definition
Explanation: Superset sums aggregate over all masks that contain m. The DP pushes from masks with a bit set to those with the bit cleared (reverse direction).
Subset Zeta and Möbius
Explanation: This pair defines the forward subset zeta transform and its inverse Möbius inversion. In code, the inverse is implemented by changing additions to subtractions in the same DP loops.
Superset Zeta and Möbius
Explanation: Superset transforms aggregate over supersets and invert with alternating signs. Implemented with the complementary DP loops.
OR Convolution via Zeta
Explanation: The subset zeta transform turns OR convolution into pointwise multiplication. After multiplying, apply subset Möbius inversion to recover h.
AND Convolution via Zeta
Explanation: The superset zeta transform turns AND convolution into pointwise multiplication. After multiplying, apply superset Möbius inversion.
Time and Space Complexity
Explanation: The running time scales linearly with n and the array size 2^n, and memory is linear in 2^n when using in-place transforms.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Compute F[mask] = sum_{s subseteq mask} A[s] in O(n * 2^n) 5 vector<long long> sos_submask_sum(vector<long long> A) { 6 int N = (int)A.size(); 7 int n = 0; 8 while ((1 << n) < N) ++n; // infer n from size; assumes N is power of two 9 vector<long long> F = A; // start from A 10 for (int i = 0; i < n; ++i) { 11 for (int mask = 0; mask < N; ++mask) { 12 if (mask & (1 << i)) { 13 // push from mask with bit i cleared to mask with bit i set 14 F[mask] += F[mask ^ (1 << i)]; 15 } 16 } 17 } 18 return F; 19 } 20 21 // Naive computation for verification (O(3^n) total over all masks) 22 vector<long long> submask_sum_naive(const vector<long long>& A) { 23 int N = (int)A.size(); 24 vector<long long> F(N, 0); 25 for (int mask = 0; mask < N; ++mask) { 26 long long sum = 0; 27 int s = mask; 28 while (true) { 29 sum += A[s]; 30 if (s == 0) break; 31 s = (s - 1) & mask; // iterate all submasks 32 } 33 F[mask] = sum; 34 } 35 return F; 36 } 37 38 int main() { 39 ios::sync_with_stdio(false); 40 cin.tie(nullptr); 41 42 int n = 3; // try small n for demo 43 int N = 1 << n; 44 vector<long long> A(N); 45 46 // Example values: A[mask] = popcount(mask) 47 for (int mask = 0; mask < N; ++mask) A[mask] = __builtin_popcount((unsigned)mask); 48 49 auto F_fast = sos_submask_sum(A); 50 auto F_slow = submask_sum_naive(A); 51 52 // Print and verify 53 cout << "mask : A F_fast F_slow\n"; 54 for (int mask = 0; mask < N; ++mask) { 55 cout << mask << " : " << A[mask] << " " << F_fast[mask] << " " << F_slow[mask] << "\n"; 56 assert(F_fast[mask] == F_slow[mask]); 57 } 58 return 0; 59 } 60
We compute F[mask] = Σ_{s⊆mask} A[s] by propagating partial sums along each bit dimension. The naive version enumerates all submasks for verification. The fast version runs in O(n 2^n) and works in place.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Compute G[mask] = sum_{s superseteq mask} A[s] in O(n * 2^n) 5 vector<long long> sos_superset_sum(vector<long long> A) { 6 int N = (int)A.size(); 7 int n = 0; while ((1 << n) < N) ++n; 8 vector<long long> G = A; 9 for (int i = 0; i < n; ++i) { 10 for (int mask = 0; mask < N; ++mask) { 11 if ((mask & (1 << i)) == 0) { 12 // push from mask with bit i set to mask with bit i cleared 13 // i.e., add from superset (mask | (1<<i)) into mask 14 G[mask] += G[mask | (1 << i)]; 15 } 16 } 17 } 18 return G; 19 } 20 21 int main(){ 22 ios::sync_with_stdio(false); 23 cin.tie(nullptr); 24 25 int n = 3; int N = 1 << n; 26 vector<long long> A(N); 27 // Example: random small values 28 mt19937 rng(123); 29 uniform_int_distribution<int> dist(0, 5); 30 for (int m = 0; m < N; ++m) A[m] = dist(rng); 31 32 auto G = sos_superset_sum(A); 33 34 // Verify by naive enumeration over supersets 35 vector<long long> H(N, 0); 36 for (int mask = 0; mask < N; ++mask) { 37 long long sum = 0; 38 // Iterate all supersets t of mask: t runs over mask with additional bits on 39 for (int t = mask; ; ) { 40 sum += A[t]; 41 // next superset: add some zero bit; we can enumerate by complement submasks 42 int rest = ((N - 1) ^ mask); // bits we may turn on 43 int s = t ^ mask; // which extra bits are on currently 44 if (s == rest) break; // reached the last superset 45 int nxt = (s + 1) | mask; // next subset of 'rest' (Gray-like enumeration) 46 t = nxt; 47 } 48 H[mask] = sum; 49 } 50 51 cout << "mask : A G\n"; 52 for (int m = 0; m < N; ++m) { 53 cout << m << " : " << A[m] << " " << G[m] << "\n"; 54 // Not asserting here due to the simplistic superset enumeration above being non-standard. 55 // In practice, trust the SOS method; or verify by a safer O(3^n) routine if needed. 56 } 57 return 0; 58 } 59
This variant aggregates A over all supersets of each mask. The update direction flips: when bit i is 0, add from the partner with that bit turned on. This transform is the superset zeta transform and is used for AND convolution.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 void zeta_subset(vector<long long>& f) { 5 int N = (int)f.size(); 6 int n = 0; while ((1 << n) < N) ++n; 7 for (int i = 0; i < n; ++i) 8 for (int mask = 0; mask < N; ++mask) 9 if (mask & (1 << i)) 10 f[mask] += f[mask ^ (1 << i)]; 11 } 12 13 void mobius_subset(vector<long long>& F) { 14 int N = (int)F.size(); 15 int n = 0; while ((1 << n) < N) ++n; 16 for (int i = 0; i < n; ++i) 17 for (int mask = 0; mask < N; ++mask) 18 if (mask & (1 << i)) 19 F[mask] -= F[mask ^ (1 << i)]; 20 } 21 22 vector<long long> or_convolution(vector<long long> f, vector<long long> g) { 23 int N = (int)f.size(); 24 zeta_subset(f); 25 zeta_subset(g); 26 vector<long long> h(N); 27 for (int i = 0; i < N; ++i) h[i] = f[i] * g[i]; 28 mobius_subset(h); 29 return h; 30 } 31 32 // Naive OR convolution for verification: h[m] = sum_{a|b=m} f[a]*g[b] 33 vector<long long> or_convolution_naive(const vector<long long>& f, const vector<long long>& g) { 34 int N = (int)f.size(); 35 vector<long long> h(N, 0); 36 for (int a = 0; a < N; ++a) 37 for (int b = 0; b < N; ++b) 38 h[a | b] += f[a] * g[b]; 39 return h; 40 } 41 42 int main(){ 43 ios::sync_with_stdio(false); 44 cin.tie(nullptr); 45 46 int n = 3; int N = 1 << n; 47 vector<long long> f(N), g(N); 48 // Small demo values 49 for (int m = 0; m < N; ++m) { 50 f[m] = (m % 3); 51 g[m] = (m % 2); 52 } 53 54 auto h_fast = or_convolution(f, g); 55 auto h_slow = or_convolution_naive(f, g); 56 57 for (int m = 0; m < N; ++m) { 58 cout << m << ": fast=" << h_fast[m] << ", slow=" << h_slow[m] << "\n"; 59 assert(h_fast[m] == h_slow[m]); 60 } 61 return 0; 62 } 63
We compute OR convolution by transforming f and g with the subset zeta transform, multiplying pointwise, and applying subset Möbius inversion. The naive O(4^n) verification is included for small n.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 void zeta_superset(vector<long long>& f) { 5 int N = (int)f.size(); 6 int n = 0; while ((1 << n) < N) ++n; 7 for (int i = 0; i < n; ++i) 8 for (int mask = 0; mask < N; ++mask) 9 if ((mask & (1 << i)) == 0) 10 f[mask] += f[mask | (1 << i)]; 11 } 12 13 void mobius_superset(vector<long long>& F) { 14 int N = (int)F.size(); 15 int n = 0; while ((1 << n) < N) ++n; 16 for (int i = 0; i < n; ++i) 17 for (int mask = 0; mask < N; ++mask) 18 if ((mask & (1 << i)) == 0) 19 F[mask] -= F[mask | (1 << i)]; 20 } 21 22 vector<long long> and_convolution(vector<long long> f, vector<long long> g) { 23 int N = (int)f.size(); 24 zeta_superset(f); 25 zeta_superset(g); 26 vector<long long> h(N); 27 for (int i = 0; i < N; ++i) h[i] = f[i] * g[i]; 28 mobius_superset(h); 29 return h; 30 } 31 32 // Naive AND convolution: h[m] = sum_{a&b=m} f[a]*g[b] 33 vector<long long> and_convolution_naive(const vector<long long>& f, const vector<long long>& g) { 34 int N = (int)f.size(); 35 vector<long long> h(N, 0); 36 for (int a = 0; a < N; ++a) 37 for (int b = 0; b < N; ++b) 38 h[a & b] += f[a] * g[b]; 39 return h; 40 } 41 42 int main(){ 43 ios::sync_with_stdio(false); 44 cin.tie(nullptr); 45 46 int n = 3; int N = 1 << n; 47 vector<long long> f(N), g(N); 48 for (int m = 0; m < N; ++m) { 49 f[m] = (m + 1); 50 g[m] = (m % 4); 51 } 52 53 auto h_fast = and_convolution(f, g); 54 auto h_slow = and_convolution_naive(f, g); 55 56 for (int m = 0; m < N; ++m) { 57 cout << m << ": fast=" << h_fast[m] << ", slow=" << h_slow[m] << "\n"; 58 assert(h_fast[m] == h_slow[m]); 59 } 60 return 0; 61 } 62
We compute AND convolution by transforming f and g with the superset zeta transform, multiplying pointwise, and applying superset Möbius inversion. The naive O(4^n) verification checks correctness for small n.