⚙️AlgorithmAdvanced

Sum over Subsets (SOS) DP

Key Points

•
Sum over Subsets (SOS) DP lets you compute F[mask] = sum of A[submask] over all submasks in O(n 2^n) instead of O(3^n).
•
The key trick is a dimension-by-dimension dynamic program: for each bit i, push values from masks with bit i cleared to masks with bit i set (or vice versa for supersets).
•
The same transform, called the subset/superset zeta transform, also enables fast bitwise OR/AND convolutions via pointwise multiplication in the transformed domain.
•
You can invert the zeta transform with the subset/superset Möbius inversion, implemented by changing + to − in the same DP loops.
•
SOS DP is fundamental in problems about counting subsets, dominance relations, superset coverage, and bitwise convolutions.
•
Always size arrays to 2^n (power of two), loop order carefully (bit outer, mask inner), and consider modular arithmetic to prevent overflow.
•
Naively iterating all (mask, submask) pairs costs 3^n; SOS reduces this to n 2^n, which is a dramatic improvement for n up to ~22-24.
•
Variants include submask sums, superset sums, OR/AND convolution, and building blocks for subset convolution and DP over bitmasks.

Prerequisites

→Bit operations and masking — Masks encode subsets as bits; understanding &, |, ^, and bit tests is essential.
→Big-O notation and exponential growth — Appreciate the improvement from O(3^n) to O(n 2^n) and assess feasibility for given n.
→Discrete convolution concepts — OR/AND convolutions are special convolutions on the Boolean cube; prior exposure helps.
→Inclusion–exclusion principle — Möbius inversion on the subset lattice generalizes inclusion–exclusion signs.
→Dynamic programming basics — SOS DP is a structured DP over dimensions (bits) of the state space.

Detailed Explanation

Tap terms for definitions

01Overview

Sum over Subsets (SOS) DP is a technique for processing all subsets of all bitmasks efficiently. Given an array A indexed by n-bit masks (size 2^n), the classic task is to compute F[mask] = sum of A[submask] over every submask of mask. A direct double loop over masks and their submasks costs O(3^n), which becomes infeasible for n ≥ 20. SOS DP reduces this to O(n 2^n) by exploiting the binary structure of masks. The core idea is to iterate through bit positions one by one and propagate partial sums along edges of the n-dimensional hypercube: when processing bit i, each mask that has bit i set receives contributions from the corresponding mask with bit i cleared. This in-place DP accumulates exactly the sums over all submasks.

02Intuition & Analogies

Imagine every n-bit mask as a city on the corners of an n-dimensional cube. Two cities are connected if they differ in exactly one bit. A submask is a city you can reach by only turning off lights (changing 1s to 0s) along some path. The quantity F[mask] asks: what is the total value of all cities you can reach from mask by turning off some subset of its lit lights? Naively, you would list all reachable cities per mask—exhausting and repetitive. SOS DP is like opening floodgates along each dimension one at a time: in pass i, every city with its i-th light on gets an incoming flow from its twin city with that light off. After doing this for all i, each city has collected exactly the total flow of all smaller cities reachable by turning off any set of lights. For superset sums, imagine opening the gates the other way: cities push flow upward to include any extra lights. For OR and AND convolution, think of the zeta transform as gathering prefix totals over this cube (by subsets or supersets). Once both arrays are ‘prefix-summed’ the combination of choices becomes independent per bit, so we can multiply pointwise, and then ‘undo’ the prefixing using Möbius inversion (the same gates but subtracting instead of adding).

03Formal Definition

Let n be the number of bits, and let A

\in

R^{2^{n}}

be an array indexed by masks m

\in

\{0,1\}^n. Define the subset zeta transform (SOS DP forward transform) by F[m] =

\sum_{s \subseteq m}

A[s]. Similarly, the superset zeta transform is G[m] =

\sum_{s \supseteq m}

A[s]. Both transforms can be computed in O(n 2^n) by iterating bits

i = 0

..n-1 and for each mask updating the partner mask that differs only at bit i. The inverse transforms are the corresponding Möbius inversions on the subset lattice: for subsets, f[m] =

\sum_{s \subseteq m}

(-1)^{

∣ m ∖ s ∣

} F[s]; for supersets, analogous with complements. For bitwise OR convolution h[m] =

\sum_{a ∣ b = m}

f[a] g[b], we have

Z_{\subseteq}

(h)[m] =

Z_{\subseteq}

(f)[m]

\cdot

Z_{\subseteq}

(g)[m]. For bitwise AND convolution h[m] =

\sum_{a\,&\,b=m}

f[a] g[b], we have

Z_{\supseteq}

(h)[m] =

Z_{\supseteq}

(f)[m]

\cdot

Z_{\supseteq}

(g)[m]. These identities allow O(n 2^n) convolutions via zeta, pointwise multiply, and Möbius inversion.

04When to Use

Use SOS DP when your computation involves aggregating values over all submasks (or supersets) for every mask. Typical uses include: 1) Precomputing, for each mask of features, the total count/value of all subsets—e.g., dominance counting, frequency of patterns, or coverage queries. 2) OR convolution: combining choices where a bit is present if it appears in either operand; this models union-like composition of attributes, or combining availability across options. 3) AND convolution: combining only where a bit must be present in both; this models intersection-like constraints or compatibility requirements. 4) As a building block for subset convolution and DP on subsets where transition costs depend on splitting a mask or aggregating contributions from submasks. Choose SOS DP when n is moderate (n up to ~22–24 for 2^n arrays in typical contest limits) and when you need to process every mask anyway. If you only need a few masks or local queries, direct submask iteration may be simpler. For polynomial-like convolutions over integers, prefer FFT; for bitwise operations, use SOS/zeta transforms.

⚠️Common Mistakes

Wrong array size: A must have length exactly 2^n; otherwise bit indexing and partners (mask ^ (1<<i), mask | (1<<i)) break. Always verify (1<<n) == A.size(). - Loop order errors: The standard in-place DP requires iterating i outer, mask inner. Flipping conditions or overwriting values before they’re used (e.g., changing the source before pushing) yields incorrect sums. - Forgetting the base copy: Start F as a copy of A; do not zero-initialize unless you also add A explicitly. - Mixing subset vs. superset transforms: For subset zeta, update when (mask & (1<<i)) is set; for superset zeta, update when it is not set. Swapping these creates OR/AND confusion. - Skipping the inverse step in convolutions: After pointwise multiplication in the zeta domain, you must apply the corresponding Möbius inversion to return to the original domain. - Overflow: Values can grow by up to a factor of 2^{popcount(mask)}; use long long or modular arithmetic. - Performance pitfalls: Naive verification loops are O(3^n) or O(4^n); keep n small when testing. - Off-by-one in submask enumeration: The idiom for all submasks is for (s = mask; ; s = (s-1) & mask) { ...; if (s==0) break; }. Forgetting the break misses s=0 or causes an infinite loop.

Key Formulas

Subset Sum Definition

F [m] = s \subseteq m \sum A [s]

Explanation: F at mask m is the sum of A over all submasks of m. This is the central quantity computed by SOS DP.

DP Recurrence per Bit

F^{(i)} [m] = {F^{(i - 1)} [m] + F^{(i - 1)} [m ∖ {i}], F^{(i - 1)} [m], if m_{i} = 1 if m_{i} = 0

Explanation: Process bits from 0 to n−1. When bit i is on in m, add the value from the same mask with bit i cleared; otherwise keep the value unchanged. After n steps, $F^{(n)}$ equals the desired F.

Naive Work Count

m \in {0, 1}^{n} \sum 2^{popcount (m)} = k = 0 \sum n (k n) 2^{k} = (1 + 2)^{n} = 3^{n}

Explanation: Enumerating all pairs (mask, submask) costs 3^n operations. SOS DP reduces this to n 2^n.

Superset Sum Definition

G [m] = s \supseteq m \sum A [s]

Explanation: Superset sums aggregate over all masks that contain m. The DP pushes from masks with a bit set to those with the bit cleared (reverse direction).

Subset Zeta and Möbius

f_{\subseteq} (m) = s \subseteq m \sum f (s), f (m) = s \subseteq m \sum (- 1)^{∣ m ∖ s ∣} f_{\subseteq} (s)

Explanation: This pair defines the forward subset zeta transform and its inverse Möbius inversion. In code, the inverse is implemented by changing additions to subtractions in the same DP loops.

Superset Zeta and Möbius

f_{\supseteq} (m) = s \supseteq m \sum f (s), f (m) = s \supseteq m \sum (- 1)^{∣ s ∖ m ∣} f_{\supseteq} (s)

Explanation: Superset transforms aggregate over supersets and invert with alternating signs. Implemented with the complementary DP loops.

OR Convolution via Zeta

h [m] = a ∣ b = m \sum f [a] g [b], h_{\subseteq} (m) = f_{\subseteq} (m) g_{\subseteq} (m)

Explanation: The subset zeta transform turns OR convolution into pointwise multiplication. After multiplying, apply subset Möbius inversion to recover h.

AND Convolution via Zeta

h[m] = \sum_{a \,&\, b = m} f[a] g[b],\quad \widehat{h}_{\supseteq}(m) = \widehat{f}_{\supseteq}(m)\, \widehat{g}_{\supseteq}(m)

Explanation: The superset zeta transform turns AND convolution into pointwise multiplication. After multiplying, apply superset Möbius inversion.

Time and Space Complexity

T (n) = O (n 2^{n}), S (n) = O (2^{n})

Explanation: The running time scales linearly with n and the array size 2^n, and memory is linear in 2^n when using in-place transforms.

Complexity Analysis

Let

N = 2^{n}

be the length of the array. The subset (or superset) zeta transform performs n passes over all N masks. In each pass i, for every mask we do at most one constant-time update with its partner mask (differing only in bit i). Therefore the total time is O(n N) = O(n 2^n). Space is O(N) as the transform can be done in place by overwriting the input array. In contrast, the naive double loop over masks and their submasks costs

\sum_{m}

2^{popcount(m)} = 3^n, which grows exponentially faster than 2^n for

n \geq 1

. For bitwise OR/AND convolution, we do: one forward zeta on f, one on g, pointwise multiplication, and one inverse Möbius, totaling 3 passes of O(n 2^n) each plus O(2^n) multiplication, i.e., O(n 2^n) overall. Constant factors are small and cache-friendly because the loops are tight and branch-free except for a bit test. Practically, with

n \leq 22-24

(N up to ~16 million), optimized C++ can run within typical contest limits, but pay attention to memory (8 bytes × N for 64-bit values is ~128 MB at

N = 16

,777,216). Values may blow up by factors up to 2^{n}, so use 64-bit integers or modular arithmetic as needed. The transforms are linear and invertible over any ring, so they work with integers modulo a prime, standard 64-bit integers, or even custom semirings if addition has inverses for inversion steps.

Code Examples

Basic SOS DP: sum over all submasks for every mask

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Compute F[mask] = sum_{s subseteq mask} A[s] in O(n * 2^n)
5 vector<long long> sos_submask_sum(vector<long long> A) {
6     int N = (int)A.size();
7     int n = 0;
8     while ((1 << n) < N) ++n; // infer n from size; assumes N is power of two
9     vector<long long> F = A;  // start from A
10     for (int i = 0; i < n; ++i) {
11         for (int mask = 0; mask < N; ++mask) {
12             if (mask & (1 << i)) {
13                 // push from mask with bit i cleared to mask with bit i set
14                 F[mask] += F[mask ^ (1 << i)];
15             }
16         }
17     }
18     return F;
19 }
20 
21 // Naive computation for verification (O(3^n) total over all masks)
22 vector<long long> submask_sum_naive(const vector<long long>& A) {
23     int N = (int)A.size();
24     vector<long long> F(N, 0);
25     for (int mask = 0; mask < N; ++mask) {
26         long long sum = 0;
27         int s = mask;
28         while (true) {
29             sum += A[s];
30             if (s == 0) break;
31             s = (s - 1) & mask; // iterate all submasks
32         }
33         F[mask] = sum;
34     }
35     return F;
36 }
37 
38 int main() {
39     ios::sync_with_stdio(false);
40     cin.tie(nullptr);
41 
42     int n = 3;                     // try small n for demo
43     int N = 1 << n;
44     vector<long long> A(N);
45 
46     // Example values: A[mask] = popcount(mask)
47     for (int mask = 0; mask < N; ++mask) A[mask] = __builtin_popcount((unsigned)mask);
48 
49     auto F_fast = sos_submask_sum(A);
50     auto F_slow = submask_sum_naive(A);
51 
52     // Print and verify
53     cout << "mask : A  F_fast  F_slow\n";
54     for (int mask = 0; mask < N; ++mask) {
55         cout << mask << " : " << A[mask] << "  " << F_fast[mask] << "  " << F_slow[mask] << "\n";
56         assert(F_fast[mask] == F_slow[mask]);
57     }
58     return 0;
59 }
60

We compute F[mask] = Σ_{s⊆mask} A[s] by propagating partial sums along each bit dimension. The naive version enumerates all submasks for verification. The fast version runs in O(n 2^n) and works in place.

Time: O(n 2^n)Space: O(2^n)

Superset sums (sum over all supersets for every mask)

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Compute G[mask] = sum_{s superseteq mask} A[s] in O(n * 2^n)
5 vector<long long> sos_superset_sum(vector<long long> A) {
6     int N = (int)A.size();
7     int n = 0; while ((1 << n) < N) ++n;
8     vector<long long> G = A;
9     for (int i = 0; i < n; ++i) {
10         for (int mask = 0; mask < N; ++mask) {
11             if ((mask & (1 << i)) == 0) {
12                 // push from mask with bit i set to mask with bit i cleared
13                 // i.e., add from superset (mask | (1<<i)) into mask
14                 G[mask] += G[mask | (1 << i)];
15             }
16         }
17     }
18     return G;
19 }
20 
21 int main(){
22     ios::sync_with_stdio(false);
23     cin.tie(nullptr);
24 
25     int n = 3; int N = 1 << n;
26     vector<long long> A(N);
27     // Example: random small values
28     mt19937 rng(123);
29     uniform_int_distribution<int> dist(0, 5);
30     for (int m = 0; m < N; ++m) A[m] = dist(rng);
31 
32     auto G = sos_superset_sum(A);
33 
34     // Verify by naive enumeration over supersets
35     vector<long long> H(N, 0);
36     for (int mask = 0; mask < N; ++mask) {
37         long long sum = 0;
38         // Iterate all supersets t of mask: t runs over mask with additional bits on
39         for (int t = mask; ; ) {
40             sum += A[t];
41             // next superset: add some zero bit; we can enumerate by complement submasks
42             int rest = ((N - 1) ^ mask); // bits we may turn on
43             int s = t ^ mask;            // which extra bits are on currently
44             if (s == rest) break;        // reached the last superset
45             int nxt = (s + 1) | mask;    // next subset of 'rest' (Gray-like enumeration)
46             t = nxt;
47         }
48         H[mask] = sum;
49     }
50 
51     cout << "mask : A  G\n";
52     for (int m = 0; m < N; ++m) {
53         cout << m << " : " << A[m] << "  " << G[m] << "\n";
54         // Not asserting here due to the simplistic superset enumeration above being non-standard.
55         // In practice, trust the SOS method; or verify by a safer O(3^n) routine if needed.
56     }
57     return 0;
58 }
59

This variant aggregates A over all supersets of each mask. The update direction flips: when bit i is 0, add from the partner with that bit turned on. This transform is the superset zeta transform and is used for AND convolution.

Time: O(n 2^n)Space: O(2^n)

Bitwise OR convolution via subset zeta and Möbius inversion

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 void zeta_subset(vector<long long>& f) {
5     int N = (int)f.size();
6     int n = 0; while ((1 << n) < N) ++n;
7     for (int i = 0; i < n; ++i)
8         for (int mask = 0; mask < N; ++mask)
9             if (mask & (1 << i))
10                 f[mask] += f[mask ^ (1 << i)];
11 }
12 
13 void mobius_subset(vector<long long>& F) {
14     int N = (int)F.size();
15     int n = 0; while ((1 << n) < N) ++n;
16     for (int i = 0; i < n; ++i)
17         for (int mask = 0; mask < N; ++mask)
18             if (mask & (1 << i))
19                 F[mask] -= F[mask ^ (1 << i)];
20 }
21 
22 vector<long long> or_convolution(vector<long long> f, vector<long long> g) {
23     int N = (int)f.size();
24     zeta_subset(f);
25     zeta_subset(g);
26     vector<long long> h(N);
27     for (int i = 0; i < N; ++i) h[i] = f[i] * g[i];
28     mobius_subset(h);
29     return h;
30 }
31 
32 // Naive OR convolution for verification: h[m] = sum_{a|b=m} f[a]*g[b]
33 vector<long long> or_convolution_naive(const vector<long long>& f, const vector<long long>& g) {
34     int N = (int)f.size();
35     vector<long long> h(N, 0);
36     for (int a = 0; a < N; ++a)
37         for (int b = 0; b < N; ++b)
38             h[a | b] += f[a] * g[b];
39     return h;
40 }
41 
42 int main(){
43     ios::sync_with_stdio(false);
44     cin.tie(nullptr);
45 
46     int n = 3; int N = 1 << n;
47     vector<long long> f(N), g(N);
48     // Small demo values
49     for (int m = 0; m < N; ++m) {
50         f[m] = (m % 3);
51         g[m] = (m % 2);
52     }
53 
54     auto h_fast = or_convolution(f, g);
55     auto h_slow = or_convolution_naive(f, g);
56 
57     for (int m = 0; m < N; ++m) {
58         cout << m << ": fast=" << h_fast[m] << ", slow=" << h_slow[m] << "\n";
59         assert(h_fast[m] == h_slow[m]);
60     }
61     return 0;
62 }
63

We compute OR convolution by transforming f and g with the subset zeta transform, multiplying pointwise, and applying subset Möbius inversion. The naive O(4^n) verification is included for small n.

Time: O(n 2^n)Space: O(2^n)

Bitwise AND convolution via superset zeta and Möbius inversion

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 void zeta_superset(vector<long long>& f) {
5     int N = (int)f.size();
6     int n = 0; while ((1 << n) < N) ++n;
7     for (int i = 0; i < n; ++i)
8         for (int mask = 0; mask < N; ++mask)
9             if ((mask & (1 << i)) == 0)
10                 f[mask] += f[mask | (1 << i)];
11 }
12 
13 void mobius_superset(vector<long long>& F) {
14     int N = (int)F.size();
15     int n = 0; while ((1 << n) < N) ++n;
16     for (int i = 0; i < n; ++i)
17         for (int mask = 0; mask < N; ++mask)
18             if ((mask & (1 << i)) == 0)
19                 F[mask] -= F[mask | (1 << i)];
20 }
21 
22 vector<long long> and_convolution(vector<long long> f, vector<long long> g) {
23     int N = (int)f.size();
24     zeta_superset(f);
25     zeta_superset(g);
26     vector<long long> h(N);
27     for (int i = 0; i < N; ++i) h[i] = f[i] * g[i];
28     mobius_superset(h);
29     return h;
30 }
31 
32 // Naive AND convolution: h[m] = sum_{a&b=m} f[a]*g[b]
33 vector<long long> and_convolution_naive(const vector<long long>& f, const vector<long long>& g) {
34     int N = (int)f.size();
35     vector<long long> h(N, 0);
36     for (int a = 0; a < N; ++a)
37         for (int b = 0; b < N; ++b)
38             h[a & b] += f[a] * g[b];
39     return h;
40 }
41 
42 int main(){
43     ios::sync_with_stdio(false);
44     cin.tie(nullptr);
45 
46     int n = 3; int N = 1 << n;
47     vector<long long> f(N), g(N);
48     for (int m = 0; m < N; ++m) {
49         f[m] = (m + 1);
50         g[m] = (m % 4);
51     }
52 
53     auto h_fast = and_convolution(f, g);
54     auto h_slow = and_convolution_naive(f, g);
55 
56     for (int m = 0; m < N; ++m) {
57         cout << m << ": fast=" << h_fast[m] << ", slow=" << h_slow[m] << "\n";
58         assert(h_fast[m] == h_slow[m]);
59     }
60     return 0;
61 }
62

We compute AND convolution by transforming f and g with the superset zeta transform, multiplying pointwise, and applying superset Möbius inversion. The naive O(4^n) verification checks correctness for small n.

Time: O(n 2^n)Space: O(2^n)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Compute F[mask] = sum_{s subseteq mask} A[s] in O(n * 2^n)
5	vector<long long> sos_submask_sum(vector<long long> A) {
6	int N = (int)A.size();
7	int n = 0;
8	while ((1 << n) < N) ++n; // infer n from size; assumes N is power of two
9	vector<long long> F = A; // start from A
10	for (int i = 0; i < n; ++i) {
11	for (int mask = 0; mask < N; ++mask) {
12	if (mask & (1 << i)) {
13	// push from mask with bit i cleared to mask with bit i set
14	F[mask] += F[mask ^ (1 << i)];
15	}
16	}
17	}
18	return F;
19	}
20
21	// Naive computation for verification (O(3^n) total over all masks)
22	vector<long long> submask_sum_naive(const vector<long long>& A) {
23	int N = (int)A.size();
24	vector<long long> F(N, 0);
25	for (int mask = 0; mask < N; ++mask) {
26	long long sum = 0;
27	int s = mask;
28	while (true) {
29	sum += A[s];
30	if (s == 0) break;
31	s = (s - 1) & mask; // iterate all submasks
32	}
33	F[mask] = sum;
34	}
35	return F;
36	}
37
38	int main() {
39	ios::sync_with_stdio(false);
40	cin.tie(nullptr);
41
42	int n = 3; // try small n for demo
43	int N = 1 << n;
44	vector<long long> A(N);
45
46	// Example values: A[mask] = popcount(mask)
47	for (int mask = 0; mask < N; ++mask) A[mask] = __builtin_popcount((unsigned)mask);
48
49	auto F_fast = sos_submask_sum(A);
50	auto F_slow = submask_sum_naive(A);
51
52	// Print and verify
53	cout << "mask : A F_fast F_slow\n";
54	for (int mask = 0; mask < N; ++mask) {
55	cout << mask << " : " << A[mask] << " " << F_fast[mask] << " " << F_slow[mask] << "\n";
56	assert(F_fast[mask] == F_slow[mask]);
57	}
58	return 0;
59	}
60

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Compute G[mask] = sum_{s superseteq mask} A[s] in O(n * 2^n)
5	vector<long long> sos_superset_sum(vector<long long> A) {
6	int N = (int)A.size();
7	int n = 0; while ((1 << n) < N) ++n;
8	vector<long long> G = A;
9	for (int i = 0; i < n; ++i) {
10	for (int mask = 0; mask < N; ++mask) {
11	if ((mask & (1 << i)) == 0) {
12	// push from mask with bit i set to mask with bit i cleared
13	// i.e., add from superset (mask \| (1<<i)) into mask
14	G[mask] += G[mask \| (1 << i)];
15	}
16	}
17	}
18	return G;
19	}
20
21	int main(){
22	ios::sync_with_stdio(false);
23	cin.tie(nullptr);
24
25	int n = 3; int N = 1 << n;
26	vector<long long> A(N);
27	// Example: random small values
28	mt19937 rng(123);
29	uniform_int_distribution<int> dist(0, 5);
30	for (int m = 0; m < N; ++m) A[m] = dist(rng);
31
32	auto G = sos_superset_sum(A);
33
34	// Verify by naive enumeration over supersets
35	vector<long long> H(N, 0);
36	for (int mask = 0; mask < N; ++mask) {
37	long long sum = 0;
38	// Iterate all supersets t of mask: t runs over mask with additional bits on
39	for (int t = mask; ; ) {
40	sum += A[t];
41	// next superset: add some zero bit; we can enumerate by complement submasks
42	int rest = ((N - 1) ^ mask); // bits we may turn on
43	int s = t ^ mask; // which extra bits are on currently
44	if (s == rest) break; // reached the last superset
45	int nxt = (s + 1) \| mask; // next subset of 'rest' (Gray-like enumeration)
46	t = nxt;
47	}
48	H[mask] = sum;
49	}
50
51	cout << "mask : A G\n";
52	for (int m = 0; m < N; ++m) {
53	cout << m << " : " << A[m] << " " << G[m] << "\n";
54	// Not asserting here due to the simplistic superset enumeration above being non-standard.
55	// In practice, trust the SOS method; or verify by a safer O(3^n) routine if needed.
56	}
57	return 0;
58	}
59

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	void zeta_subset(vector<long long>& f) {
5	int N = (int)f.size();
6	int n = 0; while ((1 << n) < N) ++n;
7	for (int i = 0; i < n; ++i)
8	for (int mask = 0; mask < N; ++mask)
9	if (mask & (1 << i))
10	f[mask] += f[mask ^ (1 << i)];
11	}
12
13	void mobius_subset(vector<long long>& F) {
14	int N = (int)F.size();
15	int n = 0; while ((1 << n) < N) ++n;
16	for (int i = 0; i < n; ++i)
17	for (int mask = 0; mask < N; ++mask)
18	if (mask & (1 << i))
19	F[mask] -= F[mask ^ (1 << i)];
20	}
21
22	vector<long long> or_convolution(vector<long long> f, vector<long long> g) {
23	int N = (int)f.size();
24	zeta_subset(f);
25	zeta_subset(g);
26	vector<long long> h(N);
27	for (int i = 0; i < N; ++i) h[i] = f[i] * g[i];
28	mobius_subset(h);
29	return h;
30	}
31
32	// Naive OR convolution for verification: h[m] = sum_{a\|b=m} f[a]*g[b]
33	vector<long long> or_convolution_naive(const vector<long long>& f, const vector<long long>& g) {
34	int N = (int)f.size();
35	vector<long long> h(N, 0);
36	for (int a = 0; a < N; ++a)
37	for (int b = 0; b < N; ++b)
38	h[a \| b] += f[a] * g[b];
39	return h;
40	}
41
42	int main(){
43	ios::sync_with_stdio(false);
44	cin.tie(nullptr);
45
46	int n = 3; int N = 1 << n;
47	vector<long long> f(N), g(N);
48	// Small demo values
49	for (int m = 0; m < N; ++m) {
50	f[m] = (m % 3);
51	g[m] = (m % 2);
52	}
53
54	auto h_fast = or_convolution(f, g);
55	auto h_slow = or_convolution_naive(f, g);
56
57	for (int m = 0; m < N; ++m) {
58	cout << m << ": fast=" << h_fast[m] << ", slow=" << h_slow[m] << "\n";
59	assert(h_fast[m] == h_slow[m]);
60	}
61	return 0;
62	}
63

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	void zeta_superset(vector<long long>& f) {
5	int N = (int)f.size();
6	int n = 0; while ((1 << n) < N) ++n;
7	for (int i = 0; i < n; ++i)
8	for (int mask = 0; mask < N; ++mask)
9	if ((mask & (1 << i)) == 0)
10	f[mask] += f[mask \| (1 << i)];
11	}
12
13	void mobius_superset(vector<long long>& F) {
14	int N = (int)F.size();
15	int n = 0; while ((1 << n) < N) ++n;
16	for (int i = 0; i < n; ++i)
17	for (int mask = 0; mask < N; ++mask)
18	if ((mask & (1 << i)) == 0)
19	F[mask] -= F[mask \| (1 << i)];
20	}
21
22	vector<long long> and_convolution(vector<long long> f, vector<long long> g) {
23	int N = (int)f.size();
24	zeta_superset(f);
25	zeta_superset(g);
26	vector<long long> h(N);
27	for (int i = 0; i < N; ++i) h[i] = f[i] * g[i];
28	mobius_superset(h);
29	return h;
30	}
31
32	// Naive AND convolution: h[m] = sum_{a&b=m} f[a]*g[b]
33	vector<long long> and_convolution_naive(const vector<long long>& f, const vector<long long>& g) {
34	int N = (int)f.size();
35	vector<long long> h(N, 0);
36	for (int a = 0; a < N; ++a)
37	for (int b = 0; b < N; ++b)
38	h[a & b] += f[a] * g[b];
39	return h;
40	}
41
42	int main(){
43	ios::sync_with_stdio(false);
44	cin.tie(nullptr);
45
46	int n = 3; int N = 1 << n;
47	vector<long long> f(N), g(N);
48	for (int m = 0; m < N; ++m) {
49	f[m] = (m + 1);
50	g[m] = (m % 4);
51	}
52
53	auto h_fast = and_convolution(f, g);
54	auto h_slow = and_convolution_naive(f, g);
55
56	for (int m = 0; m < N; ++m) {
57	cout << m << ": fast=" << h_fast[m] << ", slow=" << h_slow[m] << "\n";
58	assert(h_fast[m] == h_slow[m]);
59	}
60	return 0;
61	}
62