Double Counting
Key Points
- Double counting is the strategy of counting the same quantity in two different ways to derive an equality or an efficient algorithm.
- It turns nested sums into reordered sums, often reducing O(n^2) loops to O(n) or O(n log n) by switching the counting perspective.
- Classic examples include the Handshake Lemma and counting edges in bipartite graphs via the sum of degrees on each side.
- In competitive programming, double counting appears as the “contribution technique,” where each element’s contribution to all structures (pairs, subarrays) is counted once.
- Indicator variables plus linearity of expectation are a probabilistic form of double counting used to compute expected values cleanly.
- When summing over pairs or triples, try reversing the order: count by items versus by values, or by left endpoints versus right endpoints.
- Be careful about overcounting (e.g., i = j, ordered vs unordered pairs) and tie-breaking (strict vs non-strict inequalities).
- Switching perspectives preserves correctness but can drastically change runtime and memory, enabling near-linear or harmonic-series time bounds.
Prerequisites
- Big-O notation — To understand how switching perspectives changes runtime, e.g., from O(n^2) to O(n log n).
- Basic graph theory — Degree, edges, and bipartite graphs are classic double-counting grounds.
- Prefix sums and counting arrays — Implement per-value counting and contribution sweeps efficiently.
- Stacks (monotonic stacks) — Needed for contribution-based subarray computations in O(n).
- Number theory basics — Divisibility, multiples, and harmonic bounds appear in value-based counting.
- Discrete probability — Indicator variables and linearity of expectation are a probabilistic double-counting tool.
Detailed Explanation
Overview
Double counting is a problem-solving technique where we compute the same total quantity in two different ways and equate the results. In proofs, this gives identities like the Handshake Lemma: the sum of degrees in an undirected graph equals twice the number of edges. In algorithms, the same idea reduces complexity by switching viewpoints—for example, instead of “for each i, count j,” we do “for each j, count i,” or we count per-value rather than per-index. This often transforms explicit nested loops into structured loops over values, buckets, or boundaries. The method is especially powerful when a quantity is symmetric or can be indexed multiple ways (left/right partitions, endpoints vs elements, items vs categories). In competitive programming, the “contribution technique” is a double-counting mindset: sum the contribution of each element to the final answer rather than enumerating all composite structures. This yields efficient solutions for counting pairs, computing sums over all subarrays, and analyzing expected values.
Intuition & Analogies
Imagine counting handshakes at a party. You could ask every person how many hands they shook and add it all up. Or you could simply count each handshake event directly. Both approaches must give the same total because they are different perspectives of the same thing. This is double counting: two viewpoints, one truth. Another analogy: calculating how many stickers are stuck in a sticker book. You can count per page (sum stickers on each page), or per sticker type (sum how many pages contain each type). The total stickers are the same either way, but one method may be faster depending on what information you have. In code, we mimic this by switching loops. If your original algorithm says “for each index i, inspect all j,” you might flip it to “for each value v, process all indices i that carry v.” If there are far fewer distinct values than indices, this re-indexing slashes runtime. For subarrays, rather than enumerating all O(n^2) subarrays and finding their minima, count for each element how many subarrays it is the minimum of, then multiply by its value. Each element is considered once, but its influence across many subarrays is captured through clever boundaries found with a stack. The heartbeat of double counting is: find a conserved total, choose two natural indexings, then pick the one that’s algorithmically cheaper.
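As a minimal sketch of this loop flip (the array contents are made up for illustration), the following counts pairs of equal values from both perspectives and checks that they agree:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Two perspectives on one quantity: pairs (i, j), i < j, with a[i] == a[j].
int main() {
    vector<int> a = {3, 1, 3, 2, 1, 3}; // sample data, values in a small range
    int n = a.size();

    // Perspective 1: "for each i, inspect all j" -- O(n^2).
    long long byIndices = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (a[i] == a[j]) ++byIndices;

    // Perspective 2: "for each value v, look at its occurrences" -- O(n + V).
    // A value occurring f times yields f * (f - 1) / 2 equal pairs.
    int V = *max_element(a.begin(), a.end());
    vector<long long> freq(V + 1, 0);
    for (int x : a) freq[x]++;
    long long byValues = 0;
    for (int v = 1; v <= V; ++v) byValues += freq[v] * (freq[v] - 1) / 2;

    cout << byIndices << " == " << byValues << "\n"; // prints 4 == 4
    return 0;
}
```

When values span a small range, the per-value pass replaces the O(n^2) scan with a linear frequency count.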
Formal Definition
Let S be a finite set whose elements can be indexed in two ways, for example as pairs (i, j). Counting S by the first coordinate and by the second gives \sum_i |\{ j : (i, j) \in S \}| = |S| = \sum_j |\{ i : (i, j) \in S \}|. More generally, for finite weights, \sum_i \sum_j w(i, j) = \sum_j \sum_i w(i, j), so an algorithm may evaluate whichever order of summation is cheaper.
When to Use
Use double counting when: (1) You see nested loops counting pairs/triples with a symmetric property—try reversing the order or counting by categories (values, degrees, buckets). (2) There is a conserved quantity that can be indexed in two natural ways, e.g., edges counted by left vs right partition in a bipartite graph. (3) You need an O(n) or O(n log n) solution for a problem that naively seems O(n^2), especially when data values are constrained (e.g., array values up to 10^6). (4) You compute expected values—indicator variables with linearity of expectation transform probabilistic counting into simple sums. (5) You are aggregating over all substructures (subarrays, subsequences) and can express the total as a sum of per-element contributions, with boundaries found via monotonic stacks or two-pointers. Typical use cases: counting pairs satisfying divisibility or equality, counting edges/degrees, summing subarray minima/maxima, number of inversions or pattern occurrences via contributions, and inclusion–exclusion derivations that treat overlaps carefully.
⚠️ Common Mistakes
- Overcounting or undercounting: forgetting whether pairs are ordered or unordered, or whether i = j is allowed. Always fix the domain (i < j vs i ≠ j vs ordered) before designing counts.
- Tie-breaking errors: when using contribution methods (e.g., subarray minima), you must choose strict vs non-strict comparisons consistently on one side to avoid double counting equal values.
- Ignoring constraints: switching to “count by value” only helps if the value domain is manageable; otherwise, you might exceed memory/time with a huge frequency array.
- Boundary off-by-ones: in stack-based contributions, previous/next less elements need careful indexing; L_i and R_i must reflect distances correctly.
- Integer overflow: contribution totals (e.g., sum over all subarrays) can exceed 32-bit range. Use 64-bit (long long) or big integers.
- Misapplied expectation: indicator variables require clear event definitions; independence is not required for linearity of expectation, but you must define each indicator correctly and sum expectations, not products.
- Falsely assuming equality: ensure the two perspectives truly count the same set with the same weights. A mismatch in what is being counted leads to incorrect equations.
Key Formulas
Handshake Lemma
\sum_{v \in V} \deg(v) = 2|E|
Explanation: Each edge contributes 1 to the degree of each of its two endpoints, so the degree sum counts each edge twice. This identity often proves parity facts and enables quick edge counts.
Bipartite degree sum
\sum_{u \in L} \deg(u) = |E| = \sum_{w \in R} \deg(w)
Explanation: In a bipartite graph (L, R, E), every edge touches exactly one vertex in L and one in R. Counting edges by left or by right degrees yields the same total.
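A quick sketch of the identity on a small hypothetical bipartite graph (the edge list is invented for illustration), counting |E| by left degrees and by right degrees:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count edges of a bipartite graph two ways: by left degrees and by right degrees.
int main() {
    int nL = 3, nR = 2; // sizes of the two sides
    vector<pair<int,int>> edges = {{0, 0}, {0, 1}, {1, 0}, {2, 1}};

    vector<long long> degL(nL, 0), degR(nR, 0);
    for (auto [u, w] : edges) { degL[u]++; degR[w]++; }

    long long sumL = accumulate(degL.begin(), degL.end(), 0LL);
    long long sumR = accumulate(degR.begin(), degR.end(), 0LL);

    // Both sums equal |E|, because each edge has exactly one endpoint on each side.
    cout << sumL << " " << sumR << " " << edges.size() << "\n"; // prints 4 4 4
    return 0;
}
```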
Order of summation (finite)
\sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j} = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{i,j}
Explanation: For finite sums, switching the order of summation does not change the result. This is the algebraic backbone for reversing nested loops in algorithms.
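A classic algorithmic payoff of swapping summation order is computing d(1) + d(2) + ... + d(n), the total divisor count up to n: instead of finding the divisors of each number, count the multiples of each candidate divisor. A sketch:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sum of d(1) + d(2) + ... + d(n), where d(j) = number of divisors of j.
int main() {
    long long n = 1000;

    // Perspective 1: per number j, trial division up to sqrt(j).
    long long slow = 0;
    for (long long j = 1; j <= n; ++j)
        for (long long v = 1; v * v <= j; ++v)
            if (j % v == 0) slow += (v * v == j) ? 1 : 2; // v and j/v

    // Perspective 2: per divisor v, there are floor(n / v) multiples of v up to n,
    // so the nested sum collapses to a single loop.
    long long fast = 0;
    for (long long v = 1; v <= n; ++v) fast += n / v;

    cout << slow << " == " << fast << "\n"; // both totals agree
    return 0;
}
```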
Linearity via indicators
E[X] = \sum_i E[X_i] = \sum_i P(A_i) \quad \text{where } X = \sum_i X_i
Explanation: If X = \sum_i X_i with X_i as indicators of events A_i, then E[X] is the sum of their expectations. This form of double counting avoids dependence issues.
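As a small illustration (the seed and trial count below are arbitrary), the expected number of fixed points of a uniformly random permutation is exactly 1 by indicators, since each of the n positions matches with probability 1/n, and a Monte Carlo run agrees:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Expected number of fixed points of a random permutation of n items.
// Indicator X_i = 1 if p[i] == i; E[X_i] = 1/n, so E[sum X_i] = n * (1/n) = 1.
// No independence between the X_i is needed for linearity of expectation.
int main() {
    int n = 10;
    int trials = 200000;
    mt19937 rng(12345); // fixed seed for reproducibility

    vector<int> p(n);
    iota(p.begin(), p.end(), 0);

    long long total = 0;
    for (int t = 0; t < trials; ++t) {
        shuffle(p.begin(), p.end(), rng);
        for (int i = 0; i < n; ++i)
            if (p[i] == i) ++total; // sum of indicator variables
    }
    cout << "empirical: " << (double)total / trials << ", exact: 1\n";
    return 0;
}
```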
Contribution technique
\sum_{\text{objects } O} w(O) = \sum_{\text{elements } e} \text{contribution}(e)
Explanation: Instead of enumerating all composite objects, sum how much each atomic element contributes. Correct tie-breaking ensures each object is counted once.
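For instance, the total of a[l..r] over all subarrays follows from contributions alone: element a[i] appears in (i + 1) * (n - i) subarrays (choices of l ≤ i times choices of r ≥ i). A sketch with made-up data, checked against brute force:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sum of a[l..r] over all subarrays, computed two ways.
int main() {
    vector<long long> a = {2, 5, 1, 4}; // sample data
    int n = a.size();

    // Perspective 1: enumerate all O(n^2) subarrays.
    long long slow = 0;
    for (int l = 0; l < n; ++l) {
        long long s = 0;
        for (int r = l; r < n; ++r) { s += a[r]; slow += s; }
    }

    // Perspective 2: per-element contribution, O(n).
    long long fast = 0;
    for (int i = 0; i < n; ++i)
        fast += a[i] * (long long)(i + 1) * (n - i);

    cout << slow << " == " << fast << "\n"; // prints 60 == 60
    return 0;
}
```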
Harmonic multiple count
\sum_{v=1}^{V} \left\lfloor V/v \right\rfloor \le V \cdot H_V = O(V \log V)
Explanation: The number of multiples across all v up to V is bounded by V times the harmonic number H_V. It explains why looping over multiples often yields O(V log V).
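A sketch that measures the bound directly: count the inner-loop iterations of the multiples pattern and compare them with the harmonic estimate V ln V + V:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Total work of the "loop over multiples" pattern:
// iterations = sum over v of floor(V / v), roughly V * H_V.
int main() {
    long long V = 1000000;
    long long iterations = 0;
    for (long long v = 1; v <= V; ++v) iterations += V / v;

    double estimate = V * log((double)V) + V; // V * H_V is about V ln V + V
    cout << "iterations: " << iterations
         << ", ~V ln V + V: " << (long long)estimate << "\n";
    return 0;
}
```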
Sum of subarray minima
\sum_{1 \le l \le r \le n} \min(a_l, \dots, a_r) = \sum_{i=1}^{n} a_i \cdot L_i \cdot R_i
Explanation: If L_i is the number of choices for the left boundary and R_i the number of choices for the right boundary such that a_i is the minimum, then a_i contributes a_i \cdot L_i \cdot R_i to the total. Monotonic stacks compute L_i and R_i in O(n).
Inclusion–exclusion (two sets)
|A \cup B| = |A| + |B| - |A \cap B|
Explanation: Counting by sets A and B overcounts the intersection once; subtract it to fix the total. This is a simple double counting of membership.
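A minimal sketch with arbitrary constants (N = 100, divisors 6 and 10) that verifies the two-set formula against brute force:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count integers in [1, N] divisible by a or b via |A| + |B| - |A ∩ B|.
// The intersection consists of multiples of lcm(a, b).
int main() {
    long long N = 100, a = 6, b = 10;
    long long l = a / gcd(a, b) * b; // lcm(6, 10) = 30
    long long byFormula = N / a + N / b - N / l;

    long long brute = 0;
    for (long long x = 1; x <= N; ++x)
        if (x % a == 0 || x % b == 0) ++brute;

    cout << byFormula << " == " << brute << "\n"; // prints 23 == 23
    return 0;
}
```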
Complexity Analysis
Switching the counting perspective never changes the answer, only the cost. In the examples below, the Handshake Lemma computation runs in O(n + m); the divisibility-pair count runs in O(n + V log V) time and O(V) memory, where V is the maximum array value, because the loop over multiples is harmonic; and the sum-of-subarray-minima sweep runs in O(n), since each index is pushed and popped at most once per monotonic stack.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, m; // n = number of vertices, m = number of edges (undirected)
    if (!(cin >> n >> m)) return 0;
    vector<int> deg(n, 0);

    // Read edges and build degree counts
    for (int e = 0; e < m; ++e) {
        int u, v; cin >> u >> v; // assume 0-indexed input
        deg[u]++;
        deg[v]++;
    }

    long long sum_deg = 0;
    for (int d : deg) sum_deg += d;

    // Handshake Lemma: sum of degrees equals 2 * number of edges
    long long edges_from_degrees = sum_deg / 2; // sum_deg is always even for an undirected graph

    cout << "Sum of degrees: " << sum_deg << "\n";
    cout << "Edges (from input m): " << m << "\n";
    cout << "Edges (by double counting degrees/2): " << edges_from_degrees << "\n";

    // Note: This demonstrates counting every edge twice (once at each endpoint).
    return 0;
}
```
We read an undirected graph, compute the degree of each vertex, and apply the Handshake Lemma. Each edge contributes 1 to two vertices’ degrees, so the total degree sum equals 2m. This is a direct double counting: counting edges via endpoints instead of by edges themselves.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Count ordered pairs (i, j) such that a[i] divides a[j].
// Switch perspective: for each value v, pair it with all multiples k*v.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<int> a(n);
    int maxA = 0;
    for (int i = 0; i < n; ++i) { cin >> a[i]; maxA = max(maxA, a[i]); }

    vector<long long> freq(maxA + 1, 0);
    for (int x : a) freq[x]++;

    long long ordered_pairs = 0; // counts (i, j); i and j can be equal since a[i] divides itself

    for (int v = 1; v <= maxA; ++v) {
        if (freq[v] == 0) continue;
        // For all multiples m of v, each occurrence of v can pair with each occurrence of m
        for (int m = v; m <= maxA; m += v) {
            if (freq[m] == 0) continue;
            ordered_pairs += freq[v] * freq[m];
        }
    }

    // For unordered pairs with distinct indices i < j, remove self-pairings:
    // long long unordered_pairs = 0;
    // for (int v = 1; v <= maxA; ++v) {
    //     if (!freq[v]) continue;
    //     unordered_pairs += freq[v] * (freq[v] - 1) / 2; // pairs (v, v), since v divides v
    //     for (int m = 2 * v; m <= maxA; m += v) {        // pairs (v, m) with m > v
    //         if (!freq[m]) continue;
    //         unordered_pairs += freq[v] * freq[m];
    //     }
    // }

    cout << ordered_pairs << "\n";
    return 0;
}
```
Naively checking all n^2 pairs is slow. By double counting over values, we fix a value v and count all its matches among multiples m = k·v. The total work is proportional to the number of multiples across all v, which is O(V log V) where V is the maximum value. Adjust the counting for ordered/unordered pairs as needed.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Compute the sum of minimums of all subarrays in O(n) using contribution counting.
// Each a[i] contributes a[i] * L[i] * R[i], where:
//   L[i] = number of left boundaries: distance to the previous element <= a[i]
//   R[i] = number of right boundaries: distance to the next element < a[i]
// Tie-breaking (non-strict on the left, strict on the right) attributes each
// subarray to the leftmost occurrence of its minimum, so nothing is counted twice.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<long long> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    vector<long long> L(n), R(n);

    // Distance to previous less-or-equal element (L)
    {
        vector<int> st; // monotonic stack of indices
        for (int i = 0; i < n; ++i) {
            while (!st.empty() && a[st.back()] > a[i]) st.pop_back();
            int prev_leq = st.empty() ? -1 : st.back();
            L[i] = i - prev_leq; // number of choices for the left boundary
            st.push_back(i);
        }
    }

    // Distance to next strictly less element (R)
    {
        vector<int> st; // monotonic stack of indices
        for (int i = n - 1; i >= 0; --i) {
            while (!st.empty() && a[st.back()] >= a[i]) st.pop_back();
            int next_less = st.empty() ? n : st.back();
            R[i] = next_less - i; // number of choices for the right boundary
            st.push_back(i);
        }
    }

    long long ans = 0;
    for (int i = 0; i < n; ++i) {
        ans += a[i] * L[i] * R[i];
    }

    cout << ans << "\n";
    return 0;
}
```
Instead of enumerating all O(n^2) subarrays and finding their minima, we double count by attributing each subarray to its unique owning minimum (the leftmost occurrence, by the tie-breaking above). For each index i, L[i] and R[i] count how many left and right boundaries keep a[i] as that owner, computed by monotonic stacks. Summing a[i] * L[i] * R[i] yields the total.