DSU on Tree (Sack)
Key Points
- DSU on Tree (also called the Sack technique) answers many offline subtree queries in O(n log n) by keeping data from the heavy child and temporarily re-adding light subtrees.
- The trick is to process all light children with keep=false (clear them), process the heavy child with keep=true (keep it), then re-add every light child's nodes to the kept structure.
- Each node is added to the data structure only O(log n) times because there are at most O(log n) light edges on any root-to-node path.
- It's ideal for queries like the number of distinct colors, the sum of colors with maximum frequency, or counts with frequency thresholds inside a node's subtree.
- You precompute subtree sizes and pick one heavy child (largest subtree) per node; all other children are light.
- The core functions are: dfs_sz to compute sizes and the heavy child, addSubtree to add or remove a subtree's counts, and dfs_sack(u, p, keep) to run the algorithm.
- Unlike DSU (Disjoint Set Union/Union-Find), DSU on Tree is not about merging sets; it's a DFS ordering trick with a shared 'bag' of counts.
- For advanced queries like 'how many colors appear at least k times?', combine DSU on Tree with a Fenwick tree to maintain frequency-of-frequencies.
Prerequisites
- Depth-First Search (DFS) — DSU on Tree relies on subtree traversals and parent-child relationships established by DFS.
- Subtree sizes — You must compute s(u) to select the heavy child per node.
- Frequency counting — Core operations add/remove nodes by updating counts, often in O(1) with arrays.
- Coordinate compression — Maps arbitrary values (e.g., colors) to small indices for array-based frequencies.
- Fenwick Tree (Binary Indexed Tree) — Optional but useful for answering frequency-threshold queries in O(log n).
- Asymptotic analysis — Understanding why each node is added O(log n) times and the overall O(n log n) bound.
- Tree basics (rooting, parent, children) — The algorithm distinguishes heavy vs. light children relative to a chosen root.
Detailed Explanation
01 Overview
DSU on Tree (also known as the Sack technique) is a method to answer many offline subtree queries on trees efficiently. The main idea is to traverse the tree in a specific order that lets us reuse computation for the largest (heavy) child of each node while temporarily discarding and later re-adding information from smaller (light) children. This careful ordering ensures that each vertex is incorporated into the working data structure only a logarithmic number of times. In practical terms, DSU on Tree is perfect when queries ask for properties about the nodes inside the subtree of a given vertex, such as the number of distinct labels, the most frequent label, or the count of labels meeting a frequency threshold. The technique avoids the overhead of merging large maps or sets repeatedly by always merging small structures into a big one. The process involves three phases per node: clear light children, keep heavy child, and re-add light subtrees. With precomputed subtree sizes to identify the heavy child, the total running time becomes O(n \log n) for many common statistics. This performance is competitive with alternatives like Mo’s algorithm on trees and is often easier to implement for subtree-only queries.
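The sketch below condenses this three-phase flow for the distinct-colors statistic. It omits input parsing and the size-computing DFS, and the names (colorOf, heavyChild, freqCnt) are illustrative stand-ins for the globals used in the full programs in the Code Examples section.

#include <bits/stdc++.h>
using namespace std;

// Condensed sketch of the keep/clear pattern for the distinct-colors statistic.
// g[], colorOf[] and heavyChild[] are assumed to be filled by input parsing and
// a size-computing DFS (see the full programs in the Code Examples section).
const int MAXN = 200005;
vector<int> g[MAXN];                       // adjacency lists
int colorOf[MAXN], heavyChild[MAXN];       // node colors and heavy child per node (-1 if none)
int ans[MAXN];                             // answer per node
int freqCnt[MAXN], distinctColors = 0;     // the shared 'bag'
bool isBig[MAXN];                          // marks the kept heavy child during re-adds

void addNode(int u, int d) {               // O(1) add (+1) or remove (-1) of one node
    if (d > 0) { if (freqCnt[colorOf[u]]++ == 0) distinctColors++; }
    else       { if (--freqCnt[colorOf[u]] == 0) distinctColors--; }
}
void addSubtree(int u, int p, int d) {     // whole subtree, never descending into a kept heavy subtree
    addNode(u, d);
    for (int v : g[u]) if (v != p && !isBig[v]) addSubtree(v, u, d);
}
void dfs_sack(int u, int p, bool keep) {
    for (int v : g[u]) if (v != p && v != heavyChild[u]) dfs_sack(v, u, false);  // light: clear after use
    if (heavyChild[u] != -1) { dfs_sack(heavyChild[u], u, true); isBig[heavyChild[u]] = true; }  // heavy: keep
    for (int v : g[u]) if (v != p && v != heavyChild[u]) addSubtree(v, u, +1);   // re-add light subtrees
    addNode(u, +1);                                                              // add u itself
    ans[u] = distinctColors;               // the bag now holds exactly subtree(u)
    if (heavyChild[u] != -1) isBig[heavyChild[u]] = false;
    if (!keep) addSubtree(u, p, -1);       // discard everything if u hangs off a light edge
}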
02 Intuition & Analogies
Imagine you’re cleaning rooms in a huge house (the tree), where each room has drawers full of colored items (node values). For each room, you want to know something about all items in that room’s entire suite (its subtree)—for example, how many distinct colors are present. A naive approach is to empty all drawers in the suite and count from scratch for every room, which is slow. Instead, you notice each room has one giant walk-in closet (the heavy child) and several small drawers (light children). If you keep the giant closet’s items in your working bag and temporarily ignore the small drawers, you can later bring in the contents of the small drawers one by one, adding them to the already large bag that you kept. This is much faster than repeatedly combining many small bags together because you always merge small collections into a big one. Crucially, every time you follow a small drawer (light child), you move to a place that’s at most half the size of where you were, so you can’t keep doing that many times along any path—logarithmically many at most. That’s why each item is only moved a few times overall. This strategy turns a potentially quadratic chore into something closer to linear times a small factor. In code, we mimic this by: (1) computing which child is the giant closet (largest subtree), (2) discarding light children’s data right after processing them, (3) keeping the heavy child’s data, and (4) re-adding all light children’s nodes so the current node’s answer is computed from a big, maintained bag.
03 Formal Definition
Root the tree at an arbitrary vertex and let s(u) denote the size of the subtree of u. The heavy child of u is a child v maximizing s(v) (ties broken arbitrarily); every other child is light, and the edge to it is a light edge. DSU on Tree maintains one global 'bag' of counts and runs dfs_sack(u, p, keep): process every light child with keep=false (its contribution is added, used, and then removed), process the heavy child with keep=true (its contribution stays in the bag), re-add the light subtrees and u itself, answer the query at u from the bag, and finally clear the whole subtree of u if keep=false. Since every light edge at least halves the subtree size, any root-to-node path crosses at most \lfloor \log_2 n \rfloor light edges, so each vertex is added to the bag O(log n) times and the total work is O(n log n) add/remove operations.
04 When to Use
Use DSU on Tree when you have many offline subtree queries that ask for aggregate statistics over node attributes, and when these statistics can be maintained by incremental add/remove operations. Typical examples include: counting distinct attribute values in subtrees, computing the most frequent value (mode) and its sum across ties, counting how many values appear at least k times, or answering point-attribute queries like 'how many nodes of color c are in subtree(u)?'. DSU on Tree excels when merges are asymmetric: always merge small into big, effectively by processing the big (heavy) child in a way that preserves its data. It is particularly appropriate when: (1) the tree is static (no edge updates), (2) queries are offline and each query is attached to a node (subtree queries), and (3) your statistic can be updated by O(1) or O(\log n) add/remove steps. If you need path queries (u to v) or dynamic edge updates, consider other techniques such as Heavy-Light Decomposition or Mo’s algorithm on trees. If your statistic is not easily maintainable under incremental updates (e.g., requires global recomputation after each change), DSU on Tree may not help. However, many frequency-based or set-union-like answers work very well.
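To make "maintainable by incremental add/remove steps" concrete, here is a minimal sketch of the kind of object the bag can be for the distinct-count statistic. The struct and method names are illustrative, not a standard API; the full programs below inline the same logic as free functions.

#include <vector>

// Illustrative interface only: a statistic usable with DSU on Tree just needs
// cheap add/remove of one value and a way to read off the current answer.
struct DistinctCounter {
    std::vector<int> freq;     // freq[c] = occurrences of color c in the bag
    int distinct = 0;          // number of colors with freq > 0
    explicit DistinctCounter(int maxColor) : freq(maxColor + 1, 0) {}
    void add(int c)    { if (freq[c]++ == 0) ++distinct; }   // O(1)
    void remove(int c) { if (--freq[c] == 0) --distinct; }   // O(1), assumes c is currently present
    int answer() const { return distinct; }
};

A statistic qualifies for DSU on Tree exactly when it can expose this add/remove/answer shape with O(1) or O(log n) operations.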
⚠️ Common Mistakes
- Confusing DSU on Tree with Disjoint Set Union (Union-Find). Despite the name, DSU on Tree is not about union-find sets; it is a DFS ordering trick with a shared global counter structure.
- Forgetting to precompute subtree sizes and heavy children. Without them, you cannot distinguish which child to keep, and performance degrades significantly.
- Double-counting nodes when re-adding light subtrees. You must skip the heavy child’s nodes during re-add passes (often by marking the heavy child as 'big' so addSubtree avoids it) to prevent duplicating contributions.
- Not clearing state correctly when keep=false. When a subtree is processed with keep=false, you must remove all its nodes from the global structure; otherwise, future answers become contaminated.
- Mishandling advanced aggregations (e.g., mode with sum over ties). If you decrement frequencies during clear, you must also reset auxiliary trackers (like current maximum frequency and tie-sum) or maintain them symmetrically; otherwise, you’ll compute stale answers.
- Using unordered_map for tight constraints without reserving. Frequency structures should be arrays or well-reserved hash maps to avoid TLE.
- Stack overflows on deep trees. When n is large and the tree is a path, the recursive DFS can exceed the default stack; switch to an iterative DFS or enlarge the stack size (see the iterative sketch after this list).
- Memory blowups by storing per-node large maps. DSU on Tree works best with one global structure updated incrementally; avoid copying per-node containers.
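As one way to avoid the recursion-depth issue above, here is a sketch of an iterative, two-pass version of the size/heavy-child computation. It takes the graph and output vectors as parameters instead of the global arrays used in the full examples below; the small demo tree in main is only illustrative.

#include <bits/stdc++.h>
using namespace std;

// Iterative post-order computation of subtree sizes and heavy children,
// equivalent to the recursive dfs_sz used in the examples below.
void dfs_sz_iterative(int root, const vector<vector<int>>& g,
                      vector<int>& sz, vector<int>& heavy) {
    int n = (int)g.size();
    sz.assign(n, 0);
    heavy.assign(n, -1);
    vector<int> parent(n, -1), order;
    order.reserve(n);
    // Pass 1: explicit-stack DFS; parents are appended to 'order' before their children.
    vector<int> stk = {root};
    parent[root] = root;                        // sentinel so the root is never revisited
    while (!stk.empty()) {
        int u = stk.back(); stk.pop_back();
        order.push_back(u);
        for (int v : g[u]) if (v != parent[u]) { parent[v] = u; stk.push_back(v); }
    }
    // Pass 2: reverse order processes every child before its parent.
    for (int i = (int)order.size() - 1; i >= 0; --i) {
        int u = order[i];
        sz[u] += 1;                             // count u itself (children already added theirs)
        if (u != root) {
            int p = parent[u];
            sz[p] += sz[u];
            if (heavy[p] == -1 || sz[u] > sz[heavy[p]]) heavy[p] = u;
        }
    }
}

int main() {
    // Tiny demo: the 5-node tree with edges 1-2, 1-3, 2-4, 2-5.
    int n = 5;
    vector<vector<int>> g(n + 1);
    auto addEdge = [&](int a, int b) { g[a].push_back(b); g[b].push_back(a); };
    addEdge(1, 2); addEdge(1, 3); addEdge(2, 4); addEdge(2, 5);
    vector<int> sz, heavy;
    dfs_sz_iterative(1, g, sz, heavy);
    for (int u = 1; u <= n; ++u)
        cout << "node " << u << ": size " << sz[u] << ", heavy child " << heavy[u] << '\n';
    return 0;
}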
Key Formulas
Subtree Size Recurrence
Formula: s(u) = 1 + \sum_{v \in \text{children}(u)} s(v)
Explanation: The size of a node's subtree equals one (the node itself) plus the sizes of its children's subtrees. This is computed in a first DFS and is used to select heavy children.
Heavy Child Definition
Formula: \text{heavy}(u) = \arg\max_{v \in \text{children}(u)} s(v)
Explanation: The heavy child of u is the child with the largest subtree size. Keeping the heavy child's data minimizes the number of times nodes are re-added.
Light Edge Bound
Formula: s(v) \le s(u)/2 for every light edge (u, v), so any root-to-node path contains at most \lfloor \log_2 n \rfloor light edges
Explanation: Every time you traverse a light edge, the next subtree has size at most half the previous one, so this can occur only logarithmically many times. Hence each vertex is re-added O(log n) times.
Total Add Operations
Formula: \sum_{u} O(\log n) = O(n \log n)
Explanation: Summing over all vertices, each contributes O(log n) additions to the data structure due to the light-edge bound. This yields an overall O(n log n) time for add work.
DSU on Tree Time Recurrence
Formula: T(u) = T(\text{heavy}(u)) + \sum_{\text{light child } v} \big( T(v) + O(s(v)) \big)
Explanation: The running time splits into processing the heavy child (kept), processing light children (cleared), and re-adding their nodes. The additive term accounts for adding nodes in light subtrees.
Fenwick Tree Operations
Formula: point update and prefix-sum query each cost O(\log n); \#\{\text{colors with frequency} \ge k\} = \sum_{f=k}^{n} \text{cntFreq}[f]
Explanation: Fenwick trees maintain prefix sums with point updates in O(log n). They are useful to track how many values have a given frequency and to query counts with frequency at least k.
Complexity Analysis
The size-computing DFS takes O(n). In the sack DFS, each vertex is added once for itself and once more for every light edge on its root-to-node path, i.e., O(log n) times in total, and removed at most as often. With O(1) add/remove steps (array-based frequency counters) this gives O(n log n) time; if each step also updates a Fenwick tree, the bound becomes O(n log^2 n), plus O(log n) per offline query. Memory is O(n) for the adjacency lists and the size/heavy/answer arrays, O(m) for the compressed frequency array, and O(q) for stored offline queries.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// DSU on Tree (Sack) to compute, for every node u, the number of distinct colors in subtree(u).
// Input format (example):
//   n
//   c1 c2 ... cn
//   (n-1 edges, one "u v" pair per line)
static const int MAXN = 200000 + 5;

int n;
vector<int> g[MAXN];
int color[MAXN];      // color[u] in [1..m] (compressed)
int sz[MAXN];         // subtree size
int heavy[MAXN];      // heavy child of u, or -1
bool isBig[MAXN];     // marks the heavy child during the re-add phase

// Global counting structure for the current 'bag'
vector<int> freq;          // freq[color]
int distinctColors = 0;    // number of colors with freq > 0

int ans[MAXN];             // answer for each node: #distinct colors in its subtree

// First DFS: compute subtree sizes and heavy child
void dfs_sz(int u, int p) {
    sz[u] = 1;
    heavy[u] = -1;
    int best = 0;
    for (int v : g[u]) if (v != p) {
        dfs_sz(v, u);
        sz[u] += sz[v];
        if (sz[v] > best) {
            best = sz[v];
            heavy[u] = v;
        }
    }
}

// Add/remove a single node's contribution
inline void addNode(int u, int delta) {
    int c = color[u];
    if (delta == +1) {
        if (freq[c] == 0) distinctColors++;
        freq[c]++;
    } else {
        freq[c]--;
        if (freq[c] == 0) distinctColors--;
    }
}

// Add/remove contributions of an entire subtree, skipping the kept heavy subtree
void addSubtree(int u, int p, int delta) {
    addNode(u, delta);
    for (int v : g[u]) if (v != p && !isBig[v]) {
        addSubtree(v, u, delta);
    }
}

// Main DSU on Tree DFS
void dfs_sack(int u, int p, bool keep) {
    // 1) Process light children first (keep=false)
    for (int v : g[u]) if (v != p && v != heavy[u]) {
        dfs_sack(v, u, /*keep=*/false);
    }
    // 2) Process heavy child (keep=true)
    if (heavy[u] != -1) {
        dfs_sack(heavy[u], u, /*keep=*/true);
        isBig[heavy[u]] = true;   // mark so addSubtree skips the already-kept heavy subtree
    }

    // 3) Re-add all light children's nodes into the kept structure
    for (int v : g[u]) if (v != p && v != heavy[u]) {
        addSubtree(v, u, +1);
    }
    // 4) Add current node itself
    addNode(u, +1);

    // Now the bag contains exactly the nodes in subtree(u)
    ans[u] = distinctColors;

    // 5) Unmark heavy child
    if (heavy[u] != -1) isBig[heavy[u]] = false;

    // 6) If not keeping, remove this subtree's contributions entirely
    if (!keep) {
        addSubtree(u, p, -1);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n)) return 0;
    vector<int> allColors(n + 1);
    for (int i = 1; i <= n; ++i) cin >> allColors[i];

    // Read edges
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }

    // Coordinate-compress colors to [1..m]
    vector<int> vals = allColors;
    sort(vals.begin() + 1, vals.end());
    vals.erase(unique(vals.begin() + 1, vals.end()), vals.end());
    int m = (int)vals.size() - 1;
    for (int i = 1; i <= n; ++i) {
        int c = allColors[i];
        color[i] = int(lower_bound(vals.begin() + 1, vals.end(), c) - vals.begin());  // in [1..m]
    }

    // Prepare globals
    freq.assign(m + 1, 0);
    dfs_sz(1, 0);
    dfs_sack(1, 0, /*keep=*/true);

    for (int i = 1; i <= n; ++i) cout << ans[i] << (i == n ? '\n' : ' ');
    return 0;
}
This program computes, for every node u, the number of distinct colors in its subtree using DSU on Tree. We first compute subtree sizes and pick the heavy child per node. In dfs_sack, we process light children with keep=false (cleared), process the heavy child with keep=true (kept), re-add all light nodes, then add u itself. The global freq array counts color frequencies in the current 'bag'. The answer ans[u] equals the number of colors with nonzero frequency.
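As a small illustrative run (hypothetical input, not from any judge's test data), consider the 5-node tree with edges 1-2, 1-3, 2-4, 2-5 and colors 1 2 2 3 1. The subtree of node 2 contains colors {2, 3, 1}, so its answer is 3; answers are printed for nodes 1 through 5 in order:

Input:
5
1 2 2 3 1
1 2
1 3
2 4
2 5

Output:
3 3 1 1 1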
#include <bits/stdc++.h>
using namespace std;

// DSU on Tree: for each node u, compute the sum of colors that achieve
// the maximum frequency in subtree(u). If multiple colors tie, sum all of them.
// This mirrors Codeforces 600E (Lomsat Gelral).

static const int MAXN = 200000 + 5;

int n;
vector<int> g[MAXN];
int color_[MAXN];
int sz[MAXN], heavy[MAXN];
bool isBig[MAXN];

vector<long long> freq;       // freq[color id]
vector<long long> origVal;    // origVal[id] = original color value behind compressed id
long long curMax = 0;         // current maximum frequency observed
long long sumMax = 0;         // sum of original colors with frequency == curMax

long long ans[MAXN];

void dfs_sz(int u, int p) {
    sz[u] = 1; heavy[u] = -1; int best = 0;
    for (int v : g[u]) if (v != p) {
        dfs_sz(v, u);
        sz[u] += sz[v];
        if (sz[v] > best) best = sz[v], heavy[u] = v;
    }
}

inline void addColor(int c, int delta) {
    // curMax/sumMax are maintained only on +1; on -1 we only decrement freq.
    if (delta == +1) {
        long long newf = ++freq[c];
        if (newf > curMax) {
            curMax = newf;
            sumMax = origVal[c];      // new maximum: restart the sum with this color
        } else if (newf == curMax) {
            sumMax += origVal[c];     // tie: include this color
        }
    } else {
        // Clearing phase: just decrement; curMax and sumMax are reset after a full clear.
        --freq[c];
    }
}

void addSubtree(int u, int p, int delta) {
    addColor(color_[u], delta);
    for (int v : g[u]) if (v != p && !isBig[v]) addSubtree(v, u, delta);
}

void dfs_sack(int u, int p, bool keep) {
    for (int v : g[u]) if (v != p && v != heavy[u]) dfs_sack(v, u, false);
    if (heavy[u] != -1) { dfs_sack(heavy[u], u, true); isBig[heavy[u]] = true; }

    for (int v : g[u]) if (v != p && v != heavy[u]) addSubtree(v, u, +1);
    addColor(color_[u], +1);

    ans[u] = sumMax;

    if (heavy[u] != -1) isBig[heavy[u]] = false;
    if (!keep) {
        addSubtree(u, p, -1);
        // Fully cleared: reset trackers
        curMax = 0; sumMax = 0;
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    cin >> n;
    vector<int> raw(n + 1);
    for (int i = 1; i <= n; ++i) cin >> raw[i];
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    // Compress colors to [1..m], remembering the original value behind each id
    vector<int> vals = raw;
    sort(vals.begin() + 1, vals.end());
    vals.erase(unique(vals.begin() + 1, vals.end()), vals.end());
    int m = (int)vals.size() - 1;
    for (int i = 1; i <= n; ++i) {
        color_[i] = int(lower_bound(vals.begin() + 1, vals.end(), raw[i]) - vals.begin());
    }
    origVal.assign(m + 1, 0);
    for (int id = 1; id <= m; ++id) origVal[id] = vals[id];

    freq.assign(m + 1, 0);
    dfs_sz(1, 0);
    dfs_sack(1, 0, true);

    for (int i = 1; i <= n; ++i) cout << ans[i] << (i == n ? '\n' : ' ');
    return 0;
}
This program answers: for each node u, among colors in subtree(u), what is the sum of colors that have the maximum frequency? The key difference from the distinct-count template is how we maintain the aggregate: we track curMax (current maximum frequency) and sumMax (sum over colors achieving curMax). We update these only on +1 additions; during the clearing phase (when keep=false), we simply decrement freq and reset curMax and sumMax after the subtree is completely cleared.
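The same illustrative tree (edges 1-2, 1-3, 2-4, 2-5, colors 1 2 2 3 1) works as a hypothetical sanity check: in subtree(1) colors 1 and 2 both appear twice (the maximum), so the root's answer is 1 + 2 = 3, while in subtree(2) every color appears once and the answer is 1 + 2 + 3 = 6; each leaf answers its own color.

Input:
5
1 2 2 3 1
1 2
1 3
2 4
2 5

Output:
3 6 2 3 1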
#include <bits/stdc++.h>
using namespace std;

// DSU on Tree + Fenwick tree over frequency-of-frequencies.
// For each node u we may have several queries (u, k): count how many distinct colors
// occur at least k times in subtree(u).
// We maintain:
//   freq[color] = current frequency of that color in the active bag
//   cntFreq[f]  = how many colors currently have frequency exactly f
// A Fenwick tree over cntFreq answers ">= k" via suffix sums: sum_{f >= k} cntFreq[f].

struct Fenwick {
    int n;
    vector<int> bit;
    Fenwick() : n(0) {}
    Fenwick(int n) : n(n), bit(n + 1, 0) {}
    void add(int i, int delta) { for (; i <= n; i += i & -i) bit[i] += delta; }
    int sumPrefix(int i) { int r = 0; for (; i > 0; i -= i & -i) r += bit[i]; return r; }
    int sumRange(int l, int r) { if (l > r) return 0; return sumPrefix(r) - sumPrefix(l - 1); }
};

static const int MAXN = 200000 + 5;
int n;
vector<int> g[MAXN];
int color_[MAXN];
int sz[MAXN], heavy[MAXN];
bool isBig[MAXN];

// Globals for counting
vector<int> freq;       // per color
vector<int> cntFreq;    // how many colors have exact frequency f
Fenwick BIT;            // prefix sums over cntFreq

struct Query { int k, id; };
vector<vector<Query>> queries;   // queries[u] = list of (k, id) asked at node u
vector<long long> ans;           // answers by id

void dfs_sz(int u, int p) {
    sz[u] = 1; heavy[u] = -1; int best = 0;
    for (int v : g[u]) if (v != p) {
        dfs_sz(v, u);
        sz[u] += sz[v];
        if (sz[v] > best) best = sz[v], heavy[u] = v;
    }
}

inline void touchColor(int c, int delta) {
    int f = freq[c];
    if (f > 0) {                 // remove old frequency
        cntFreq[f]--; BIT.add(f, -1);
    }
    f += delta;                  // new frequency
    freq[c] = f;
    if (f > 0) {                 // add new frequency
        cntFreq[f]++; BIT.add(f, +1);
    }
}

void addSubtree(int u, int p, int delta) {
    touchColor(color_[u], delta);
    for (int v : g[u]) if (v != p && !isBig[v]) addSubtree(v, u, delta);
}

void dfs_sack(int u, int p, bool keep) {
    for (int v : g[u]) if (v != p && v != heavy[u]) dfs_sack(v, u, false);
    if (heavy[u] != -1) { dfs_sack(heavy[u], u, true); isBig[heavy[u]] = true; }

    for (int v : g[u]) if (v != p && v != heavy[u]) addSubtree(v, u, +1);
    touchColor(color_[u], +1);

    // Answer the queries attached to u
    for (const auto &q : queries[u]) {
        int k = q.k;
        if (k <= 0)      ans[q.id] = BIT.sumRange(1, n);
        else if (k > n)  ans[q.id] = 0;
        else             ans[q.id] = BIT.sumRange(k, n);
    }

    if (heavy[u] != -1) isBig[heavy[u]] = false;
    if (!keep) {
        addSubtree(u, p, -1);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int q;                      // number of queries
    cin >> n >> q;
    vector<int> raw(n + 1);
    for (int i = 1; i <= n; ++i) cin >> raw[i];
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    // Compress colors to [1..m]
    vector<int> vals = raw;
    sort(vals.begin() + 1, vals.end());
    vals.erase(unique(vals.begin() + 1, vals.end()), vals.end());
    int m = (int)vals.size() - 1;
    for (int i = 1; i <= n; ++i) {
        color_[i] = int(lower_bound(vals.begin() + 1, vals.end(), raw[i]) - vals.begin());
    }

    // Read queries: each is (u, k)
    queries.assign(n + 1, {});
    ans.assign(q, 0);
    for (int i = 0; i < q; ++i) { int u, k; cin >> u >> k; queries[u].push_back({k, i}); }

    // Init globals
    freq.assign(m + 1, 0);
    cntFreq.assign(n + 1, 0);
    BIT = Fenwick(n);

    dfs_sz(1, 0);
    dfs_sack(1, 0, true);

    for (int i = 0; i < q; ++i) cout << ans[i] << (i + 1 == q ? '\n' : ' ');
    return 0;
}
We attach multiple queries (u, k) to nodes: 'How many distinct colors appear at least k times in subtree(u)?' During DSU on Tree, we maintain freq[color] and a frequency-of-frequencies array cntFreq. A Fenwick tree over cntFreq lets us answer suffix sums quickly: count of colors with freq ≥ k. The add/remove operations update both freq and cntFreq (and the BIT) in O(log n).
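Continuing the illustrative 5-node tree with three hypothetical queries (1, 2), (1, 1) and (2, 1): only colors 1 and 2 reach frequency 2 in subtree(1), so the expected answers are 2, 3 and 3 in query order:

Input:
5 3
1 2 2 3 1
1 2
1 3
2 4
2 5
1 2
1 1
2 1

Output:
2 3 3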