⚙️ Algorithm · Advanced

DSU on Tree (Sack)

Key Points

  • DSU on Tree (also called the Sack technique) answers many subtree queries in O(n log n) by keeping data from the heavy child and temporarily re-adding light subtrees.
  • The trick is to process all light children with keep=false (clear them), process the heavy child with keep=true (keep it), then re-add every light child’s nodes to the kept structure.
  • Each node is added to the data structure only O(log n) times because there are at most ⌊log₂ n⌋ light edges on any root-to-node path.
  • It’s ideal for queries like number of distinct colors, sum of colors with maximum frequency, or counts with frequency thresholds inside a node’s subtree.
  • You precompute subtree sizes and pick one heavy child (largest subtree) per node; all other children are light.
  • The core functions are: dfs_sz to compute subtree sizes and the heavy child, addSubtree to add or remove a whole subtree’s counts, and dfs_sack(u, p, keep) to run the algorithm.
  • Unlike DSU (Disjoint Set Union/Union-Find), DSU on Tree is not about merging sets; it’s a DFS ordering trick with a shared ‘bag’ of counts.
  • For advanced queries like 'how many colors appear at least k times?', combine DSU on Tree with a Fenwick tree to maintain frequency-of-frequencies.

Prerequisites

  • Depth-First Search (DFS): DSU on Tree relies on subtree traversals and parent-child relationships established by DFS.
  • Subtree sizes: You must compute s(u) to select the heavy child per node.
  • Frequency counting: Core operations add/remove nodes by updating counts, often in O(1) with arrays.
  • Coordinate compression: Maps arbitrary values (e.g., colors) to small indices for array-based frequencies.
  • Fenwick Tree (Binary Indexed Tree): Optional but useful for answering frequency-threshold queries in O(log n).
  • Asymptotic analysis: Understanding why each node is added O(log n) times and the overall O(n log n) bound.
  • Tree basics (rooting, parent, children): The algorithm distinguishes heavy vs. light children relative to a chosen root.

Detailed Explanation


01 Overview

DSU on Tree (also known as the Sack technique) is a method to answer many offline subtree queries on trees efficiently. The main idea is to traverse the tree in a specific order that lets us reuse computation for the largest (heavy) child of each node while temporarily discarding and later re-adding information from smaller (light) children. This careful ordering ensures that each vertex is incorporated into the working data structure only a logarithmic number of times. In practical terms, DSU on Tree is perfect when queries ask for properties about the nodes inside the subtree of a given vertex, such as the number of distinct labels, the most frequent label, or the count of labels meeting a frequency threshold. The technique avoids the overhead of merging large maps or sets repeatedly by always merging small structures into a big one. The process involves three phases per node: clear light children, keep heavy child, and re-add light subtrees. With precomputed subtree sizes to identify the heavy child, the total running time becomes O(n log n) for many common statistics. This performance is competitive with alternatives like Mo’s algorithm on trees and is often easier to implement for subtree-only queries.
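The three phases can be condensed into a short, runnable sketch. The tree, colors, and the Sack struct name here are hypothetical choices for illustration; the full templates later in this document add input parsing and color compression.

```cpp
#include <vector>
#include <utility>
using namespace std;

// Condensed sketch of the three phases (clear light, keep heavy, re-add).
// Computes distinct colors per subtree on a small tree rooted at 0.
struct Sack {
    vector<vector<int>> adj;
    vector<int> col, sz, heavy, freq, ans;
    int distinct = 0;
    Sack(vector<vector<int>> a, vector<int> c, int maxColor)
        : adj(move(a)), col(move(c)) {
        int n = (int)adj.size();
        sz.assign(n, 0); heavy.assign(n, -1);
        freq.assign(maxColor + 1, 0); ans.assign(n, 0);
        dfsSize(0, -1);
        dfsSack(0, -1, true);
    }
    void dfsSize(int u, int p) {
        sz[u] = 1; int best = 0;
        for (int v : adj[u]) if (v != p) {
            dfsSize(v, u); sz[u] += sz[v];
            if (sz[v] > best) best = sz[v], heavy[u] = v;
        }
    }
    // Add (d=+1) or remove (d=-1) a subtree, pruning the kept heavy subtree.
    void addSub(int u, int p, int skip, int d) {
        if (u == skip) return;
        if (d > 0 && freq[col[u]]++ == 0) ++distinct;
        if (d < 0 && --freq[col[u]] == 0) --distinct;
        for (int v : adj[u]) if (v != p) addSub(v, u, skip, d);
    }
    void dfsSack(int u, int p, bool keep) {
        for (int v : adj[u]) if (v != p && v != heavy[u]) dfsSack(v, u, false); // 1) clear light
        if (heavy[u] != -1) dfsSack(heavy[u], u, true);                        // 2) keep heavy
        addSub(u, p, heavy[u], +1);                                            // 3) re-add light + u
        ans[u] = distinct;
        if (!keep) addSub(u, p, -1, -1);                                       // clear if light
    }
};
```

Constructing a Sack over a 5-node tree answers every subtree’s distinct-color count in one pass.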

02 Intuition & Analogies

Imagine you’re cleaning rooms in a huge house (the tree), where each room has drawers full of colored items (node values). For each room, you want to know something about all items in that room’s entire suite (its subtree)—for example, how many distinct colors are present. A naive approach is to empty all drawers in the suite and count from scratch for every room, which is slow. Instead, you notice each room has one giant walk-in closet (the heavy child) and several small drawers (light children). If you keep the giant closet’s items in your working bag and temporarily ignore the small drawers, you can later bring in the contents of the small drawers one by one, adding them to the already large bag that you kept. This is much faster than repeatedly combining many small bags together because you always merge small collections into a big one. Crucially, every time you follow a small drawer (light child), you move to a place that’s at most half the size of where you were, so you can’t keep doing that many times along any path—logarithmically many at most. That’s why each item is only moved a few times overall. This strategy turns a potentially quadratic chore into something closer to linear times a small factor. In code, we mimic this by: (1) computing which child is the giant closet (largest subtree), (2) discarding light children’s data right after processing them, (3) keeping the heavy child’s data, and (4) re-adding all light children’s nodes so the current node’s answer is computed from a big, maintained bag.

03 Formal Definition

Given a rooted tree with n vertices, let s(u) be the size of the subtree of u. For each node u, define its heavy child h(u) = argmax_{v ∈ children(u)} s(v); the remaining children are light. The DSU on Tree algorithm organizes a DFS such that for each u: (a) recursively process all light children with keep = false (their contributions are cleared after finishing them), (b) recursively process the heavy child with keep = true (we retain its contributions in a global data structure), and (c) re-add every node in each light child’s subtree into the retained data structure, and finally answer the query for u. This design ensures that every vertex is added to the global structure O(log n) times because each time a vertex is re-added when stepping over a light edge, the subtree size at least halves. Thus the total number of add operations is O(n log n). The data structure typically stores frequencies (counts) of some per-vertex attribute (e.g., colors). The operations are: add(x) to incorporate vertex x, remove(x) to erase it (used when keep = false), and a query-specific aggregator (e.g., number of distinct values, maximum frequency, sum among those with maximum frequency). When combined carefully, the overall time is O(n log n + Q·f), where f is per-query overhead (often O(1) or O(log n)).

04 When to Use

Use DSU on Tree when you have many offline subtree queries that ask for aggregate statistics over node attributes, and when these statistics can be maintained by incremental add/remove operations. Typical examples include: counting distinct attribute values in subtrees, computing the most frequent value (mode) and its sum across ties, counting how many values appear at least k times, or answering point-attribute queries like 'how many nodes of color c are in subtree(u)?'. DSU on Tree excels when merges are asymmetric: always merge small into big, effectively by processing the big (heavy) child in a way that preserves its data. It is particularly appropriate when: (1) the tree is static (no edge updates), (2) queries are offline and each query is attached to a node (subtree queries), and (3) your statistic can be updated by O(1) or O(log n) add/remove steps. If you need path queries (u to v) or dynamic edge updates, consider other techniques such as Heavy-Light Decomposition or Mo’s algorithm on trees. If your statistic is not easily maintainable under incremental updates (e.g., requires global recomputation after each change), DSU on Tree may not help. However, many frequency-based or set-union-like answers work very well.
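A practical way to check that a statistic fits these requirements is to compare a sack implementation against a from-scratch recount on small random trees. A minimal O(n²) reference for the distinct-colors statistic might look like this sketch (bruteDistinct and the sample tree are hypothetical, not part of any library):

```cpp
#include <vector>
#include <set>
#include <functional>
using namespace std;

// O(n^2) reference: for every u, recount distinct colors in subtree(u)
// from scratch. Handy for validating a sack implementation on small trees.
// Tree is rooted at node 0; adj and col are the caller's inputs.
vector<int> bruteDistinct(const vector<vector<int>>& adj, const vector<int>& col) {
    int n = (int)adj.size();
    vector<int> par(n, -1);
    function<void(int, int)> root = [&](int u, int p) {
        par[u] = p;
        for (int v : adj[u]) if (v != p) root(v, u);
    };
    root(0, -1);
    vector<int> res(n);
    for (int u = 0; u < n; ++u) {
        set<int> bag;  // recount subtree(u) from scratch
        function<void(int, int)> go = [&](int x, int p) {
            bag.insert(col[x]);
            for (int v : adj[x]) if (v != p) go(v, x);
        };
        go(u, par[u]);
        res[u] = (int)bag.size();
    }
    return res;
}
```

If the sack answers ever diverge from this reference, the bug is almost always in the clear or re-add phase.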

⚠️Common Mistakes

  1. Confusing DSU on Tree with Disjoint Set Union (Union-Find). Despite the name, DSU on Tree is not about union-find sets; it is a DFS ordering trick with a shared global counter structure.
  2. Forgetting to precompute subtree sizes and heavy children. Without them, you cannot distinguish which child to keep, and performance degrades significantly.
  3. Double-counting nodes when re-adding light subtrees. You must skip the heavy child’s nodes during re-add passes (often by marking the heavy child as 'big' so addSubtree avoids it) to prevent duplicating contributions.
  4. Not clearing state correctly when keep=false. When a subtree is processed with keep=false, you must remove all its nodes from the global structure; otherwise, future answers become contaminated.
  5. Mishandling advanced aggregations (e.g., mode with sum over ties). If you decrement frequencies during clear, you must also reset auxiliary trackers (like current maximum frequency and tie-sum) or maintain them symmetrically; otherwise, you’ll compute stale answers.
  6. Using unordered_map for tight constraints without reserving. Frequency structures should be arrays or well-reserved hash maps to avoid TLE.
  7. Stack overflows on deep trees. When n is large and the tree is a line, convert to an iterative DFS or raise the stack size/recursion limit.
  8. Memory blowups by storing per-node large maps. DSU on Tree works best with one global structure updated incrementally; avoid copying per-node containers.
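For mistake 7, the recursion can be eliminated with an explicit stack. Below is a sketch of a recursion-free subtree-size computation (subtreeSizesIterative is a hypothetical helper name); the same idea extends to the sack DFS itself.

```cpp
#include <vector>
#include <stack>
using namespace std;

// Iterative replacement for the recursive size DFS, safe on chain-shaped
// trees where recursion would overflow. Records a preorder, then folds
// child sizes into parents by scanning that order in reverse.
vector<int> subtreeSizesIterative(const vector<vector<int>>& adj, int root) {
    int n = (int)adj.size();
    vector<int> order, par(n, -1), sz(n, 1);
    order.reserve(n);
    stack<int> st;
    st.push(root);
    while (!st.empty()) {
        int u = st.top(); st.pop();
        order.push_back(u);
        for (int v : adj[u]) if (v != par[u]) { par[v] = u; st.push(v); }
    }
    // Reverse preorder guarantees children are finalized before parents.
    for (int i = n - 1; i > 0; --i) sz[par[order[i]]] += sz[order[i]];
    return sz;
}
```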

Key Formulas

Subtree Size Recurrence

s(u) = 1 + Σ_{v ∈ children(u)} s(v)

Explanation: The size of a node’s subtree equals one (the node itself) plus the sizes of its children’s subtrees. This is computed in a first DFS and is used to select heavy children.
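The recurrence translates directly into a DFS; the 7-node tree below is a hypothetical example, rooted at node 0.

```cpp
#include <vector>
using namespace std;

// Sketch: compute s(u) = 1 + sum of children's subtree sizes.
// Example tree (rooted at 0): 0-{1,2}, 1-{3,4}, 2-{5}, 5-{6}.
vector<vector<int>> adj = {
    {1, 2}, {0, 3, 4}, {0, 5}, {1}, {1}, {2, 6}, {5}
};
vector<int> sz(7, 0);

void dfs_size(int u, int p) {
    sz[u] = 1;                        // the node itself
    for (int v : adj[u]) if (v != p) {
        dfs_size(v, u);
        sz[u] += sz[v];               // add each child's subtree
    }
}
```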

Heavy Child Definition

h(u) = argmax_{v ∈ children(u)} s(v)

Explanation: The heavy child of u is the child with the largest subtree size. Keeping the heavy child’s data minimizes the number of times nodes are re-added.

Light Edge Bound

s(v) ≤ s(u) / 2 for every light edge (u, v), so any root-to-node path contains at most ⌊log₂ n⌋ light edges.

Explanation: Every time you traverse a light edge, the next subtree has size at most half the previous one, so this can occur only logarithmically many times. Hence each vertex is re-added O(log n) times.
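The bound can be checked empirically. The sketch below counts light edges on every root-to-node path of a small balanced tree (dfs_sz2 and dfs_light are illustrative names for this example only).

```cpp
#include <vector>
using namespace std;

// Sketch: count light edges on each root-to-node path.
// An edge (u, v) is light when v is not u's largest child.
vector<vector<int>> adj2;       // adjacency of an example rooted tree
vector<int> sz2, lightCnt;

int dfs_sz2(int u, int p) {
    sz2[u] = 1;
    for (int v : adj2[u]) if (v != p) sz2[u] += dfs_sz2(v, u);
    return sz2[u];
}

void dfs_light(int u, int p, int cnt) {
    lightCnt[u] = cnt;
    int heavy = -1, best = 0;
    for (int v : adj2[u]) if (v != p && sz2[v] > best) best = sz2[v], heavy = v;
    for (int v : adj2[u]) if (v != p)
        dfs_light(v, u, cnt + (v == heavy ? 0 : 1));  // crossing a light edge adds 1
}
```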

Total Add Operations

Σ_u O(log n) = O(n log n) add operations in total

Explanation: Summing over all vertices, each contributes O(log n) additions to the data structure due to the light-edge bound. This yields an overall O(n log n) time for add work.

DSU on Tree Time Recurrence

T(u) = T(h(u)) + Σ_{light v} [ T(v) + O(s(v)) ] + O(deg u)

Explanation: The running time splits into processing the heavy child (kept), processing light children (cleared), and re-adding their nodes. The additive O(s(v)) term accounts for adding nodes in light subtrees.

Fenwick Tree Operations

update(i, ±1) and prefix(i) both run in O(log n); colors with frequency ≥ k = prefix(n) − prefix(k − 1)

Explanation: Fenwick trees maintain prefix sums with point updates in O(log n). They are useful to track how many values have a given frequency and to query counts with frequency at least k.
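A minimal sketch of such a Fenwick tree, with a suffix-sum helper for "frequency ≥ k" queries (Fen and atLeast are hypothetical names; the full query template below uses an equivalent structure):

```cpp
#include <vector>
using namespace std;

// Fenwick tree over frequency buckets 1..n. Index f counts how many
// colors currently have frequency exactly f; a suffix sum then answers
// "how many colors appear at least k times".
struct Fen {
    int n;
    vector<int> bit;
    Fen(int n) : n(n), bit(n + 1, 0) {}
    void add(int i, int d) { for (; i <= n; i += i & -i) bit[i] += d; }
    int prefix(int i) { int s = 0; for (; i > 0; i -= i & -i) s += bit[i]; return s; }
    int atLeast(int k) { return prefix(n) - prefix(k - 1); }  // colors with freq >= k
};
```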

Complexity Analysis

Let n be the number of nodes. We first compute subtree sizes s(u) and heavy children in O(n). During the main DSU on Tree DFS, for each node u we: (1) process all light children with keep = false (their contributions are discarded), (2) process the heavy child with keep = true (its contributions persist), and (3) re-add all nodes from light subtrees so that the current node’s answer uses a large, persistent structure. The key bound is that each node lies below at most ⌊log₂ n⌋ light edges along its root-to-node path. Therefore, each node is re-added O(log n) times overall. Summed across all nodes, the total number of add operations is O(n log n). If each add/remove is O(1) (array-based frequency tracking), the traversal costs O(n log n). If your per-add operation is O(log n) (e.g., due to a Fenwick tree update for frequency-of-frequencies), the total becomes O(n log² n), while answers per node often remain O(1) or O(log n). Space is dominated by the adjacency list O(n) and the global counting structures. For array-based frequencies with value compression to m distinct values, space is O(n + m). If you maintain auxiliary structures (e.g., cntFreq and a Fenwick tree of size up to n for frequencies), space remains O(n + m). The recursion stack is O(h) where h is the tree height; in worst-case chains, h = O(n). In practice, for problem constraints up to about 2e5, an iterative stack or increased recursion limits may be needed in languages with strict recursion bounds.

Code Examples

DSU on Tree template: number of distinct colors in each subtree
#include <bits/stdc++.h>
using namespace std;

// DSU on Tree (Sack) to compute, for every node u, the number of distinct colors in subtree(u).
// Input format (example):
// n\n c1 c2 ... cn\n (n-1 edges)\n
static const int MAXN = 200000 + 5;

int n;
vector<int> g[MAXN];
int color[MAXN];   // color[u] in [1..m] (compress if needed)
int sz[MAXN];      // subtree size
int heavy[MAXN];   // heavy child of u or -1
bool isBig[MAXN];  // marks heavy child during re-add phase

// Global counting structure for current 'bag'
vector<int> freq;        // freq[color]
int distinctColors = 0;  // number of colors with freq > 0

int ans[MAXN];  // answer for each node: #distinct in subtree

// First DFS: compute subtree sizes and heavy child
void dfs_sz(int u, int p) {
    sz[u] = 1;
    heavy[u] = -1;
    int best = 0;
    for (int v : g[u]) if (v != p) {
        dfs_sz(v, u);
        sz[u] += sz[v];
        if (sz[v] > best) {
            best = sz[v];
            heavy[u] = v;
        }
    }
}

// Add/remove a single node's contribution
inline void addNode(int u, int delta) {
    int c = color[u];
    if (delta == +1) {
        if (freq[c] == 0) distinctColors++;
        freq[c]++;
    } else {
        freq[c]--;
        if (freq[c] == 0) distinctColors--;
    }
}

// Add/remove contributions of an entire subtree, skipping the marked heavy subtree
void addSubtree(int u, int p, int delta) {
    addNode(u, delta);
    for (int v : g[u]) if (v != p && !isBig[v]) {
        addSubtree(v, u, delta);
    }
}

// Main DSU on Tree DFS
void dfs_sack(int u, int p, bool keep) {
    // 1) Process light children first (keep=false)
    for (int v : g[u]) if (v != p && v != heavy[u]) {
        dfs_sack(v, u, /*keep=*/false);
    }
    // 2) Process heavy child (keep=true)
    if (heavy[u] != -1) {
        dfs_sack(heavy[u], u, /*keep=*/true);
        isBig[heavy[u]] = true;  // mark so addSubtree skips the kept heavy subtree
    }

    // 3) Re-add all light children's nodes into the kept structure
    for (int v : g[u]) if (v != p && v != heavy[u]) {
        addSubtree(v, u, +1);
    }
    // 4) Add current node itself
    addNode(u, +1);

    // Now the bag contains exactly the nodes in subtree(u)
    ans[u] = distinctColors;

    // 5) Unmark heavy child
    if (heavy[u] != -1) isBig[heavy[u]] = false;

    // 6) If not keeping, remove this subtree's contributions entirely
    if (!keep) {
        addSubtree(u, p, -1);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n)) return 0;
    vector<int> allColors(n + 1);
    for (int i = 1; i <= n; ++i) cin >> allColors[i];

    // Read edges
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }

    // Coordinate-compress colors to [1..m]
    vector<int> vals = allColors;
    sort(vals.begin() + 1, vals.end());
    vals.erase(unique(vals.begin() + 1, vals.end()), vals.end());
    int m = (int)vals.size() - 1;
    for (int i = 1; i <= n; ++i) {
        int c = allColors[i];
        color[i] = int(lower_bound(vals.begin() + 1, vals.end(), c) - vals.begin());  // in [1..m]
    }

    // Prepare globals
    freq.assign(m + 1, 0);
    dfs_sz(1, 0);
    dfs_sack(1, 0, /*keep=*/true);

    for (int i = 1; i <= n; ++i) cout << ans[i] << (i == n ? '\n' : ' ');
    return 0;
}

This program computes, for every node u, the number of distinct colors in its subtree using DSU on Tree. We first compute subtree sizes and pick the heavy child per node. In dfs_sack, we process light children with keep=false (cleared), process the heavy child with keep=true (kept), re-add all light nodes, then add u itself. The global freq array counts color frequencies in the current 'bag'. The answer ans[u] equals the number of colors with nonzero frequency.

Time: O(n log n). Space: O(n + m), where m is the number of distinct colors.
Mode with tie-sum per subtree (CF 600E-style)
#include <bits/stdc++.h>
using namespace std;

// DSU on Tree: for each node u, compute the sum of colors that achieve
// the maximum frequency in subtree(u). If multiple colors tie, sum all of them.
// This mirrors Codeforces 600E (Lomsat Gelral).

static const int MAXN = 200000 + 5;

int n;
vector<int> g[MAXN];
int color_[MAXN];
int sz[MAXN], heavy[MAXN];
bool isBig[MAXN];

vector<long long> freq;    // freq[compressed color]
vector<long long> origOf;  // origOf[id] = original color value (sums use real values, not ids)
long long curMax = 0;      // current maximum frequency observed
long long sumMax = 0;      // sum of original color values with frequency == curMax

long long ans[MAXN];

void dfs_sz(int u, int p) {
    sz[u] = 1; heavy[u] = -1; int best = 0;
    for (int v : g[u]) if (v != p) {
        dfs_sz(v, u);
        sz[u] += sz[v];
        if (sz[v] > best) best = sz[v], heavy[u] = v;
    }
}

inline void addColor(int c, int delta) {
    // curMax/sumMax are maintained only on +1; the clearing phase just
    // decrements freq, and the trackers are reset once the bag is empty.
    if (delta == +1) {
        long long newf = ++freq[c];
        if (newf > curMax) {
            curMax = newf;
            sumMax = origOf[c];   // new maximum: restart the sum with this color
        } else if (newf == curMax) {
            sumMax += origOf[c];  // tie: include this color
        }
    } else {
        --freq[c];
    }
}

void addSubtree(int u, int p, int delta) {
    addColor(color_[u], delta);
    for (int v : g[u]) if (v != p && !isBig[v]) addSubtree(v, u, delta);
}

void dfs_sack(int u, int p, bool keep) {
    for (int v : g[u]) if (v != p && v != heavy[u]) dfs_sack(v, u, false);
    if (heavy[u] != -1) { dfs_sack(heavy[u], u, true); isBig[heavy[u]] = true; }

    for (int v : g[u]) if (v != p && v != heavy[u]) addSubtree(v, u, +1);
    addColor(color_[u], +1);

    ans[u] = sumMax;

    if (heavy[u] != -1) isBig[heavy[u]] = false;
    if (!keep) {
        addSubtree(u, p, -1);
        // Fully cleared: reset trackers
        curMax = 0; sumMax = 0;
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    cin >> n;
    vector<int> raw(n + 1);
    for (int i = 1; i <= n; ++i) cin >> raw[i];
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    // Compress colors to [1..m], remembering each id's original value
    vector<int> vals = raw;
    sort(vals.begin() + 1, vals.end());
    vals.erase(unique(vals.begin() + 1, vals.end()), vals.end());
    int m = (int)vals.size() - 1;
    for (int i = 1; i <= n; ++i) {
        color_[i] = int(lower_bound(vals.begin() + 1, vals.end(), raw[i]) - vals.begin());
    }
    origOf.assign(m + 1, 0);
    for (int id = 1; id <= m; ++id) origOf[id] = vals[id];

    freq.assign(m + 1, 0);
    dfs_sz(1, 0);
    dfs_sack(1, 0, true);

    for (int i = 1; i <= n; ++i) cout << ans[i] << (i == n ? '\n' : ' ');
    return 0;
}

This program answers: for each node u, among colors in subtree(u), what is the sum of colors that have the maximum frequency? The key difference from the distinct-count template is how we maintain the aggregate: we track curMax (current maximum frequency) and sumMax (sum over colors achieving curMax). We update these only on +1 additions; during the clearing phase (when keep=false), we simply decrement freq and reset curMax and sumMax after the subtree is completely cleared.

Time: O(n log n). Space: O(n + m).
Answer queries: How many colors appear at least k times in subtree(u)? (Fenwick variant)
#include <bits/stdc++.h>
using namespace std;

// DSU on Tree + Fenwick Tree over frequency-of-frequencies.
// For each node u, we have multiple queries (u, k): count how many distinct colors
// occur at least k times in subtree(u).
// We maintain:
//   freq[color] = current frequency of that color in the active bag
//   cntFreq[f]  = how many colors have frequency exactly f
// and a Fenwick tree over cntFreq to answer >=k via suffix sums: sum_{f>=k} cntFreq[f].

struct Fenwick {
    int n = 0;
    vector<int> bit;
    Fenwick() {}
    Fenwick(int n) : n(n), bit(n + 1, 0) {}
    void add(int i, int delta) { for (; i <= n; i += i & -i) bit[i] += delta; }
    int sumPrefix(int i) { int r = 0; for (; i > 0; i -= i & -i) r += bit[i]; return r; }
    int sumRange(int l, int r) { if (l > r) return 0; return sumPrefix(r) - sumPrefix(l - 1); }
};

static const int MAXN = 200000 + 5;
int n;
vector<int> g[MAXN];
int color_[MAXN];
int sz[MAXN], heavy[MAXN];
bool isBig[MAXN];

// Globals for counting
vector<int> freq;     // per color
vector<int> cntFreq;  // how many colors have exact frequency f
Fenwick BIT;          // prefix sums over cntFreq

struct Query { int k, id; };
vector<vector<Query>> queries;  // queries[u] = list of (k, id) asked at node u
vector<long long> ans;          // answers by id

void dfs_sz(int u, int p) {
    sz[u] = 1; heavy[u] = -1; int best = 0;
    for (int v : g[u]) if (v != p) {
        dfs_sz(v, u);
        sz[u] += sz[v];
        if (sz[v] > best) best = sz[v], heavy[u] = v;
    }
}

inline void touchColor(int c, int delta) {
    int f = freq[c];
    if (f > 0) {  // remove old frequency bucket
        cntFreq[f]--; BIT.add(f, -1);
    }
    f += delta;   // new frequency
    freq[c] = f;
    if (f > 0) {  // add new frequency bucket
        cntFreq[f]++; BIT.add(f, +1);
    }
}

void addSubtree(int u, int p, int delta) {
    touchColor(color_[u], delta);
    for (int v : g[u]) if (v != p && !isBig[v]) addSubtree(v, u, delta);
}

void dfs_sack(int u, int p, bool keep) {
    for (int v : g[u]) if (v != p && v != heavy[u]) dfs_sack(v, u, false);
    if (heavy[u] != -1) { dfs_sack(heavy[u], u, true); isBig[heavy[u]] = true; }

    for (int v : g[u]) if (v != p && v != heavy[u]) addSubtree(v, u, +1);
    touchColor(color_[u], +1);

    // answer queries at u
    for (const auto &q : queries[u]) {
        int k = q.k;
        if (k <= 0) ans[q.id] = BIT.sumRange(1, n);  // every present color qualifies
        else if (k > n) ans[q.id] = 0;
        else ans[q.id] = BIT.sumRange(k, n);
    }

    if (heavy[u] != -1) isBig[heavy[u]] = false;
    if (!keep) {
        addSubtree(u, p, -1);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int q;  // number of queries
    cin >> n >> q;
    vector<int> raw(n + 1);
    for (int i = 1; i <= n; ++i) cin >> raw[i];
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    // Compress colors to [1..m]
    vector<int> vals = raw;
    sort(vals.begin() + 1, vals.end());
    vals.erase(unique(vals.begin() + 1, vals.end()), vals.end());
    int m = (int)vals.size() - 1;
    for (int i = 1; i <= n; ++i) {
        color_[i] = int(lower_bound(vals.begin() + 1, vals.end(), raw[i]) - vals.begin());
    }

    // Read queries: each is (u, k)
    queries.assign(n + 1, {});
    ans.assign(q, 0);
    for (int i = 0; i < q; ++i) { int u, k; cin >> u >> k; queries[u].push_back({k, i}); }

    // Init globals
    freq.assign(m + 1, 0);
    cntFreq.assign(n + 1, 0);
    BIT = Fenwick(n);

    dfs_sz(1, 0);
    dfs_sack(1, 0, true);

    for (int i = 0; i < q; ++i) cout << ans[i] << (i + 1 == q ? '\n' : ' ');
    return 0;
}

We attach multiple queries (u, k) to nodes: 'How many distinct colors appear at least k times in subtree(u)?' During DSU on Tree, we maintain freq[color] and a frequency-of-frequencies array cntFreq. A Fenwick tree over cntFreq lets us answer suffix sums quickly: count of colors with freq ≥ k. The add/remove operations update both freq and cntFreq (and the BIT) in O(log n).

Time: O(n log² n) for the traversal (each of the O(n log n) add/remove operations performs an O(log n) BIT update), plus O(q log n) for the queries. Space: O(n + m).