Centroid Decomposition
Key Points
- Centroid decomposition splits a tree around a special node (the centroid) so that every remaining component has at most half the nodes.
- Recursively removing centroids builds a centroid tree whose height is O(log n), giving logarithmically many levels for processing.
- Each original node appears in O(log n) centroid levels, enabling many path-counting and distance problems to be solved in O(n log n).
- Any path in the original tree has a unique highest centroid where that path is first joined, preventing double counting when handled carefully.
- Distances in centroid-based solutions are often combined with LCA or precomputed depths to compute dist(u, v) quickly.
- Typical applications include counting pairs with constraints (e.g., dist(u, v) <= K), online color updates with nearest-colored-node queries, and path-property aggregations.
- A common pitfall is double counting pairs when combining subtrees around a centroid unless per-child contributions are subtracted.
- Careful memory and state management is needed to avoid O(n^2) behavior; use vectors that are rebuilt locally at each centroid level.
Prerequisites
- Tree basics (DFS, adjacency lists) — centroid decomposition operates on trees and uses DFS to compute subtree sizes.
- Big-O notation and recurrences — understanding O(n log n) behavior and divide-and-conquer recurrences is crucial for complexity reasoning.
- LCA and binary lifting — for dynamic distance queries, fast LCA-based distance computation is often required.
- Sorting and the two-pointer technique — pair counting at each centroid relies on sorting depth vectors and linear-time two-pointer sweeps.
- Sets, multisets, or frequency counting — some centroid applications maintain per-depth counts or minima at centroid levels.
- Recursion and stack discipline — the decomposition is recursive; proper base cases and visited/removed flags are essential.
Detailed Explanation
01 Overview
Centroid decomposition is a divide-and-conquer technique specialized for trees. It selects a node called the centroid, whose removal breaks the tree into connected components, each of size at most half of the original. This balancing property ensures that the recursive decomposition has logarithmic depth. By building a hierarchy of centroids, we obtain a new tree, the centroid tree, that captures how the original tree can be separated recursively.

The strength of centroid decomposition lies in its ability to reorganize global, path-based problems into simpler local computations aggregated around centroids. Typical tasks include counting path pairs satisfying conditions (e.g., total distance or color combinations) and fast online queries such as finding the nearest special node. Each original node has O(log n) centroid ancestors, allowing us to maintain information and answer queries efficiently.

In practice, each recursion level proceeds in four steps: compute subtree sizes, select the centroid, process the problem locally at that centroid, and recurse on each remaining component. The processing at a centroid is carefully arranged to count paths that pass through the centroid exactly once, avoiding double counting by subtracting per-child contributions. The result is a highly versatile tool used extensively in competitive programming and in systems that need fast tree queries.
02 Intuition & Analogies
Imagine a countrywide road network that forms no cycles (a tree). You want to place service centers so that each region you split off is never too large to manage. A centroid is like choosing a town such that, if you close its roads, every isolated region has at most half the original population. That means no region becomes unwieldy, and you can continue dividing each region fairly.

Now suppose you want to count how many pairs of towns are within, say, 100 km of each other. If you look at the entire map at once, the counting seems hard. But if you anchor your attention at the chosen town (the centroid), you can measure how far every other town is from it. Pairs that cross between different regions must pass through this anchor, so you can organize counting by considering distances from the center. However, to avoid double counting, you must subtract pairs formed entirely within a single region before adding everything up.

By repeating this strategy (choose a balanced center, process around it, then move into smaller regions) you ensure that any town is involved in only a few balanced steps. Consequently, even though the whole country is huge, tracking information at a handful of balanced centers per town suffices. Finally, notice that any road trip between two towns eventually reaches a highest-level center where their paths first meet in the centroid hierarchy. This property guarantees that if we aggregate results properly at each chosen center, every pair is counted just once.
03 Formal Definition
Let T be a tree (or a connected component during the recursion) with n nodes. A node c is a centroid of T if removing c splits T into connected components C_1, ..., C_k with |C_i| <= floor(n/2) for every i. Every tree has at least one centroid (and at most two adjacent ones). The centroid tree is built by picking a centroid c of T, making c the parent of the centroids found recursively in each C_i, and stopping when components become empty.
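A compact statement of the centroid condition, written in standard notation:

```latex
\text{Let } T=(V,E) \text{ be a tree with } |V| = n.\ \text{A node } c \in V \text{ is a centroid iff}
\max_{C \,\in\, \mathrm{comp}(T \setminus \{c\})} |C| \;\le\; \left\lfloor \frac{n}{2} \right\rfloor,
\text{where } \mathrm{comp}(T \setminus \{c\}) \text{ is the set of connected components after removing } c.
```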
04 When to Use
Use centroid decomposition when your task is fundamentally about paths or distances in a tree and requires either global counting or fast per-query updates where each node may need to contribute to multiple paths. Examples include:
- Counting node pairs whose distance is <= K or exactly K.
- Counting paths whose sum of edge weights satisfies a predicate (e.g., divisible by M, or within a range), often combined with frequency tricks and two-pointers.
- Online problems with toggles (e.g., paint a node red/blue) and queries such as the minimum distance to a red node. Each update/query touches only O(log n) centroid ancestors.
- Maintaining per-centroid summaries (min, count histograms, prefix sums) to answer queries over all paths that pass through one of the O(log n) centroids above each node.

Prefer centroid decomposition over alternatives when: (1) queries/updates are path- or distance-centric across the whole tree, (2) you seek O(log n) per update/query with modest memory, or (3) Mo's algorithm on trees or HLD would be less natural due to global pair interactions. If your operations are path queries with known endpoints (e.g., path sum queries between u and v), Heavy-Light Decomposition (HLD) might be simpler. If you need subtree-aggregate queries without path constraints, DSU-on-tree (small-to-large) may be more direct.
⚠️Common Mistakes
- Double counting pairs: When counting pairs at a centroid, you must subtract contributions per child subtree before adding the combined total. Forgetting this leads to counting pairs that lie entirely within a single child region multiple times.
- Not clearing temporary vectors: Depth lists or frequency arrays built per centroid and per child must be rebuilt/reset each time; reusing without clearing creates stale data and incorrect counts.
- Wrong centroid selection: Failing to recompute subtree sizes or incorrectly stopping the centroid search (e.g., not moving to a child with size > n/2) yields an unbalanced decomposition and can degrade complexity.
- Distance computation errors: For dynamic problems using LCA, an incorrect LCA, depth, or binary lifting table causes subtle wrong answers. Always verify dist(u, v) = depth[u] + depth[v] - 2*depth[lca(u, v)].
- Excess recursion depth or global state leaks: Forgetting to mark nodes as removed or to restrict DFS to the current component can cause O(n^2) traversals or stack overflows. Keep a removed[] array and skip those nodes in DFS.
- Assuming results for weighted vs. unweighted trees are interchangeable: Two-pointer distance counting with sorted depth vectors needs adaptation for weights; otherwise, pair sums won’t reflect actual distances.
Key Formulas
Centroid Balance Condition
max(|C_1|, ..., |C_k|) <= floor(n/2), where C_1, ..., C_k are the components left after removing c
Explanation: For a centroid c in a component of size n, every connected component resulting from removing c has size at most n/2. This guarantees recursive balance and O(log n) depth.
Decomposition Recurrence
T(n) = O(n) + T(n_1) + ... + T(n_k), with each n_i <= n/2 and n_1 + ... + n_k = n - 1
Explanation: At each centroid we spend linear time in the current component, then recurse on k components of sizes n_1, ..., n_k. Because each n_i <= n/2 and the sizes sum to less than n, the total is O(n log n).
Centroid Tree Height
h <= floor(log2 n) + 1
Explanation: Since each recursive step cuts component size at least in half, the number of levels h is at most floor(log2 n) + 1. Each node participates in at most h levels.
Tree Distance via LCA
dist(u, v) = depth[u] + depth[v] - 2 * depth[lca(u, v)]
Explanation: In an unweighted tree, the distance between two nodes equals the sum of their depths minus twice the depth of their LCA. With binary lifting, this is answered in O(log n) per query (or O(1) with an RMQ-based LCA).
Pair Counting with Two Pointers
O(m log m) to sort the m collected depths, plus O(m) for the sweep
Explanation: Sorting depths takes O(m log m); counting pairs with a moving window then takes O(m). This is central for counting distance-bounded pairs at a centroid.
Per-node Participation
each node appears in O(log n) centroid components, so total work is O(n log n)
Explanation: Each node contributes to O(log n) centroids (one per level). Summing a constant amount of work per node per level leads to O(n log n) total complexity.
Complexity Analysis
Building the decomposition takes O(n log n): there are O(log n) levels, and the components at each level are disjoint, so each level performs O(n) total work. The centroid tree itself needs O(n) memory; per-problem state (depth lists, frequency tables) costs O(n) per level if rebuilt locally, as recommended in the pitfalls above. Online updates and queries that walk centroid ancestors take O(log n) steps, each multiplied by the cost of one distance computation (O(1) or O(log n) with LCA).
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Example 1: Build centroid decomposition and print the parent in the centroid tree
// Input:
//   n
//   n-1 lines: u v (1-indexed tree)
// Output:
//   parent array of the centroid tree (parent of the centroid-root is -1)

const int MAXN = 200000 + 5;

int n;
vector<int> g[MAXN];
bool removed_[MAXN];
int sub[MAXN];
int cpar[MAXN]; // parent in centroid tree (-1 for root)

// Compute subtree sizes for the component rooted at u, skipping removed nodes
int dfs_size(int u, int p) {
    sub[u] = 1;
    for (int v : g[u]) {
        if (v == p || removed_[v]) continue;
        sub[u] += dfs_size(v, u);
    }
    return sub[u];
}

// Find the centroid of the component containing u with total size sz
int dfs_centroid(int u, int p, int sz) {
    for (int v : g[u]) {
        if (v == p || removed_[v]) continue;
        if (sub[v] > sz / 2) return dfs_centroid(v, u, sz);
    }
    return u;
}

void build(int entry, int parent) {
    int sz = dfs_size(entry, -1);
    int c = dfs_centroid(entry, -1, sz);
    removed_[c] = true;
    cpar[c] = parent; // parent in centroid tree
    for (int v : g[c]) {
        if (removed_[v]) continue;
        build(v, c);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n)) return 0;
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    fill(begin(removed_), begin(removed_) + n + 1, false);
    build(1, -1);

    // Output centroid parent array
    for (int i = 1; i <= n; ++i) {
        cout << cpar[i] << (i == n ? '\n' : ' ');
    }
    return 0;
}
```
This program builds the centroid decomposition of a tree. It computes subtree sizes within the current component, finds the centroid by walking towards any child whose subtree exceeds half the component size, marks it as removed, and recurses on remaining components. The cpar array stores the centroid tree parent (−1 for the root centroid).
```cpp
#include <bits/stdc++.h>
using namespace std;

// Example 2: Count number of unordered pairs (u, v) with dist(u, v) <= K
// Input:
//   n K
//   n-1 lines: u v (1-indexed unweighted tree)
// Output:
//   number of pairs with distance <= K

const int MAXN = 200000 + 5;
int n, K;
vector<int> g[MAXN];
bool removed_[MAXN];
int sub[MAXN];
long long answerPairs = 0;

int dfs_size(int u, int p) {
    sub[u] = 1;
    for (int v : g[u]) if (v != p && !removed_[v]) sub[u] += dfs_size(v, u);
    return sub[u];
}

int dfs_centroid(int u, int p, int sz) {
    for (int v : g[u]) if (v != p && !removed_[v])
        if (sub[v] > sz / 2) return dfs_centroid(v, u, sz);
    return u;
}

void collectDepths(int u, int p, int d, vector<int>& depths) {
    if (d > K) return; // optional pruning for the distance constraint
    depths.push_back(d);
    for (int v : g[u]) if (v != p && !removed_[v]) collectDepths(v, u, d + 1, depths);
}

long long countPairsWithin(vector<int>& a) {
    sort(a.begin(), a.end());
    long long cnt = 0;
    int l = 0, r = (int)a.size() - 1;
    while (l <= r) {
        if (a[l] + a[r] <= K) {
            cnt += (r - l); // pairs (l, l+1), ..., (l, r)
            ++l;
        } else {
            --r;
        }
    }
    return cnt;
}

void decompose(int entry) {
    int sz = dfs_size(entry, -1);
    int c = dfs_centroid(entry, -1, sz);
    removed_[c] = true;

    // Gather depths from centroid c (distance 0 for c itself)
    vector<int> all; all.reserve(sz);
    all.push_back(0);

    // For each child component, collect depths and correct overcounts
    for (int v : g[c]) if (!removed_[v]) {
        vector<int> tmp; tmp.reserve(sub[v]);
        collectDepths(v, c, 1, tmp);
        // Subtract pairs entirely within this child (they would be overcounted below)
        answerPairs -= countPairsWithin(tmp);
        // Merge into all for cross-child pairs through the centroid
        all.insert(all.end(), tmp.begin(), tmp.end());
    }

    // Count all pairs whose path goes through centroid c
    answerPairs += countPairsWithin(all);

    for (int v : g[c]) if (!removed_[v]) decompose(v);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n >> K)) return 0;
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    fill(begin(removed_), begin(removed_) + n + 1, false);
    answerPairs = 0;
    decompose(1);

    cout << answerPairs << "\n";
    return 0;
}
```
This classic solution counts unordered pairs at distance ≤ K. At each centroid c, it collects depths to nodes in every child component. By first subtracting pairs within each child (to prevent double counting) and then adding pairs within the union that includes the centroid (distance 0), it counts exactly those pairs whose highest centroid is c. Summing across all centroid levels yields the total number of valid pairs.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Example 3: Xenia-like queries on a tree using centroid decomposition and LCA
// Operations:
//   1 u -> paint node u red (idempotent)
//   2 u -> query minimum distance from u to any red node
// Initially, no red nodes (or initialize one as needed by calling update_red(1)).

const int MAXN = 200000 + 5;
const int LOG = 20;

int n, q;
vector<int> g[MAXN];

// LCA structures
int up[LOG][MAXN];
int depth_[MAXN];
int tin[MAXN], tout[MAXN], timerDFS = 0;

void dfs_lca(int u, int p) {
    tin[u] = ++timerDFS;
    up[0][u] = (p == -1 ? u : p);
    for (int k = 1; k < LOG; ++k) up[k][u] = up[k-1][up[k-1][u]];
    for (int v : g[u]) if (v != p) {
        depth_[v] = depth_[u] + 1;
        dfs_lca(v, u);
    }
    tout[u] = ++timerDFS;
}

bool is_ancestor(int u, int v) { // is u an ancestor of v?
    return tin[u] <= tin[v] && tout[v] <= tout[u];
}

int lca(int u, int v) {
    if (is_ancestor(u, v)) return u;
    if (is_ancestor(v, u)) return v;
    for (int k = LOG - 1; k >= 0; --k) {
        int a = up[k][u];
        if (!is_ancestor(a, v)) u = a;
    }
    return up[0][u];
}

int distTree(int u, int v) {
    int w = lca(u, v);
    return depth_[u] + depth_[v] - 2 * depth_[w];
}

// Centroid decomposition
bool removed_[MAXN];
int sub[MAXN];
int cpar[MAXN]; // centroid parent (-1 for root)

int dfs_size(int u, int p) {
    sub[u] = 1;
    for (int v : g[u]) if (v != p && !removed_[v]) sub[u] += dfs_size(v, u);
    return sub[u];
}

int dfs_centroid(int u, int p, int sz) {
    for (int v : g[u]) if (v != p && !removed_[v])
        if (sub[v] > sz / 2) return dfs_centroid(v, u, sz);
    return u;
}

void build_centroid(int entry, int parent) {
    int sz = dfs_size(entry, -1);
    int c = dfs_centroid(entry, -1, sz);
    removed_[c] = true;
    cpar[c] = parent;
    for (int v : g[c]) if (!removed_[v]) build_centroid(v, c);
}

// Online maintenance: best[c] = minimum distance from centroid c to any red node
const int INF = 1e9;
int best[MAXN];

void update_red(int u) {
    int x = u;
    while (x != -1) {
        best[x] = min(best[x], distTree(u, x));
        x = cpar[x];
    }
}

int query_min_dist(int u) {
    int ans = INF;
    int x = u;
    while (x != -1) {
        ans = min(ans, best[x] + distTree(u, x));
        x = cpar[x];
    }
    return ans;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n >> q)) return 0;
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    // Preprocess LCA
    depth_[1] = 0; timerDFS = 0;
    dfs_lca(1, -1);

    // Build centroid decomposition
    fill(begin(removed_), begin(removed_) + n + 1, false);
    build_centroid(1, -1);

    // Initialize best[]
    for (int i = 1; i <= n; ++i) best[i] = INF;

    // Optionally start with node 1 red (common in CF 342E)
    // update_red(1);

    while (q--) {
        int t, u; cin >> t >> u;
        if (t == 1) {
            update_red(u);
        } else {
            cout << query_min_dist(u) << "\n";
        }
    }
    return 0;
}
```
This program handles online updates and queries: painting nodes red and finding the nearest red node. It combines LCA for O(1)/O(log n) distance and centroid decomposition for O(log n) updates/queries. Each update ascends the centroid tree and improves best[c] with dist(u, c). Each query ascends similarly and minimizes best[c] + dist(u, c).