
Centroid Decomposition

Key Points

  • Centroid decomposition splits a tree around a special node (centroid) so that every remaining component has at most half the nodes.
  • Recursively removing centroids builds a centroid tree whose height is O(log n), giving logarithmic levels for processing.
  • Each original node appears in O(log n) centroid levels, enabling many path-counting and distance problems to be solved in O(n log n).
  • Any path in the original tree passes through a unique highest centroid (the level at which its two endpoints first fall into different components), which prevents double counting when contributions are handled carefully.
  • Distances in centroid-based solutions are often combined with LCA or precomputed depth to compute dist(u, v) quickly.
  • Typical applications include counting pairs with constraints (e.g., dist(u, v) ≤ K), online color updates with nearest colored node queries, and path-property aggregations.
  • A common pitfall is double counting pairs when combining subtrees around a centroid unless per-child contributions are subtracted.
  • Careful memory and state management is needed to avoid O(n^2) behavior; use vectors that are rebuilt locally at each centroid level.

Prerequisites

  • Tree basics (DFS, adjacency lists): Centroid decomposition operates on trees and uses DFS to compute subtree sizes.
  • Big-O notation and recurrences: Understanding O(n log n) behavior and divide-and-conquer recurrences is crucial for complexity reasoning.
  • LCA and binary lifting: For dynamic distance queries, fast LCA-based distance computation is often required.
  • Sorting and the two-pointer technique: Pair counting at each centroid relies on sorting depth vectors and linear-time two-pointer sweeps.
  • Set/multiset or frequency counting: Some centroid applications maintain per-depth counts or minima at centroid levels.
  • Recursion and stack discipline: The decomposition is recursive; proper base cases and visited/removed flags are essential.

Detailed Explanation


01 Overview

Centroid decomposition is a divide-and-conquer technique specialized for trees. It selects a node called the centroid, whose removal breaks the tree into connected components, each of size at most half of the original. This balancing property ensures that recursive decomposition has logarithmic depth. By building a hierarchy of centroids, we obtain a new tree—the centroid tree—that captures how the original tree can be separated recursively. The strength of centroid decomposition lies in its ability to reorganize global, path-based problems into simpler local computations aggregated around centroids. Typical tasks include counting path pairs satisfying conditions (e.g., total distance or color combinations) and fast online queries like finding the nearest special node. Each original node belongs to O(log n) centroid ancestors, allowing us to maintain information and answer queries efficiently. In practice, centroid decomposition proceeds in four steps per recursion level: compute subtree sizes, select a centroid, process the problem locally at that centroid, and then recurse on each remaining component. The processing at a centroid is carefully arranged to count paths that pass through the centroid exactly once, avoiding double counting by subtracting per-child contributions. The result is a highly versatile tool used extensively in competitive programming and systems that need fast tree queries.

02 Intuition & Analogies

Imagine a countrywide road network that forms no cycles (a tree). You want to place service centers so that each region you split off is never too large to manage. A centroid is like choosing a town such that, if you close its roads, every isolated region has at most half the original population. That means no region becomes unwieldy, and you can continue dividing each region fairly. Now suppose you want to count how many pairs of towns are within, say, 100 km of each other. If you look at the entire map at once, the counting seems hard. But if you anchor your attention at the chosen town (the centroid), you can measure how far every other town is from it. Pairs that cross between different regions must pass through this anchor, so you can organize counting by considering distances from the center. However, to avoid double counting, you must subtract pairs formed entirely within a single region before adding everything up. By repeating this strategy—choose a balanced center, process around it, then move into smaller regions—you ensure that any town is involved in only a few balanced steps. Consequently, even though the whole country is huge, tracking information at a handful of balanced centers per town suffices. Finally, notice that any road trip between two towns eventually reaches a highest-level center where their paths first meet in the centroid hierarchy. This property guarantees that if we aggregate results properly at each chosen center, every pair is counted just once.

03 Formal Definition

Given a tree T with n nodes, a centroid is a node c such that removing c (and its incident edges) partitions T into connected components C_1, C_2, ..., C_k, each satisfying |C_i| ≤ n/2. A centroid always exists, and in fact a tree has either one or two centroids. In centroid decomposition, we recursively choose a centroid of the current component, mark it as the component’s representative, and recurse on each resulting component. The centroid tree C(T) is formed by creating a node for each centroid chosen during the recursion and connecting it to the centroids of its recursive subcomponents. The height of C(T) is O(log n) because each step reduces component sizes by at least a factor of two. A fundamental structural property is that any simple path in T has a unique highest centroid in C(T) where the path’s two endpoints first appear in different recursive components; thus, handling path contributions at each centroid and subtracting per-child overcounts yields a partition of all paths by their highest centroid. In algorithms, we leverage this by precomputing distances from centroids to nodes in their components (or by using an LCA-based distance oracle), aggregating counts over subtrees, and ensuring each node contributes at only O(log n) centroid levels, which leads to near-linearithmic runtimes.

04 When to Use

Use centroid decomposition when your task is fundamentally about paths or distances in a tree and requires either global counting or fast per-query updates where each node may need to contribute to multiple paths. Examples include:

  • Counting node pairs whose distance is ≤ K or equals exactly K.
  • Counting paths whose sum of edge weights satisfies a predicate (e.g., divisible by M, or within a range), often combined with frequency tricks and two-pointers.
  • Online problems with toggles (e.g., paint a node red/blue) and queries such as the minimum distance to a red node. Each update/query touches only O(log n) centroid ancestors.
  • Maintaining per-centroid summaries (min, count histograms, prefix sums) to answer queries over all paths that pass through one of O(log n) centroids per node.

Prefer centroid decomposition over alternatives when: (1) queries/updates are path- or distance-centric across the whole tree, (2) you seek O(log n) per update/query with modest memory, or (3) Mo’s algorithm on trees or HLD would be less natural due to global pair interactions. If your operations are purely path-to-path with endpoints known (e.g., path sum queries between u and v), Heavy-Light Decomposition (HLD) might be simpler. If you need subtree-aggregate queries without path constraints, DSU-on-tree (small-to-large) may be more direct.

⚠️Common Mistakes

  • Double counting pairs: When counting pairs at a centroid, you must subtract contributions per child subtree before adding the combined total. Forgetting this leads to counting pairs that lie entirely within a single child region multiple times.
  • Not clearing temporary vectors: Depth lists or frequency arrays built per centroid and per child must be rebuilt/reset each time; reusing without clearing creates stale data and incorrect counts.
  • Wrong centroid selection: Failing to recompute subtree sizes or incorrectly stopping the centroid search (e.g., not moving to a child with size > n/2) yields an unbalanced decomposition and can degrade complexity.
  • Distance computation errors: For dynamic problems using LCA, an incorrect LCA, depth, or binary lifting table causes subtle wrong answers. Always verify dist(u, v) = depth[u] + depth[v] - 2*depth[lca(u, v)].
  • Excess recursion depth or global state leaks: Forgetting to mark nodes as removed or to restrict DFS to the current component can cause O(n^2) traversals or stack overflows. Keep a removed[] array and skip those nodes in DFS.
  • Assuming results for weighted vs. unweighted trees are interchangeable: Two-pointer distance counting with sorted depth vectors needs adaptation for weights; otherwise, pair sums won’t reflect actual distances.

Key Formulas

Centroid Balance Condition

max_i |C_i| ≤ ⌊n/2⌋

Explanation: For a centroid c in a component of size n, every connected component C_i resulting from removing c has size at most n/2. This guarantees recursive balance and O(log n) depth.

Decomposition Recurrence

T(n) = O(n) + Σ_i T(n_i)

Explanation: At each centroid we spend linear time in the current component, then recurse on k components of sizes n_1, ..., n_k. Because each n_i ≤ n/2 and Σ n_i = n − 1, the total is O(n log n).

Centroid Tree Height

h ≤ ⌈log₂ n⌉

Explanation: Since each recursive step cuts component size by at least half, the number of levels h is at most the ceiling of log base 2 of n. Each node participates in at most h levels.

Tree Distance via LCA

dist(u, v) = depth[u] + depth[v] − 2 · depth[lca(u, v)]

Explanation: In an unweighted tree, the distance between two nodes equals the sum of their depths from the root minus twice the depth of their LCA. With binary lifting, this is answered in O(log n), or in O(1) with additional preprocessing.

Pair Counting with Two Pointers

O(m log m) to sort + O(m) to sweep, for a depth list of length m

Explanation: Sorting depths takes O(m log m), then counting pairs with a moving window uses O(m). This is central for counting distance-bounded pairs at a centroid.

Per-node Participation

Each node appears in O(log n) centroid components, so total work is O(n log n)

Explanation: Each node contributes to O(log n) centroids (one per level). Summing a constant amount of work per node per level leads to O(n log n) total complexity.

Complexity Analysis

Let n be the number of nodes. Computing subtree sizes and finding the centroid of a component of size m costs O(m). Across the entire decomposition, each node is part of O(log n) components because component sizes at least halve at each level. Therefore, the total time to build the centroid tree and perform one linear pass of work per level is O(n log n). Space for the centroid tree, removed flags, and parent arrays is O(n).

For pair counting with a threshold K in an unweighted tree, at each centroid c we gather depths from c to nodes in each child component. If m_c is the size of the component processed at centroid c, we sort depth lists of total length O(m_c) and run two-pointer counting. Summed over all centroids, the total sorting and counting is O(Σ_c m_c log m_c) = O(n log n), as each node appears in O(log n) components. Care must be taken to rebuild only local vectors to avoid O(n^2) memory reuse costs.

For online nearest-red queries, we maintain best[c] = min distance from centroid c to any red node. An update at node u ascends the centroid tree, updating best[c] using precomputed distances; a query at u similarly ascends and computes min(best[c] + dist(u, c)). Each ascent touches O(log n) ancestors, and distance queries can be O(1) or O(log n) with precomputed tables or LCA-based computation, respectively. Thus, both updates and queries run in O(log n), with O(n log n) preprocessing for the centroid tree and the LCA tables. Memory usage is O(n log n) for LCA, O(n) for centroid structures, and O(n) for best values.

Code Examples

Building the Centroid Decomposition and Outputting the Centroid Tree Parents
#include <bits/stdc++.h>
using namespace std;

// Example 1: Build centroid decomposition and print the parent in the centroid tree
// Input:
//   n
//   n-1 lines: u v (1-indexed tree)
// Output:
//   parent array of the centroid tree (parent of the centroid root is -1)

const int MAXN = 200000 + 5;

int n;
vector<int> g[MAXN];
bool removed_[MAXN];
int sub[MAXN];
int cpar[MAXN]; // parent in centroid tree (-1 for root)

// Compute subtree sizes for the component rooted at u, skipping removed nodes
int dfs_size(int u, int p) {
    sub[u] = 1;
    for (int v : g[u]) {
        if (v == p || removed_[v]) continue;
        sub[u] += dfs_size(v, u);
    }
    return sub[u];
}

// Find the centroid of the component containing u with total size sz
int dfs_centroid(int u, int p, int sz) {
    for (int v : g[u]) {
        if (v == p || removed_[v]) continue;
        if (sub[v] > sz / 2) return dfs_centroid(v, u, sz);
    }
    return u;
}

void build(int entry, int parent) {
    int sz = dfs_size(entry, -1);
    int c = dfs_centroid(entry, -1, sz);
    removed_[c] = true;
    cpar[c] = parent; // parent in centroid tree
    for (int v : g[c]) {
        if (removed_[v]) continue;
        build(v, c);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n)) return 0;
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v);
        g[v].push_back(u);
    }
    fill(begin(removed_), begin(removed_) + n + 1, false);
    build(1, -1);

    // Output centroid parent array
    for (int i = 1; i <= n; ++i) {
        cout << cpar[i] << (i == n ? '\n' : ' ');
    }
    return 0;
}

This program builds the centroid decomposition of a tree. It computes subtree sizes within the current component, finds the centroid by walking towards any child whose subtree exceeds half the component size, marks it as removed, and recurses on remaining components. The cpar array stores the centroid tree parent (−1 for the root centroid).

Time: O(n log n). Space: O(n).
Counting Pairs with Distance ≤ K Using Centroid Decomposition (Unweighted Tree)
#include <bits/stdc++.h>
using namespace std;

// Example 2: Count number of unordered pairs (u, v) with dist(u, v) <= K
// Input:
//   n K
//   n-1 lines: u v (1-indexed unweighted tree)
// Output:
//   number of pairs with distance <= K

const int MAXN = 200000 + 5;
int n, K;
vector<int> g[MAXN];
bool removed_[MAXN];
int sub[MAXN];
long long answerPairs = 0;

int dfs_size(int u, int p) {
    sub[u] = 1;
    for (int v : g[u]) if (v != p && !removed_[v]) sub[u] += dfs_size(v, u);
    return sub[u];
}

int dfs_centroid(int u, int p, int sz) {
    for (int v : g[u]) if (v != p && !removed_[v])
        if (sub[v] > sz / 2) return dfs_centroid(v, u, sz);
    return u;
}

void collectDepths(int u, int p, int d, vector<int>& depths) {
    if (d > K) return; // optional pruning for the distance constraint
    depths.push_back(d);
    for (int v : g[u]) if (v != p && !removed_[v]) collectDepths(v, u, d + 1, depths);
}

long long countPairsWithin(vector<int>& a) {
    sort(a.begin(), a.end());
    long long cnt = 0;
    int l = 0, r = (int)a.size() - 1;
    while (l <= r) {
        if (a[l] + a[r] <= K) {
            cnt += (r - l); // pairs (l, l+1), ..., (l, r)
            ++l;
        } else {
            --r;
        }
    }
    return cnt;
}

void decompose(int entry) {
    int sz = dfs_size(entry, -1);
    int c = dfs_centroid(entry, -1, sz);
    removed_[c] = true;

    // Gather depths from centroid c; distance 0 accounts for c itself
    vector<int> all; all.reserve(sz);
    all.push_back(0);

    // For each child component, collect depths and correct overcounts
    for (int v : g[c]) if (!removed_[v]) {
        vector<int> tmp; tmp.reserve(sub[v]);
        collectDepths(v, c, 1, tmp);
        // Subtract pairs entirely within this child (overcounted if added later)
        answerPairs -= countPairsWithin(tmp);
        // Merge into all for cross-child pairs through the centroid
        all.insert(all.end(), tmp.begin(), tmp.end());
    }

    // Count all pairs whose path goes through centroid c
    answerPairs += countPairsWithin(all);

    for (int v : g[c]) if (!removed_[v]) decompose(v);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n >> K)) return 0;
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    fill(begin(removed_), begin(removed_) + n + 1, false);
    answerPairs = 0;
    decompose(1);

    cout << answerPairs << "\n";
    return 0;
}

This classic solution counts unordered pairs at distance ≤ K. At each centroid c, it collects depths to nodes in every child component. By first subtracting pairs within each child (to prevent double counting) and then adding pairs within the union that includes the centroid (distance 0), it counts exactly those pairs whose highest centroid is c. Summing across all centroid levels yields the total number of valid pairs.

Time: O(n log n). Space: O(n).
Online Updates: Toggle Red Nodes and Query Minimum Distance to a Red Node
#include <bits/stdc++.h>
using namespace std;

// Example 3: Xenia-like queries on a tree using centroid decomposition and LCA
// Operations:
//   1 u -> paint node u red (idempotent)
//   2 u -> query minimum distance from u to any red node
// Initially, no red nodes (or initialize one as needed by calling update_red(1)).

const int MAXN = 200000 + 5;
const int LOG = 20;

int n, q;
vector<int> g[MAXN];

// LCA structures
int up[LOG][MAXN];
int depth_[MAXN];
int tin[MAXN], tout[MAXN], timerDFS = 0;

void dfs_lca(int u, int p) {
    tin[u] = ++timerDFS;
    up[0][u] = (p == -1 ? u : p);
    for (int k = 1; k < LOG; ++k) up[k][u] = up[k-1][up[k-1][u]];
    for (int v : g[u]) if (v != p) {
        depth_[v] = depth_[u] + 1;
        dfs_lca(v, u);
    }
    tout[u] = ++timerDFS;
}

bool is_ancestor(int u, int v) { // is u an ancestor of v?
    return tin[u] <= tin[v] && tout[v] <= tout[u];
}

int lca(int u, int v) {
    if (is_ancestor(u, v)) return u;
    if (is_ancestor(v, u)) return v;
    for (int k = LOG - 1; k >= 0; --k) {
        int a = up[k][u];
        if (!is_ancestor(a, v)) u = a;
    }
    return up[0][u];
}

int distTree(int u, int v) {
    int w = lca(u, v);
    return depth_[u] + depth_[v] - 2 * depth_[w];
}

// Centroid decomposition
bool removed_[MAXN];
int sub[MAXN];
int cpar[MAXN]; // centroid parent (-1 for root)

int dfs_size(int u, int p) {
    sub[u] = 1;
    for (int v : g[u]) if (v != p && !removed_[v]) sub[u] += dfs_size(v, u);
    return sub[u];
}

int dfs_centroid(int u, int p, int sz) {
    for (int v : g[u]) if (v != p && !removed_[v])
        if (sub[v] > sz / 2) return dfs_centroid(v, u, sz);
    return u;
}

void build_centroid(int entry, int parent) {
    int sz = dfs_size(entry, -1);
    int c = dfs_centroid(entry, -1, sz);
    removed_[c] = true;
    cpar[c] = parent;
    for (int v : g[c]) if (!removed_[v]) build_centroid(v, c);
}

// Online maintenance: best[c] = min distance from centroid c to any red node.
const int INF = 1e9;
int best[MAXN];

void update_red(int u) {
    int x = u;
    while (x != -1) {
        best[x] = min(best[x], distTree(u, x));
        x = cpar[x];
    }
}

int query_min_dist(int u) {
    int ans = INF;
    int x = u;
    while (x != -1) {
        ans = min(ans, best[x] + distTree(u, x));
        x = cpar[x];
    }
    return ans;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    if (!(cin >> n >> q)) return 0;
    for (int i = 1; i <= n; ++i) g[i].clear();
    for (int i = 0; i < n - 1; ++i) {
        int u, v; cin >> u >> v;
        g[u].push_back(v); g[v].push_back(u);
    }

    // Preprocess LCA
    depth_[1] = 0; timerDFS = 0;
    dfs_lca(1, -1);

    // Build centroid decomposition
    fill(begin(removed_), begin(removed_) + n + 1, false);
    build_centroid(1, -1);

    // Initialize best[]
    for (int i = 1; i <= n; ++i) best[i] = INF;

    // Optionally start with node 1 red (common in CF 342E)
    // update_red(1);

    while (q--) {
        int t, u; cin >> t >> u;
        if (t == 1) {
            update_red(u);
        } else if (t == 2) {
            cout << query_min_dist(u) << "\n";
        }
    }
    return 0;
}

This program handles online updates and queries: painting nodes red and finding the nearest red node. It combines LCA for O(1)/O(log n) distance and centroid decomposition for O(log n) updates/queries. Each update ascends the centroid tree and improves best[c] with dist(u, c). Each query ascends similarly and minimizes best[c] + dist(u, c).

Time: preprocessing O(n log n); each update and query O(log n). Space: O(n log n) for LCA tables and O(n) for centroid structures.