🗂️ Data Structure · Advanced

Centroid Decomposition - Distance Queries

Key Points

  • Centroid decomposition splits a tree into levels by repeatedly removing a centroid so that each remaining component is at most half the size.
  • At each centroid, we precompute and store the distance from every node in its component to that centroid, creating O(n log n) total stored distances.
  • Dynamic queries like 'nearest marked node' are answered by checking O(log n) centroid ancestors and combining their precomputed distances.
  • Updates (mark/unmark a node) propagate along the same centroid-ancestor chain, also in O(log n).
  • The key formula is ans(u) = min over u's centroid ancestors c of { best[c] + dist(u, c) }, where best[c] tracks the distance from centroid c to its nearest marked node.
  • The centroid tree has height O(log n), giving time O(log n) per query/update and space O(n log n).
  • Using multisets at each centroid lets you support deletions (toggle on/off) at the cost of an extra O(log n) factor.
  • This technique is powerful for tree distance problems where updates and queries interleave online.

Prerequisites

  • Trees and basic DFS/BFS: Centroid decomposition relies on understanding tree traversal and connected components.
  • Recursion and divide-and-conquer: The decomposition recursively splits the tree and builds a hierarchy; recursion depth analysis is crucial.
  • Big-O analysis: To reason about O(n log n) preprocessing and O(log n) per query/update, you must be comfortable with asymptotics.
  • Sets and multisets (balanced BSTs): To support deletions, we need per-centroid multisets and their O(log n) operations.
  • Graph representations: Adjacency lists and edge iteration are used extensively to store the tree.
  • Shortest path on trees: Understanding that the path between any two nodes is unique simplifies distance accumulation.
  • 64-bit integer arithmetic: Weighted trees may require long long to avoid overflow.
  • Binary search trees and priority queues (optional): Alternative per-centroid structures can be built on ordered containers.

Detailed Explanation

01 Overview

Centroid decomposition is a divide-and-conquer technique for trees that turns global distance problems into a sequence of small local checks. A centroid of a tree is a node whose removal splits the tree into components, each with at most half the original nodes. By recursively removing centroids, we build a hierarchy (the centroid tree) whose height is O(log n). The core trick for distance queries is to, for every centroid, precompute and store the distance from that centroid to every node in the component it governs at its recursion step. As a result, each original tree node stores a short list of distances to O(log n) centroids—precisely the centroids on the path from that node up to the root of the centroid tree. For dynamic problems like 'activate a node' and 'what is the distance from node u to the nearest active node?', we maintain, for each centroid c, a summary value best[c] equal to the smallest distance from c to any active node. To answer a query at u, we combine each centroid-distance pair (c, dist(u,c)) for u with best[c] and take the minimum. Updates (activations) simply relax best[c] along the same centroid list. This achieves O(log n) per operation after an O(n log n) preprocessing, and O(n log n) space. The method is widely used in competitive programming for online tree distance tasks.
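The two dynamic operations boil down to short loops over a node's centroid-ancestor list. Below is a minimal sketch, assuming paths[u] (pairs of centroid id and distance) and best[] have already been built as described; the full, runnable implementations appear in the Code Examples section.

#include <bits/stdc++.h>
using namespace std;

const int INF = 1e9;
vector<vector<pair<int,int>>> paths; // paths[u] = {(centroid c, dist(u,c)), ...}
vector<int> best;                    // best[c] = min distance from c to any active node

// Activate node u: relax best[] along u's centroid-ancestor chain.
void activate(int u) {
    for (auto [c, d] : paths[u]) best[c] = min(best[c], d);
}

// Distance from u to the nearest active node (INF if no node is active).
int nearest(int u) {
    int ans = INF;
    for (auto [c, d] : paths[u]) if (best[c] < INF) ans = min(ans, best[c] + d);
    return ans;
}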

02 Intuition & Analogies

Imagine a city whose roads form a tree: neighborhoods are intersections and roads are edges. We want to repeatedly place emergency stations (activations) and, for any neighborhood, instantly know the distance to the nearest station (query). Checking every station each time would be too slow. Instead, we designate special hubs (centroids) such that each neighborhood has only a few hubs on its 'administrative ladder'. Every neighborhood knows how far it is from each hub above it. Each hub also keeps a note of 'how close is the nearest station to me?' Now, when someone asks from neighborhood u, we tell them: 'Consider all the hubs that administratively cover you. For each hub c, if the nearest station is d1 away from c, and you are d2 away from c, then one possible route to a station is d1 + d2. Take the best of those.' Because your ladder of hubs is short (O(log n)), checking them is fast. Why does each neighborhood have only a few hubs? Because hubs are defined by cutting the city at a centroid each time, ensuring the remaining districts are at most half the size. So your administrative ladder rises at most log2(n) levels. The beauty is that hubs summarize information about all stations efficiently. Updating when a new station appears is also easy: we just climb the neighborhood’s hub ladder and update each hub’s note with the station’s distance. This turns a global nearest-station problem over the whole city into a compact, hierarchical series of local checks.

03 Formal Definition

Given an unrooted tree T = (V, E) with |V| = n, a centroid is a vertex c ∈ V such that removing c splits T into connected components, each having size at most n/2. Centroid decomposition recursively selects a centroid c of T, marks it as the parent in a new tree T_c (the centroid tree), removes c from T, and recurses on each remaining component, attaching the decomposition roots as children of c in T_c. The height of T_c is O(log n). For distance queries, define for each vertex u a list P(u) = [(c_1, d_1), (c_2, d_2), ..., (c_k, d_k)], where c_1, c_2, ..., c_k are the centroid ancestors of u in T_c from bottom to top and d_i = dist(u, c_i). Let S ⊆ V be the dynamic set of 'active' (marked) vertices. For each centroid c, maintain best[c] = min_{x in S} dist(x, c), with best[c] = +∞ if S = ∅. Then a query at u returns ans(u) = min_{(c, d) in P(u)} { best[c] + d }. Preprocessing computes P(u) for all u by DFS from each centroid at its recursion step, pushing pairs (centroid, distance). This ensures each node u gains exactly one pair per decomposition level it belongs to, yielding total storage O(n log n). Updates to S adjust best[c] along P(u) in O(log n).
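The same definitions, collected as display formulas (LaTeX):

\[
\begin{aligned}
&P(u) = \big[(c_1, d_1), \dots, (c_k, d_k)\big], \qquad c_1, \dots, c_k \text{ the centroid ancestors of } u \text{ in } T_c,\ \ d_i = \mathrm{dist}(u, c_i),\\
&\mathrm{best}[c] = \min_{x \in S} \mathrm{dist}(c, x) \quad (= +\infty \text{ if } S = \varnothing),\\
&\mathrm{ans}(u) = \min_{(c,\, d) \in P(u)} \big(\mathrm{best}[c] + d\big), \qquad k = |P(u)| = O(\log n).
\end{aligned}
\]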

04 When to Use

Use centroid decomposition with distance caching when your problem is on a tree and requires many interleaved operations of two kinds: (1) update some node(s) (e.g., toggle or add/remove a property), and (2) query distances to the nearest node with a property, often in the unweighted or weighted distance metric along tree edges. Classic tasks include nearest colored node, sum/min of distances to a subset, and handling online updates efficiently. It is particularly suitable when the query or update cost must be near O(log n) and preprocessing can be O(n log n). Centroid decomposition also helps when distances should be combined with simple associative operations (min, max, sum) that can be summarized per centroid. For example, maintain the minimum distance to a red node, or the count of red nodes within distance ≤ D by storing histograms per centroid (sketched below). Weighted edges are also manageable since distances can be accumulated during DFS. Avoid centroid decomposition when the graph is not a tree, or when heavy path decompositions or LCA-based formulas provide simpler solutions. If the operation involves complex constraints that cannot be summarized at each centroid (e.g., non-local predicates requiring cross-subtree interactions that are not decomposable), other structures might be better.
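As an illustration of the 'count red nodes within distance ≤ D' idea, here is a minimal sketch for a static red set, using sorted distance vectors instead of histograms. It assumes the paths arrays produced by the CentroidDecomposition struct in the Code Examples section (ordered from the root centroid downward); RedCounter and its members are hypothetical names introduced here, not part of the original material.

#include <bits/stdc++.h>
using namespace std;

/*
 Sketch: count red nodes within distance D of u, for a STATIC set of red nodes.
 A[c] holds dist(c, x) for red x in c's component; B[c] holds dist(cpar(c), x) for
 the same x and is used to cancel double counting at c's parent centroid.
*/
struct RedCounter {
    int n;
    const vector<vector<pair<int,int>>>& paths; // (centroid, dist), top-down per node
    vector<vector<int>> A, B;

    RedCounter(int n_, const vector<vector<pair<int,int>>>& paths_, const vector<int>& red)
        : n(n_), paths(paths_), A(n_ + 1), B(n_ + 1) {
        for (int x = 1; x <= n; ++x) {
            if (!red[x]) continue;
            const auto& P = paths[x];
            for (size_t i = 0; i < P.size(); ++i) {
                A[P[i].first].push_back(P[i].second);
                if (i > 0) B[P[i].first].push_back(P[i-1].second); // dist from parent centroid
            }
        }
        for (int c = 1; c <= n; ++c) {
            sort(A[c].begin(), A[c].end());
            sort(B[c].begin(), B[c].end());
        }
    }

    static int cnt_le(const vector<int>& v, int lim) {
        if (lim < 0) return 0;
        return int(upper_bound(v.begin(), v.end(), lim) - v.begin());
    }

    // Number of red nodes x with dist(u, x) <= D.
    long long query(int u, int D) const {
        const auto& P = paths[u];
        long long ans = 0;
        for (size_t i = 0; i < P.size(); ++i) {
            ans += cnt_le(A[P[i].first], D - P[i].second);                 // count at ancestor c_i
            if (i > 0) ans -= cnt_le(B[P[i].first], D - P[i-1].second);    // cancel overcount at c_{i-1}
        }
        return ans;
    }
};

Each query inspects O(log n) centroid ancestors and does one binary search in each, so it runs in O(log^2 n). The B arrays remove double counting: a red node x should be counted only at the centroid-tree LCA of u and x, which is the one ancestor where dist(u, x) = dist(u, c) + dist(c, x) holds exactly.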

⚠️Common Mistakes

  • Forgetting to add the centroid itself to each node's path list: every node u must include (c, dist(u,c)) for every centroid ancestor c, including the level at which u itself becomes the centroid, with distance 0 at that level.
  • Computing subtree sizes incorrectly during decomposition: after removing a centroid, you must recompute sizes only within the current component and ignore already 'removed' nodes; otherwise centroids become incorrect and the decomposition height guarantee breaks.
  • Traversing into removed nodes while collecting distances from a centroid: the collect DFS must skip removed nodes to avoid mixing components and to prevent revisiting.
  • Using only a single best value globally rather than per centroid: the algorithm needs one best[c] per centroid, not a single global best. Similarly, failing to re-minimize best[c] during multiple updates leads to wrong answers.
  • Attempting to support deletions with only best[c] (a single integer): since best[c] is a min over active nodes, removing a node may increase the min; you need a multiset or priority structure per centroid to delete correctly. Otherwise, queries after deletions return outdated small values.
  • Overflow with weighted distances: use 64-bit integers (long long) for distances and INF, and guard for empty multisets to avoid reading invalid values.
  • Recursion depth and stack limits on large trees: use iterative DFS where necessary or increase stack size if the platform requires it (e.g., some judges).

Key Formulas

Centroid Tree Height

height(T_c) = O(log n)

Explanation: Each removal of a centroid ensures every remaining component has at most half the nodes. Thus, along any path in the centroid tree, sizes decrease geometrically, yielding logarithmic height.

Storage Size

total stored pairs = Σ_u |P(u)| = O(n log n)

Explanation: Each node u appears in the path list P(u) once per level of the centroid tree, whose height is O(log n). Therefore the total number of stored centroid-distance pairs is O(n log n).

Per-centroid Summary

best[c] = min_{x in S} dist(c, x)   (= +∞ if S is empty)

Explanation: For each centroid c, store the minimum distance to any active node x. This value is updated on activations and, with a multiset, on deletions.

Query Formula

ans(u) = min_{(c, d) in P(u)} { best[c] + d }

Explanation: To find the nearest active node to u, combine the stored distance from u to each centroid ancestor c with c’s best value, then take the minimum over all such centroids.

Preprocessing Time

T_build = O(n log n)

Explanation: At each of O(log n) levels, we traverse disjoint components whose total size sums to n, and we run distance-collection DFS per centroid, resulting in O(n log n) overall.

Query/Update Time

T_query = T_update = O(log n)

Explanation: Both operations traverse the O(log n) centroid ancestors of a node and do O(1) work per ancestor when using simple min summaries.

Toggle with Multisets

T_toggle = O(log^2 n)

Explanation: If deletions must be supported, each per-centroid multiset operation takes O(log n); combined across O(log n) ancestors this yields O(log^2 n) per update.

Geometric Decrease

component size at level k ≤ n / 2^k

Explanation: Across levels, the sizes of components visited along a path decrease geometrically, explaining linear work per level and the O(n log n) total.

Complexity Analysis

Building the centroid decomposition requires computing subtree sizes and finding a centroid at each recursion. Because removing a centroid ensures that remaining components are at most half the size, the centroid tree height is O(log n). At each level, we perform DFS traversals that collectively visit each node a constant number of times to collect distances to the current centroid, so per-level work is O(n). Summed over O(log n) levels, preprocessing takes O(n log n) time. We store, for each node u, its distances to the centroids on its path in the centroid tree; there are O(log n) such centroids per node, producing O(n log n) space usage. For dynamic nearest-marked queries without deletions, we maintain best[c] for each centroid c. An update on a node u iterates over the O(log n) pairs (c, dist(u,c)) in P(u) and updates best[c] = min(best[c], dist(u,c)), costing O(log n). A query on node u computes ans(u) = min { best[c] + dist(u,c) } over the same O(log n) list, which is O(log n) time. If we must support deletions (or toggling), a single integer best[c] is insufficient; we instead maintain a multiset of distances per centroid and use its minimum for queries. Each insertion or deletion into a multiset costs O(log n); since we do this for O(log n) centroids per update, the update complexity becomes O(log^2 n). Query time remains O(log n) by reading the minimum from each relevant multiset. For weighted trees, time and space complexities are unchanged asymptotically; we only switch to 64-bit distances. Overall, centroid decomposition provides predictable logarithmic query/update bounds with manageable O(n log n) memory.
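The per-level accounting behind the O(n log n) preprocessing bound can be written out as a short derivation:

\[
T_{\text{build}} \;=\; \sum_{\ell = 0}^{O(\log n)} \;\sum_{\text{components } C \text{ at level } \ell} O(|C|)
\;=\; \sum_{\ell = 0}^{O(\log n)} O(n) \;=\; O(n \log n),
\]

since the components at any single level are vertex-disjoint and therefore have total size at most n.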

Code Examples

Nearest active node (unweighted tree) with only activations (no deletions)
#include <bits/stdc++.h>
using namespace std;

/*
 Centroid Decomposition for dynamic distance queries on a tree.
 Operations:
   - update(v): activate node v (make it red)
   - query(u): distance from u to the nearest active node
 Supports only activations; no deactivations. Uses per-centroid best[] = min distance to any active node.

 Complexity:
   Build: O(n log n)
   update/query: O(log n)
*/

struct CentroidDecomposition {
    int n;
    vector<vector<int>> adj;   // original tree adjacency
    vector<int> subsz;         // subtree sizes for centroid finding
    vector<int> cpar;          // parent in centroid tree (-1 for root)
    vector<int> level;         // level in centroid tree (depth)
    vector<int> removed;       // mark centroid as removed

    // For each node u: list of pairs (centroid c, distance dist(u,c))
    vector<vector<pair<int,int>>> paths;

    // Per-centroid best distance to any active node
    const int INF = 1e9;
    vector<int> best;

    CentroidDecomposition(int n = 0) { init(n); }

    void init(int n_) {
        n = n_;
        adj.assign(n+1, {});
        subsz.assign(n+1, 0);
        cpar.assign(n+1, -1);
        level.assign(n+1, 0);
        removed.assign(n+1, 0);
        paths.assign(n+1, {});
        best.assign(n+1, INF); // indexed by centroid id
    }

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // 1) Compute subtree sizes in the current component
    void dfs_sz(int u, int p) {
        subsz[u] = 1;
        for (int v : adj[u]) if (v != p && !removed[v]) {
            dfs_sz(v, u);
            subsz[u] += subsz[v];
        }
    }

    // 2) Find centroid of current component (rooted at u with parent p), comp_size is total size
    int find_centroid(int u, int p, int comp_size) {
        for (int v : adj[u]) if (v != p && !removed[v]) {
            if (subsz[v] > comp_size / 2) return find_centroid(v, u, comp_size);
        }
        return u; // u is centroid
    }

    // 3) Collect distances from centroid c into its component (skip removed nodes)
    void collect(int u, int p, int dist, int c) {
        paths[u].push_back({c, dist});
        for (int v : adj[u]) if (v != p && !removed[v]) {
            collect(v, u, dist + 1, c);
        }
    }

    // 4) Decompose component rooted at u, with parent p in centroid tree, at depth d
    void decompose(int u, int p, int d) {
        dfs_sz(u, -1);
        int c = find_centroid(u, -1, subsz[u]);
        removed[c] = 1;
        cpar[c] = p; // parent in centroid tree (-1 for root)
        level[c] = d;

        // Distance from centroid to itself
        paths[c].push_back({c, 0});
        // Collect distances to all nodes in this component
        for (int v : adj[c]) if (!removed[v]) {
            collect(v, c, 1, c);
        }

        // Recurse on remaining components
        for (int v : adj[c]) if (!removed[v]) {
            decompose(v, c, d + 1);
        }
        // Restore the flag: the ancestor centroids that are still marked keep sibling
        // components separated, so removed[] returns to all zeros after build().
        removed[c] = 0;
    }

    // Build entry point: call with any node as root of the tree
    void build(int root = 1) {
        // decompose() temporarily sets removed[] while recursing and restores it on exit,
        // so the array starts and ends as all zeros.
        decompose(root, -1, 0);
    }

    // Activate node u
    void update(int u) {
        for (auto [c, d] : paths[u]) {
            best[c] = min(best[c], d);
        }
    }

    // Query nearest active node to u
    int query(int u) const {
        int ans = INF;
        for (auto [c, d] : paths[u]) {
            if (best[c] < INF) ans = min(ans, best[c] + d);
        }
        return ans;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    if (!(cin >> n >> q)) return 0;
    CentroidDecomposition cd(n);
    for (int i = 0; i < n-1; ++i) {
        int u, v; cin >> u >> v;
        cd.add_edge(u, v);
    }
    cd.build(1);

    // Example protocol (like CF 342E): initially node 1 is active
    cd.update(1);

    while (q--) {
        int type, v; cin >> type >> v;
        if (type == 1) {
            cd.update(v);          // activate v
        } else if (type == 2) {
            int ans = cd.query(v);
            cout << ans << '\n';
        }
    }
    return 0;
}

This program builds a centroid decomposition, stores for each node u its O(log n) pairs (centroid, distance), and maintains a best distance per centroid to any active node. Updates relax best values along the centroid path of the updated node, and queries combine best[c] with dist(u,c) to get the minimum. It supports only activations (no deletions).
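A small hand-checked run (not from the original problem data), assuming the protocol in main() above: node 1 starts active, '1 v' activates v, and '2 v' asks for the nearest active node to v. The tree is the path 1-2-3-4.

Input:
4 3
1 2
2 3
3 4
2 4
1 3
2 4

Output:
3
1

The first query from node 4 reaches the initially active node 1 at distance 3; after activating node 3, the second query returns distance 1.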

Time: Build O(n log n); update O(log n); query O(log n). Space: O(n log n) for paths plus O(n) for arrays.
Nearest active node with toggles (insertions and deletions) via per-centroid multisets
#include <bits/stdc++.h>
using namespace std;

/*
 Centroid Decomposition with full toggles:
   - toggle_on(v): activate node v
   - toggle_off(v): deactivate node v
   - query(u): distance from u to nearest active node

 Uses per-centroid multisets to support deletions. The minimum element of each multiset
 is the current best distance from that centroid to any active node.

 Complexity:
   Build: O(n log n)
   toggle_on/off: O(log^2 n) (O(log n) centroids * O(log n) multiset ops)
   query: O(log n)
*/

struct CD_Toggle {
    int n;
    vector<vector<int>> adj;
    vector<int> subsz, cpar, level, removed;
    vector<vector<pair<int,int>>> paths; // (centroid, dist)
    vector<multiset<int>> bag;           // per-centroid multiset of distances
    vector<int> active;                  // 0/1 active flag per node for safety

    CD_Toggle(int n = 0) { init(n); }

    void init(int n_) {
        n = n_;
        adj.assign(n+1, {});
        subsz.assign(n+1, 0);
        cpar.assign(n+1, -1);
        level.assign(n+1, 0);
        removed.assign(n+1, 0);
        paths.assign(n+1, {});
        bag.assign(n+1, {});
        active.assign(n+1, 0);
    }

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    void dfs_sz(int u, int p) {
        subsz[u] = 1;
        for (int v : adj[u]) if (v != p && !removed[v]) {
            dfs_sz(v, u);
            subsz[u] += subsz[v];
        }
    }

    int find_centroid(int u, int p, int comp) {
        for (int v : adj[u]) if (v != p && !removed[v]) if (subsz[v] > comp/2) return find_centroid(v, u, comp);
        return u;
    }

    void collect(int u, int p, int dist, int c) {
        paths[u].push_back({c, dist});
        for (int v : adj[u]) if (v != p && !removed[v]) {
            collect(v, u, dist + 1, c);
        }
    }

    void decompose(int u, int p, int d) {
        dfs_sz(u, -1);
        int c = find_centroid(u, -1, subsz[u]);
        removed[c] = 1;
        cpar[c] = p;
        level[c] = d;
        paths[c].push_back({c, 0});
        for (int v : adj[c]) if (!removed[v]) collect(v, c, 1, c);
        for (int v : adj[c]) if (!removed[v]) decompose(v, c, d+1);
        removed[c] = 0;
    }

    void build(int root = 1) { decompose(root, -1, 0); }

    void toggle_on(int u) {
        if (active[u]) return;
        active[u] = 1;
        for (auto [c, d] : paths[u]) bag[c].insert(d);
    }

    void toggle_off(int u) {
        if (!active[u]) return;
        active[u] = 0;
        for (auto [c, d] : paths[u]) {
            auto it = bag[c].find(d);
            if (it != bag[c].end()) bag[c].erase(it);
        }
    }

    int query(int u) const {
        const int INF = 1e9;
        int ans = INF;
        for (auto [c, d] : paths[u]) {
            if (!bag[c].empty()) {
                ans = min(ans, *bag[c].begin() + d);
            }
        }
        return ans;
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; if(!(cin >> n >> q)) return 0;
    CD_Toggle cd(n);
    for (int i = 0; i < n-1; ++i) {
        int u, v; cin >> u >> v; cd.add_edge(u, v);
    }
    cd.build(1);

    // Example interactive protocol:
    //   type 1 v: toggle on; type 2 v: toggle off; type 3 v: query
    while (q--) {
        int t, v; cin >> t >> v;
        if (t == 1) cd.toggle_on(v);
        else if (t == 2) cd.toggle_off(v);
        else if (t == 3) {
            int ans = cd.query(v);
            if (ans >= (int)1e9) ans = -1; // no active nodes
            cout << ans << '\n';
        }
    }
    return 0;
}

This version uses a multiset at each centroid to support both insertions and deletions. Each active node contributes one distance per centroid ancestor to the corresponding multiset. The nearest distance per centroid is the smallest element in its multiset. Querying a node u aggregates these minima with dist(u, c) across its centroid ancestors.
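A small hand-checked run for the toggle version (again the path 1-2-3-4; '1 v' toggles on, '2 v' toggles off, '3 v' queries), following the protocol in main() above:

Input:
4 6
1 2
2 3
3 4
1 1
3 4
1 3
3 4
2 3
3 4

Output:
3
1
3

After node 3 is toggled off, its distances are erased from every ancestor multiset, so the last answer correctly falls back to node 1 at distance 3.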

Time: Build O(n log n); toggle_on/off O(log^2 n); query O(log n). Space: O(n log n) for paths plus up to O(n log n) elements across multisets.
Weighted tree variant (64-bit distances) with only activations
#include <bits/stdc++.h>
using namespace std;

/*
 Weighted Tree Centroid Decomposition for nearest active node queries.
 Distances are long long. Supports only activations; deletions would require multisets.
*/

struct WCD {
    struct Edge { int to; long long w; };
    int n;
    vector<vector<Edge>> adj;
    vector<int> subsz, cpar, removed, level;
    vector<vector<pair<int,long long>>> paths; // (centroid, dist)
    vector<long long> best;
    const long long INF = (long long)4e18;

    WCD(int n = 0) { init(n); }

    void init(int n_) {
        n = n_;
        adj.assign(n+1, {});
        subsz.assign(n+1, 0);
        cpar.assign(n+1, -1);
        removed.assign(n+1, 0);
        level.assign(n+1, 0);
        paths.assign(n+1, {});
        best.assign(n+1, INF);
    }

    void add_edge(int u, int v, long long w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }

    void dfs_sz(int u, int p) {
        subsz[u] = 1;
        for (auto e : adj[u]) if (e.to != p && !removed[e.to]) {
            dfs_sz(e.to, u);
            subsz[u] += subsz[e.to];
        }
    }

    int find_centroid(int u, int p, int comp) {
        for (auto e : adj[u]) if (e.to != p && !removed[e.to]) if (subsz[e.to] > comp/2) return find_centroid(e.to, u, comp);
        return u;
    }

    void collect(int u, int p, long long dist, int c) {
        paths[u].push_back({c, dist});
        for (auto e : adj[u]) if (e.to != p && !removed[e.to]) {
            collect(e.to, u, dist + e.w, c);
        }
    }

    void decompose(int u, int p, int d) {
        dfs_sz(u, -1);
        int c = find_centroid(u, -1, subsz[u]);
        removed[c] = 1;
        cpar[c] = p; level[c] = d;
        paths[c].push_back({c, 0});
        for (auto e : adj[c]) if (!removed[e.to]) collect(e.to, c, e.w, c);
        for (auto e : adj[c]) if (!removed[e.to]) decompose(e.to, c, d+1);
        removed[c] = 0;
    }

    void build(int root = 1) { decompose(root, -1, 0); }

    void activate(int u) {
        for (auto [c, d] : paths[u]) best[c] = min(best[c], d);
    }

    long long query(int u) const {
        long long ans = INF;
        for (auto [c, d] : paths[u]) if (best[c] < INF) ans = min(ans, best[c] + d);
        return ans;
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; if(!(cin >> n >> q)) return 0;
    WCD cd(n);
    for (int i = 0; i < n-1; ++i) {
        int u, v; long long w; cin >> u >> v >> w;
        cd.add_edge(u, v, w);
    }
    cd.build(1);

    // Initially activate 1
    cd.activate(1);

    // type 1 v: activate v; type 2 v: query from v
    while (q--) {
        int t, v; cin >> t >> v;
        if (t == 1) cd.activate(v);
        else if (t == 2) {
            long long ans = cd.query(v);
            if (ans >= (long long)4e18) cout << -1 << '\n';
            else cout << ans << '\n';
        }
    }
    return 0;
}

The same centroid-decomposition idea works on weighted trees. We accumulate long long distances in the collect DFS and maintain a long long best per centroid. The asymptotic bounds are identical to the unweighted version.
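A weighted hand-checked run (assuming the protocol in main() above: node 1 starts active, '1 v' activates v, '2 v' queries from v), on the path 1-2-3 with edge weights 5 and 7:

Input:
3 3
1 2 5
2 3 7
2 3
1 2
2 3

Output:
12
7

The first query from node 3 reaches node 1 at weighted distance 5 + 7 = 12; after node 2 is activated, the answer drops to 7.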

Time: Build O(n log n); activate O(log n); query O(log n). Space: O(n log n).