Centroid Decomposition - Distance Queries
Key Points
- •Centroid decomposition splits a tree into levels by repeatedly removing a centroid so that each remaining component is at most half the size.
- •At each centroid, we precompute and store the distance from every node in its component to that centroid, creating O(n log n) total stored distances.
- •Dynamic queries like 'nearest marked node' are answered by checking O(log n) centroid ancestors and combining their precomputed distances.
- •Updates (mark/unmark a node) propagate along the same centroid-ancestor chain, also in O(log n).
- •The key formula is ans(u) = min over c in P(u) of (best[c] + dist(u, c)), where P(u) is the set of u's centroid ancestors and best[c] tracks the distance from centroid c to its nearest marked node.
- •The centroid tree has height O(log n), giving O(log n) time per query/update and O(n log n) space.
- •Using multisets at each centroid lets you support deletions (toggle on/off) at the cost of an extra O(log n) factor per update.
- •This technique is powerful for tree distance problems where updates and queries interleave online.
Prerequisites
- →Trees and basic DFS/BFS — Centroid decomposition relies on understanding tree traversal and connected components.
- →Recursion and divide-and-conquer — The decomposition recursively splits the tree and builds a hierarchy; recursion depth analysis is crucial.
- →Big-O analysis — To reason about O(n log n) preprocessing and O(log n) per query/update, you must be comfortable with asymptotics.
- →Sets and multisets (balanced BSTs) — To support deletions, we need per-centroid multisets and their O(log n) operations.
- →Graph representations — Adjacency lists and edge iteration are used extensively to store the tree.
- →Shortest path on trees — Understanding that the path between any two nodes is unique simplifies distance accumulation.
- →64-bit integer arithmetic — Weighted trees may require long long to avoid overflow.
- →Binary search trees and priority queues (optional) — Alternative per-centroid structures can be built on ordered containers.
Detailed Explanation
01 Overview
Centroid decomposition is a divide-and-conquer technique for trees that turns global distance problems into a sequence of small local checks. A centroid of a tree is a node whose removal splits the tree into components, each with at most half the original nodes. By recursively removing centroids, we build a hierarchy (the centroid tree) whose height is O(log n). The core trick for distance queries is, for every centroid, to precompute and store the distance from that centroid to every node in the component it governs at its recursion step. As a result, each original tree node stores a short list of distances to O(log n) centroids, precisely the centroids on the path from that node up to the root of the centroid tree.

For dynamic problems like 'activate a node' and 'what is the distance from node u to the nearest active node?', we maintain, for each centroid c, a summary value best[c] equal to the smallest distance from c to any active node. To answer a query at u, we combine each stored pair (c, dist(u, c)) with best[c] and take the minimum. Updates (activations) simply relax best[c] along the same centroid list.

This achieves O(log n) per operation after O(n log n) preprocessing, using O(n log n) space. The method is widely used in competitive programming for online tree distance tasks.
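As a small worked example (an illustrative toy instance, not taken from any specific problem): take the unweighted path 1-2-3-4-5-6-7. The first centroid is node 4 (both remaining components have 3 nodes), the components {1,2,3} and {5,6,7} get centroids 2 and 6, and nodes 1, 3, 5, 7 become leaf centroids. Node 7 therefore stores the pairs (4, 3), (6, 1), (7, 0). If node 2 is the only active node, then best[4] = dist(4, 2) = 2, best[2] = 0, and every other best value is infinite. A query at node 7 inspects its three centroid ancestors and returns min(best[4] + 3, INF + 1, INF + 0) = 5, which is exactly dist(7, 2).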
02 Intuition & Analogies
Imagine a city whose roads form a tree: neighborhoods are intersections and roads are edges. We want to repeatedly place emergency stations (activations) and, for any neighborhood, instantly know the distance to the nearest station (query). Checking every station each time would be too slow. Instead, we designate special hubs (centroids) such that each neighborhood has only a few hubs on its 'administrative ladder'. Every neighborhood knows how far it is from each hub above it. Each hub also keeps a note of 'how close is the nearest station to me?' Now, when someone asks from neighborhood u, we tell them: 'Consider all the hubs that administratively cover you. For each hub c, if the nearest station is d1 away from c, and you are d2 away from c, then one possible route to a station is d1 + d2. Take the best of those.' Because your ladder of hubs is short (O(log n)), checking them is fast. Why does each neighborhood have only a few hubs? Because hubs are defined by cutting the city at a centroid each time, ensuring the remaining districts are at most half the size. So your administrative ladder rises at most log2(n) levels. The beauty is that hubs summarize information about all stations efficiently. Updating when a new station appears is also easy: we just climb the neighborhood’s hub ladder and update each hub’s note with the station’s distance. This turns a global nearest-station problem over the whole city into a compact, hierarchical series of local checks.
03 Formal Definition
Let T = (V, E) be a tree with n = |V| nodes. A centroid of T is a node c such that every connected component of T with c removed has at most floor(n/2) nodes; every tree has at least one centroid. The centroid tree is built recursively: pick a centroid c of the current component, make it the parent (in the centroid tree) of the centroids of the components obtained by deleting c, and recurse until every node has been chosen as a centroid. For a node u, let P(u) denote its centroid-tree ancestors, including u itself; |P(u)| = O(log n), and for every c in P(u) we store dist(u, c), the distance in the original tree. For a dynamic set S of active nodes, define best[c] = min over x in S of dist(c, x), or infinity if no active node lies in c's component. The answer to a nearest-active-node query at u is ans(u) = min over c in P(u) of (best[c] + dist(u, c)); correctness follows because the lowest common centroid ancestor of u and its nearest active node lies on the path between them in the original tree.
04 When to Use
Use centroid decomposition with distance caching when your problem is on a tree and requires many interleaved operations of two kinds: (1) update some node(s) (e.g., toggle or add/remove a property), and (2) query distances to the nearest node with a property, under the unweighted or weighted tree distance metric. Classic tasks include nearest colored node, min/sum of distances to a subset, and handling online updates efficiently. It is particularly suitable when the query or update cost must be near O(log n) and preprocessing can be O(n log n). Centroid decomposition also helps when distances should be combined with simple associative operations (min, max, sum) that can be summarized per centroid. For example, maintain the minimum distance to a red node, or the count of red nodes within distance at most D by storing distance histograms per centroid. Weighted edges are also manageable, since distances can be accumulated during the DFS. Avoid centroid decomposition when the graph is not a tree, or when heavy-path decompositions or LCA-based formulas provide simpler solutions. If the operation involves complex constraints that cannot be summarized at each centroid (e.g., non-local predicates requiring cross-subtree interactions that are not decomposable), other structures might be better.
⚠️Common Mistakes
• Forgetting to add the centroid itself to each node's path list: every node u must include (c, dist(u, c)) for every centroid ancestor c, including the level at which u itself is chosen as a centroid, with distance 0.
• Computing subtree sizes incorrectly during decomposition: after removing a centroid, you must recompute sizes only within the current component and ignore already removed nodes; otherwise centroids become incorrect and the decomposition height guarantee breaks.
• Traversing into removed nodes while collecting distances from a centroid: the collect DFS must skip removed nodes to avoid mixing components and to prevent revisiting (see the short sketch after this list).
• Using only a single best value globally rather than per centroid: the algorithm needs one best[c] per centroid, not a single global best. Similarly, failing to re-minimize best[c] across multiple updates leads to wrong answers.
• Attempting to support deletions with only best[c] (a single integer). Since best[c] is a min over active nodes, removing a node may increase the min; you need a multiset or priority structure per centroid to delete correctly. Otherwise, queries after deletions return outdated, too-small values.
• Overflow with weighted distances: use 64-bit integers (long long) for distances and INF, and guard against empty multisets before reading their minimum.
• Recursion depth and stack limits on large trees: use iterative DFS where necessary or increase the stack size if the platform requires it (e.g., some judges).
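A minimal sketch of the guards that prevent the size-recomputation and traversal mistakes above; it reuses the names removed, adj, subsz, and paths from the full implementations later in this section.

// Recompute sizes only inside the current component: removed centroids act as walls.
void dfs_sz(int u, int p) {
    subsz[u] = 1;
    for (int v : adj[u])
        if (v != p && !removed[v]) {      // skip removed nodes, or sizes leak across components
            dfs_sz(v, u);
            subsz[u] += subsz[v];
        }
}

// The same guard applies when collecting distances from the current centroid c.
void collect(int u, int p, int dist, int c) {
    paths[u].push_back({c, dist});        // record (centroid, distance) for node u
    for (int v : adj[u])
        if (v != p && !removed[v])        // never walk into another component
            collect(v, u, dist + 1, c);
}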
Key Formulas
Centroid Tree Height
height(centroid tree) ≤ log2(n) + 1 = O(log n)
Explanation: Each removal of a centroid ensures every remaining component has at most half the nodes. Thus, along any path in the centroid tree, sizes decrease geometrically, yielding logarithmic height.
Storage Size
sum over all nodes u of |P(u)| = O(n log n)
Explanation: Each node u appears in the path list P(u) once per level of the centroid tree, whose height is O(log n). Therefore the total number of stored centroid-distance pairs is O(n log n).
Per-centroid Summary
best[c] = min over active nodes x of dist(c, x)
Explanation: For each centroid c, store the minimum distance to any active node x. This value is updated on activations and, with a multiset, also on deletions.
Query Formula
ans(u) = min over c in P(u) of (best[c] + dist(u, c))
Explanation: To find the nearest active node to u, combine the stored distance from u to each centroid ancestor c with c's best value, then take the minimum over all such centroids. This never underestimates (dist(u, c) + dist(c, x) ≥ dist(u, x) on a tree), and the lowest common centroid ancestor of u and its nearest active node lies on the path between them, so the true distance is attained.
Preprocessing Time
T(build) = O(n log n)
Explanation: At each of the O(log n) levels, we traverse disjoint components whose total size sums to at most n, and the distance-collection DFS per centroid is linear in its component, resulting in O(n log n) overall.
Query/Update Time
T(query) = T(update) = O(log n)
Explanation: Both operations traverse the O(log n) centroid ancestors of a node and do O(1) work per ancestor when using simple min summaries.
Toggle with Multisets
T(toggle) = O(log^2 n), T(query) = O(log n)
Explanation: If deletions must be supported, each per-centroid multiset operation takes O(log n); combined across O(log n) ancestors this yields O(log^2 n) per update, while queries still read only the multiset minimum per ancestor.
Geometric Decrease
size of the component containing a fixed node at centroid-tree depth k ≤ n / 2^k
Explanation: Across levels, the sizes of components visited along a path decrease geometrically, explaining linear work per level and the O(n log n) total.
Complexity Analysis
Preprocessing builds the centroid tree and the per-node path lists in O(n log n) time and O(n log n) memory. With simple min summaries, activations and queries both cost O(log n). With per-centroid multisets (needed for deletions), toggles cost O(log^2 n) while queries remain O(log n), since the minimum of each multiset is read in O(1).
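A compact sketch of the standard counting argument behind the build bound, treating each DFS step as unit cost:

\[
T_{\text{build}} \;=\; \sum_{\ell = 0}^{O(\log n)} \;\; \sum_{\text{components } C \text{ at level } \ell} O(|C|)
\;\le\; \sum_{\ell = 0}^{O(\log n)} O(n) \;=\; O(n \log n),
\]

and since each node lies in exactly one component per level, the stored pairs satisfy \(\sum_{u} |P(u)| = O(n \log n)\) as well.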
Code Examples
#include <bits/stdc++.h>
using namespace std;

/*
  Centroid Decomposition for dynamic distance queries on a tree.
  Operations:
    - update(v): activate node v (make it red)
    - query(u):  distance from u to the nearest active node
  Supports only activations; no deactivations. Uses per-centroid best[] =
  min distance to any active node.

  Complexity:
    Build:        O(n log n)
    update/query: O(log n)
*/

struct CentroidDecomposition {
    int n;
    vector<vector<int>> adj;   // original tree adjacency
    vector<int> subsz;         // subtree sizes for centroid finding
    vector<int> cpar;          // parent in centroid tree (-1 for root)
    vector<int> level;         // level in centroid tree (depth)
    vector<int> removed;       // marks a centroid as removed

    // For each node u: list of pairs (centroid c, distance dist(u,c))
    vector<vector<pair<int,int>>> paths;

    // Per-centroid best distance to any active node
    const int INF = 1e9;
    vector<int> best;

    CentroidDecomposition(int n = 0) { init(n); }

    void init(int n_) {
        n = n_;
        adj.assign(n+1, {});
        subsz.assign(n+1, 0);
        cpar.assign(n+1, -1);
        level.assign(n+1, 0);
        removed.assign(n+1, 0);
        paths.assign(n+1, {});
        best.assign(n+1, INF); // indexed by centroid id
    }

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // 1) Compute subtree sizes in the current component
    void dfs_sz(int u, int p) {
        subsz[u] = 1;
        for (int v : adj[u]) if (v != p && !removed[v]) {
            dfs_sz(v, u);
            subsz[u] += subsz[v];
        }
    }

    // 2) Find the centroid of the current component (rooted at u with parent p); comp_size is its total size
    int find_centroid(int u, int p, int comp_size) {
        for (int v : adj[u]) if (v != p && !removed[v]) {
            if (subsz[v] > comp_size / 2) return find_centroid(v, u, comp_size);
        }
        return u; // u is the centroid
    }

    // 3) Collect distances from centroid c into its component (skip removed nodes)
    void collect(int u, int p, int dist, int c) {
        paths[u].push_back({c, dist});
        for (int v : adj[u]) if (v != p && !removed[v]) {
            collect(v, u, dist + 1, c);
        }
    }

    // 4) Decompose the component containing u; p is its parent in the centroid tree, d its depth
    void decompose(int u, int p, int d) {
        dfs_sz(u, -1);
        int c = find_centroid(u, -1, subsz[u]);
        removed[c] = 1;
        cpar[c] = p;   // parent in centroid tree (-1 for root)
        level[c] = d;

        // Distance from the centroid to itself
        paths[c].push_back({c, 0});
        // Collect distances to all nodes in this component
        for (int v : adj[c]) if (!removed[v]) {
            collect(v, c, 1, c);
        }

        // Recurse on the remaining components
        for (int v : adj[c]) if (!removed[v]) {
            decompose(v, c, d + 1);
        }
        // Restore the flag. This is safe: while the recursion above runs, c stays removed,
        // and sibling components remain separated by the centroid ancestors still on the stack.
        removed[c] = 0;
    }

    // Build entry point: call with any node of the tree.
    // During build we temporarily set removed[] to cut components and unset it when each
    // centroid's recursion finishes, so after build() all flags are back to zero.
    void build(int root = 1) {
        decompose(root, -1, 0);
    }

    // Activate node u
    void update(int u) {
        for (auto [c, d] : paths[u]) {
            best[c] = min(best[c], d);
        }
    }

    // Query nearest active node to u
    int query(int u) const {
        int ans = INF;
        for (auto [c, d] : paths[u]) {
            if (best[c] < INF) ans = min(ans, best[c] + d);
        }
        return ans;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    if (!(cin >> n >> q)) return 0;
    CentroidDecomposition cd(n);
    for (int i = 0; i < n-1; ++i) {
        int u, v; cin >> u >> v;
        cd.add_edge(u, v);
    }
    cd.build(1);

    // Example protocol (like CF 342E): initially node 1 is active
    cd.update(1);

    while (q--) {
        int type, v; cin >> type >> v;
        if (type == 1) {
            cd.update(v); // activate v
        } else if (type == 2) {
            int ans = cd.query(v);
            cout << ans << '\n';
        }
    }
    return 0;
}
This program builds a centroid decomposition, stores for each node u its O(log n) pairs (centroid, distance), and maintains a best distance per centroid to any active node. Updates relax best values along the centroid path of the updated node, and queries combine best[c] with dist(u,c) to get the minimum. It supports only activations (no deletions).
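As a quick illustration of the expected protocol (a hypothetical hand-made test, not taken from any judge): for the tree with edges 1-2, 2-3, 2-4, 4-5 and node 1 initially active, the input

5 4
1 2
2 3
2 4
4 5
2 2
1 4
2 5
2 3

produces the outputs 1, 1, and 2: node 2 is at distance 1 from the initially active node 1; after node 4 is activated, node 5 is at distance 1 from it; and node 3 is at distance 2 from both active nodes.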
#include <bits/stdc++.h>
using namespace std;

/*
  Centroid Decomposition with full toggles:
    - toggle_on(v):  activate node v
    - toggle_off(v): deactivate node v
    - query(u):      distance from u to the nearest active node

  Uses per-centroid multisets to support deletions. The minimum element of each multiset
  is the current best distance from that centroid to any active node.

  Complexity:
    Build:         O(n log n)
    toggle_on/off: O(log^2 n)  (O(log n) centroids * O(log n) multiset ops)
    query:         O(log n)
*/

struct CD_Toggle {
    int n;
    vector<vector<int>> adj;
    vector<int> subsz, cpar, level, removed;
    vector<vector<pair<int,int>>> paths;   // (centroid, dist)
    vector<multiset<int>> bag;             // per-centroid multiset of distances
    vector<int> active;                    // 0/1 active flag per node for safety

    CD_Toggle(int n = 0) { init(n); }

    void init(int n_) {
        n = n_;
        adj.assign(n+1, {});
        subsz.assign(n+1, 0);
        cpar.assign(n+1, -1);
        level.assign(n+1, 0);
        removed.assign(n+1, 0);
        paths.assign(n+1, {});
        bag.assign(n+1, {});
        active.assign(n+1, 0);
    }

    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    void dfs_sz(int u, int p) {
        subsz[u] = 1;
        for (int v : adj[u]) if (v != p && !removed[v]) {
            dfs_sz(v, u);
            subsz[u] += subsz[v];
        }
    }

    int find_centroid(int u, int p, int comp) {
        for (int v : adj[u]) if (v != p && !removed[v])
            if (subsz[v] > comp/2) return find_centroid(v, u, comp);
        return u;
    }

    void collect(int u, int p, int dist, int c) {
        paths[u].push_back({c, dist});
        for (int v : adj[u]) if (v != p && !removed[v]) {
            collect(v, u, dist + 1, c);
        }
    }

    void decompose(int u, int p, int d) {
        dfs_sz(u, -1);
        int c = find_centroid(u, -1, subsz[u]);
        removed[c] = 1;
        cpar[c] = p;
        level[c] = d;
        paths[c].push_back({c, 0});
        for (int v : adj[c]) if (!removed[v]) collect(v, c, 1, c);
        for (int v : adj[c]) if (!removed[v]) decompose(v, c, d+1);
        removed[c] = 0;
    }

    void build(int root = 1) { decompose(root, -1, 0); }

    void toggle_on(int u) {
        if (active[u]) return;
        active[u] = 1;
        for (auto [c, d] : paths[u]) bag[c].insert(d);
    }

    void toggle_off(int u) {
        if (!active[u]) return;
        active[u] = 0;
        for (auto [c, d] : paths[u]) {
            auto it = bag[c].find(d);
            if (it != bag[c].end()) bag[c].erase(it);
        }
    }

    int query(int u) const {
        const int INF = 1e9;
        int ans = INF;
        for (auto [c, d] : paths[u]) {
            if (!bag[c].empty()) {
                ans = min(ans, *bag[c].begin() + d);
            }
        }
        return ans;
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    if (!(cin >> n >> q)) return 0;
    CD_Toggle cd(n);
    for (int i = 0; i < n-1; ++i) {
        int u, v; cin >> u >> v;
        cd.add_edge(u, v);
    }
    cd.build(1);

    // Example interactive protocol:
    // type 1 v: toggle on; type 2 v: toggle off; type 3 v: query
    while (q--) {
        int t, v; cin >> t >> v;
        if (t == 1) cd.toggle_on(v);
        else if (t == 2) cd.toggle_off(v);
        else if (t == 3) {
            int ans = cd.query(v);
            if (ans >= (int)1e9) ans = -1; // no active nodes
            cout << ans << '\n';
        }
    }
    return 0;
}
This version uses a multiset at each centroid to support both insertions and deletions. Each active node contributes one distance per centroid ancestor to the corresponding multiset. The nearest distance per centroid is the smallest element in its multiset. Querying a node u aggregates these minima with dist(u, c) across its centroid ancestors.
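A small illustrative run under the protocol above (a hypothetical hand-made test): for the tree with edges 1-2, 2-3, 2-4, 4-5, the input

5 5
1 2
2 3
2 4
4 5
3 3
1 5
3 1
2 5
3 1

prints -1, 3, -1: the first query finds no active node, the second finds dist(1, 5) = 3 after node 5 is toggled on, and the third returns -1 again once node 5 is toggled off, demonstrating that the multisets correctly undo an activation.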
#include <bits/stdc++.h>
using namespace std;

/*
  Weighted-tree Centroid Decomposition for nearest-active-node queries.
  Distances are long long. Supports only activations; deletions would require multisets.
*/

struct WCD {
    struct Edge { int to; long long w; };
    int n;
    vector<vector<Edge>> adj;
    vector<int> subsz, cpar, removed, level;
    vector<vector<pair<int,long long>>> paths;   // (centroid, dist)
    vector<long long> best;
    const long long INF = (long long)4e18;

    WCD(int n = 0) { init(n); }

    void init(int n_) {
        n = n_;
        adj.assign(n+1, {});
        subsz.assign(n+1, 0);
        cpar.assign(n+1, -1);
        removed.assign(n+1, 0);
        level.assign(n+1, 0);
        paths.assign(n+1, {});
        best.assign(n+1, INF);
    }

    void add_edge(int u, int v, long long w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }

    void dfs_sz(int u, int p) {
        subsz[u] = 1;
        for (auto e : adj[u]) if (e.to != p && !removed[e.to]) {
            dfs_sz(e.to, u);
            subsz[u] += subsz[e.to];
        }
    }

    int find_centroid(int u, int p, int comp) {
        for (auto e : adj[u]) if (e.to != p && !removed[e.to])
            if (subsz[e.to] > comp/2) return find_centroid(e.to, u, comp);
        return u;
    }

    void collect(int u, int p, long long dist, int c) {
        paths[u].push_back({c, dist});
        for (auto e : adj[u]) if (e.to != p && !removed[e.to]) {
            collect(e.to, u, dist + e.w, c);
        }
    }

    void decompose(int u, int p, int d) {
        dfs_sz(u, -1);
        int c = find_centroid(u, -1, subsz[u]);
        removed[c] = 1;
        cpar[c] = p; level[c] = d;
        paths[c].push_back({c, 0});
        for (auto e : adj[c]) if (!removed[e.to]) collect(e.to, c, e.w, c);
        for (auto e : adj[c]) if (!removed[e.to]) decompose(e.to, c, d+1);
        removed[c] = 0;
    }

    void build(int root = 1) { decompose(root, -1, 0); }

    void activate(int u) {
        for (auto [c, d] : paths[u]) best[c] = min(best[c], d);
    }

    long long query(int u) const {
        long long ans = INF;
        for (auto [c, d] : paths[u]) if (best[c] < INF) ans = min(ans, best[c] + d);
        return ans;
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    if (!(cin >> n >> q)) return 0;
    WCD cd(n);
    for (int i = 0; i < n-1; ++i) {
        int u, v; long long w; cin >> u >> v >> w;
        cd.add_edge(u, v, w);
    }
    cd.build(1);

    // Initially activate node 1
    cd.activate(1);

    // type 1 v: activate v; type 2 v: query from v
    while (q--) {
        int t, v; cin >> t >> v;
        if (t == 1) cd.activate(v);
        else if (t == 2) {
            long long ans = cd.query(v);
            if (ans >= (long long)4e18) cout << -1 << '\n';
            else cout << ans << '\n';
        }
    }
    return 0;
}
The same centroid-decomposition idea works on weighted trees. We accumulate long long distances in the collect DFS and maintain a long long best per centroid. The asymptotic bounds are identical to the unweighted version.