Tree Distances and Diameter
Key Points
- •Tree diameter is the longest simple path in a tree and can be found with two BFS/DFS runs.
- •To find the diameter: pick any node A, find the farthest node B from A, then find the farthest node C from B; the path B–C is the diameter.
- •Tree centers are the middle node(s) on a diameter path: one center if the diameter length is even, two centers if it is odd.
- •The radius of a tree equals the ceiling of half the diameter length.
- •All-pairs distances on a rooted tree can be answered quickly using Lowest Common Ancestor (LCA) with binary lifting.
- •The distance formula using LCA is dist(u,v) = depth[u] + depth[v] − 2 × depth[lca(u,v)].
- •Eccentricity of any node u in a tree equals max(dist(u, a), dist(u, b)) where a and b are diameter endpoints.
- •For competitive programming, preprocess LCA in O(n log n) and answer each distance query in O(log n) time.
Prerequisites
- →Graph representation (adjacency lists) — To store and traverse the tree efficiently.
- →BFS and DFS fundamentals — To perform linear-time traversals and compute distances/parents.
- →Tree properties — Understanding unique simple paths and acyclicity is essential for correctness proofs.
- →Big-O notation and logarithms — To analyze time/space complexity, especially for binary lifting (logarithmic factors).
- →Bit operations and powers of two — Binary lifting relies on 2^k jumps and bitwise checks when lifting.
- →Queue and stack usage — BFS requires a queue; iterative DFS may need a stack.
Detailed Explanation
Tap terms for definitions01Overview
Trees are graphs without cycles, and because they are minimally connected, distances in trees behave very cleanly: there is exactly one simple path between any two nodes. This makes two fundamental tasks especially elegant: finding the diameter (the longest simple path) and answering distance queries between arbitrary nodes. The classic and surprisingly simple trick to get the diameter is to run a breadth-first search (BFS) or depth-first search (DFS) from any node to find a farthest node B, then run BFS/DFS again from B to find a farthest node C; the path B–C is the diameter, and its length is the tree’s longest distance. From the diameter, we can read off the tree’s center(s): the middle node(s) on this longest path. Centers minimize the maximum distance to all other nodes (they are the best places to stand if you want to be as close as possible to everyone else). For many queries, especially in competitive programming, rooting the tree and using the Lowest Common Ancestor (LCA) allows fast distance computations using the formula dist(u, v) = depth[u] + depth[v] − 2 × depth[lca(u, v)]. With binary lifting, we preprocess ancestor jumps in O(n log n) time and answer each LCA (and thus distance) query in O(log n). These tools together solve a wide range of problems about paths, radii, centers, eccentricities, and pairwise distances in trees.
02Intuition & Analogies
Imagine a long network of roads without loops—more like tree branches than city blocks. If you start walking from any intersection A as far as you can go, you’ll end up at one of the two ends of the longest road-like stretch in the network. Call that end B. If you now start at B and walk as far as possible, you’ll reach the other end C of that longest stretch. That’s the diameter: the longest straight-shot path you can take without backtracking. The center(s) are the ideal meetup points along this longest stretch: stand in the middle so that the longest person’s walk to you is as short as it can be. If the longest stretch has an odd number of edges, there are two middle intersections; otherwise, there’s a single perfect center. For distances between any two intersections u and v, think of the unique path that goes up from u to the point where their paths meet (their Lowest Common Ancestor), then down to v. Your total steps are just the steps up plus the steps down. LCA is basically the point in the tree where the paths from u and v converge when you move towards the root, like the first common ancestor in a family tree. Binary lifting is like keeping an escalator for each node that can take you up by 1 step, 2 steps, 4 steps, 8 steps, and so on, so you can jump upwards quickly by combining powers of two. With these tools, you can answer distance queries rapidly and reason about the tree’s shape—long spines, balanced subtrees, and the best places to position facilities (centers) to minimize the worst-case travel.
03Formal Definition
04When to Use
Use the two-sweep BFS/DFS technique whenever you need the tree’s diameter or its endpoints; it is optimal and simple for both unweighted and weighted trees (use BFS/DFS for unweighted, and a DFS/Dijkstra accumulation for weighted). If a problem asks for the best placement that minimizes the maximum distance—like where to put a facility on a tree-shaped network—compute the center(s) from the diameter path and use the radius. When you need many distance queries between arbitrary pairs on a fixed tree, preprocess LCA with binary lifting: after O(n log n) preprocessing, each query is O(log n), which is excellent for up to 2e5 queries typical in competitive programming. If a problem asks for every node’s eccentricity (its farthest distance), use the property that for a tree with diameter endpoints a and b, e(u) = max(dist(u, a), dist(u, b)); two BFS runs from a and b let you compute all eccentricities in O(n). Choose BFS over DFS if you want iterative code and to avoid recursion depth issues; both give O(n) on trees. For weighted trees with positive weights, replace BFS with two Dijkstra runs or two DFS passes that accumulate weighted distances. LCA-based distances remain valid regardless of weights, as long as depth means root-to-node path length using the same weight metric.
⚠️Common Mistakes
Common pitfalls include: (1) Mixing node count and edge count distances. In trees, distance usually counts edges, so a path with L edges has L + 1 nodes; off-by-one errors here can misidentify centers. (2) Forgetting to reconstruct the actual diameter path when computing centers; you must keep parent pointers from the second BFS/DFS and walk back from the farthest node to get the path. (3) Using BFS on weighted trees; BFS is correct only for unweighted (or equally weighted) edges. For weighted trees, use Dijkstra or weighted DFS accumulation. (4) Building an LCA table with the wrong LOG size or not initializing parent[0][root] correctly; ensure LOG ≥ \lceil \log_2 n \rceil + 1 and that missing ancestors are set to the root or -1 consistently. (5) Confusing depth with distance: if your root is not a true root (any node is fine), depth is measured from that root, but the LCA distance formula still holds; just ensure depths are computed from the same root. (6) Recursion stack overflows with deep trees when using recursive DFS; prefer BFS or iterative DFS for n up to 2e5 on platforms with limited stack. (7) Not clearing or resizing adjacency lists and arrays between test cases, leading to stale data. (8) Assuming the diameter endpoints are unique; multiple pairs can realize the same diameter. The centers, however, are still the middle node(s) of any diameter path. Carefully handle the even/odd diameter length case when outputting one vs two centers.
Key Formulas
Tree Diameter
Explanation: The diameter D is the largest distance among all pairs of nodes in the tree. It captures the length of the longest simple path.
Tree Radius
Explanation: The radius is the smallest possible eccentricity over all nodes. In any tree, it equals the ceiling of half the diameter length.
Eccentricity
Explanation: The eccentricity of node u is its farthest distance to any other node. Minimizing this over u gives the tree’s center(s).
Eccentricity via Diameter Endpoints
Explanation: If a and b are diameter endpoints, the farthest node from u must be one of them. So computing two BFS arrays from a and b yields all eccentricities.
Distance via LCA
Explanation: The path from u to v goes up from u to the LCA, then down to v. The depth overlap is subtracted twice to avoid double counting.
BFS on a Tree
Explanation: On a tree with n nodes and n−1 edges, BFS runs in linear time because it visits each node and edge once.
Binary Lifting Complexity
Explanation: Precomputing 2^k ancestors for each node takes O(n log n). Each LCA query lifts nodes upward in O(log n) steps.
Centers from Diameter Path
Explanation: If the diameter path has L edges, there are L+1 nodes on it. The middle is one node when L is even and two adjacent nodes when L is odd.
Binary Lifting Transition
Explanation: The 2^k-th ancestor of v is the 2^{k−1}-th ancestor of the 2^{k−1}-th ancestor of v, enabling fast upward jumps.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // BFS on an unweighted tree from a start node. 5 // Fills dist and parent arrays and returns the farthest node from start. 6 int bfs_far(const vector<vector<int>>& g, int start, vector<int>& dist, vector<int>& parent) { 7 int n = (int)g.size() - 1; // assuming 1-indexed nodes 8 dist.assign(n + 1, -1); 9 parent.assign(n + 1, -1); 10 queue<int> q; 11 dist[start] = 0; 12 q.push(start); 13 int far = start; 14 while (!q.empty()) { 15 int u = q.front(); q.pop(); 16 if (dist[u] > dist[far]) far = u; 17 for (int v : g[u]) { 18 if (dist[v] == -1) { 19 dist[v] = dist[u] + 1; // unweighted: each edge increases distance by 1 20 parent[v] = u; 21 q.push(v); 22 } 23 } 24 } 25 return far; 26 } 27 28 int main() { 29 ios::sync_with_stdio(false); 30 cin.tie(nullptr); 31 32 int n; // number of nodes 33 if (!(cin >> n)) return 0; 34 vector<vector<int>> g(n + 1); 35 for (int i = 0; i < n - 1; ++i) { 36 int u, v; cin >> u >> v; 37 g[u].push_back(v); 38 g[v].push_back(u); 39 } 40 41 vector<int> dist, parent; 42 // 1) From an arbitrary node (1), find the farthest node a 43 int a = bfs_far(g, 1, dist, parent); 44 // 2) From a, find the farthest node b and distances 45 int b = bfs_far(g, a, dist, parent); 46 int diameter_length = dist[b]; 47 48 cout << "Diameter length (in edges): " << diameter_length << "\n"; 49 cout << "One pair of diameter endpoints: " << a << " " << b << "\n"; 50 51 return 0; 52 } 53
This program reads an unweighted tree, runs BFS from node 1 to find a farthest node a, then runs BFS from a to find a farthest node b. The distance from a to b is the diameter length, and (a, b) is one pair of endpoints of a longest path.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 int bfs_far(const vector<vector<int>>& g, int start, vector<int>& dist, vector<int>& parent) { 5 int n = (int)g.size() - 1; 6 dist.assign(n + 1, -1); 7 parent.assign(n + 1, -1); 8 queue<int> q; 9 dist[start] = 0; 10 q.push(start); 11 int far = start; 12 while (!q.empty()) { 13 int u = q.front(); q.pop(); 14 if (dist[u] > dist[far]) far = u; 15 for (int v : g[u]) if (dist[v] == -1) { 16 dist[v] = dist[u] + 1; 17 parent[v] = u; 18 q.push(v); 19 } 20 } 21 return far; 22 } 23 24 int main() { 25 ios::sync_with_stdio(false); 26 cin.tie(nullptr); 27 28 int n; cin >> n; 29 vector<vector<int>> g(n + 1); 30 for (int i = 0; i < n - 1; ++i) { 31 int u, v; cin >> u >> v; 32 g[u].push_back(v); 33 g[v].push_back(u); 34 } 35 36 vector<int> dist, parent; 37 int a = bfs_far(g, 1, dist, parent); 38 int b = bfs_far(g, a, dist, parent); // parent[] now stores the tree of BFS from a 39 40 // Reconstruct diameter path from b back to a using parent[] 41 vector<int> path; 42 for (int cur = b; cur != -1; cur = parent[cur]) path.push_back(cur); 43 // path is b -> ... -> a (reverse of the path from a to b) 44 reverse(path.begin(), path.end()); // now a -> ... -> b 45 46 int L = (int)path.size() - 1; // number of edges on the path 47 int radius = (L + 1) / 2; // ceil(L/2) 48 49 cout << "Diameter length: " << L << "\n"; 50 cout << "Radius: " << radius << "\n"; 51 52 if (L % 2 == 0) { 53 // single center at the middle index 54 int center = path[L / 2]; 55 cout << "Number of centers: 1\n"; 56 cout << "Center: " << center << "\n"; 57 } else { 58 // two centers at the two middle indices 59 int c1 = path[L / 2]; 60 int c2 = path[L / 2 + 1]; 61 cout << "Number of centers: 2\n"; 62 cout << "Centers: " << c1 << " " << c2 << "\n"; 63 } 64 65 return 0; 66 } 67
After finding diameter endpoints (a, b) with two BFS runs, we reconstruct the path between them using the parent array from the second BFS. The center(s) are simply the middle node(s) of this path; the radius is ceil(diameter/2).
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct LCA { 5 int n, LOG; 6 vector<vector<int>> g; 7 vector<int> depth; 8 vector<vector<int>> up; // up[k][v] = 2^k-th ancestor of v (or -1) 9 10 LCA(int n) : n(n) { 11 g.assign(n + 1, {}); 12 } 13 14 void add_edge(int u, int v) { 15 g[u].push_back(v); 16 g[v].push_back(u); 17 } 18 19 void build(int root = 1) { 20 // compute LOG = ceil(log2(n)) + 1 21 LOG = 1; 22 while ((1 << LOG) <= n) ++LOG; 23 depth.assign(n + 1, -1); 24 up.assign(LOG, vector<int>(n + 1, -1)); 25 26 // BFS to set depth and immediate parent up[0] 27 queue<int> q; 28 depth[root] = 0; 29 up[0][root] = -1; 30 q.push(root); 31 while (!q.empty()) { 32 int u = q.front(); q.pop(); 33 for (int v : g[u]) if (v != up[0][u]) { 34 if (depth[v] == -1) { 35 depth[v] = depth[u] + 1; 36 up[0][v] = u; 37 q.push(v); 38 } 39 } 40 } 41 // Binary lifting DP 42 for (int k = 1; k < LOG; ++k) { 43 for (int v = 1; v <= n; ++v) { 44 int mid = up[k - 1][v]; 45 up[k][v] = (mid == -1 ? -1 : up[k - 1][mid]); 46 } 47 } 48 } 49 50 int lift(int v, int steps) const { 51 for (int k = 0; k < LOG && v != -1; ++k) { 52 if (steps & (1 << k)) v = up[k][v]; 53 } 54 return v; 55 } 56 57 int lca(int a, int b) const { 58 if (depth[a] < depth[b]) swap(a, b); 59 // Lift a to the depth of b 60 a = lift(a, depth[a] - depth[b]); 61 if (a == b) return a; 62 // Lift both up together 63 for (int k = LOG - 1; k >= 0; --k) { 64 if (up[k][a] != up[k][b]) { 65 a = up[k][a]; 66 b = up[k][b]; 67 } 68 } 69 // Now their parents are the LCA 70 return up[0][a]; 71 } 72 73 int dist(int a, int b) const { 74 int w = lca(a, b); 75 return depth[a] + depth[b] - 2 * depth[w]; 76 } 77 }; 78 79 int main() { 80 ios::sync_with_stdio(false); 81 cin.tie(nullptr); 82 83 int n; cin >> n; 84 LCA solver(n); 85 for (int i = 0; i < n - 1; ++i) { 86 int u, v; cin >> u >> v; 87 solver.add_edge(u, v); 88 } 89 solver.build(1); // root at 1 (any node works) 90 91 int q; cin >> q; 92 while (q--) { 93 int u, v; cin >> u >> v; 94 cout << solver.dist(u, v) << "\n"; 95 } 96 return 0; 97 } 98
We root the tree and build a binary lifting table up[k][v] to jump 2^k ancestors at a time. LCA queries are answered by lifting the deeper node to the same depth, then lifting both until their parents match. Distances follow from the LCA depth formula.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 int bfs_far(const vector<vector<int>>& g, int start, vector<int>& dist) { 5 int n = (int)g.size() - 1; 6 dist.assign(n + 1, -1); 7 queue<int> q; 8 dist[start] = 0; 9 q.push(start); 10 int far = start; 11 while (!q.empty()) { 12 int u = q.front(); q.pop(); 13 if (dist[u] > dist[far]) far = u; 14 for (int v : g[u]) if (dist[v] == -1) { 15 dist[v] = dist[u] + 1; 16 q.push(v); 17 } 18 } 19 return far; 20 } 21 22 int main() { 23 ios::sync_with_stdio(false); 24 cin.tie(nullptr); 25 26 int n; cin >> n; 27 vector<vector<int>> g(n + 1); 28 for (int i = 0; i < n - 1; ++i) { 29 int u, v; cin >> u >> v; 30 g[u].push_back(v); 31 g[v].push_back(u); 32 } 33 34 vector<int> distA, distB; 35 // Find diameter endpoints a and b 36 vector<int> tmp; 37 int a = bfs_far(g, 1, tmp); 38 int b = bfs_far(g, a, distA); // distA now holds distances from a 39 bfs_far(g, b, distB); // distB holds distances from b 40 41 // Eccentricity of each node u is max(dist(u,a), dist(u,b)) 42 for (int u = 1; u <= n; ++u) { 43 int ecc = max(distA[u], distB[u]); 44 cout << ecc << (u == n ? '\n' : ' '); 45 } 46 47 return 0; 48 } 49
After locating diameter endpoints a and b, we perform BFS from a and from b. For each node u, the maximum of these two distances equals its eccentricity, allowing us to output all eccentricities in linear time.