Lowest Common Ancestor (LCA)
Key Points
- •The Lowest Common Ancestor (LCA) of two nodes in a rooted tree is the deepest node that is an ancestor of both.
- •Binary lifting precomputes 2^k-th ancestors for each node, enabling O( n) LCA queries after O(n n) preprocessing.
- •Euler tour + RMQ reduces LCA to a range minimum query over the first occurrences in the Euler traversal, supporting O(1) queries with suitable RMQ structures.
- •Use LCA to compute distances, k-th node on a path, and to power path queries like sums or minima on trees.
- •Choose binary lifting for simplicity and moderate memory; choose Euler tour + sparse table for O(1) queries and fast constants.
- •Tarjan’s offline LCA with DSU answers multiple LCA queries in near-linear time when all queries are known in advance.
- •Carefully handle indexing, rooting, depth equalization, and recursion limits to avoid bugs and runtime errors.
- •For n up to 2e5, binary lifting with LOG about 19–20 levels is typical and memory-feasible in C++.
Prerequisites
- →Trees and Graph Traversal (DFS/BFS) — You must understand how to represent trees, traverse them, and compute parents and depths.
- →Big-O Notation — To compare preprocessing and query complexities and choose the right method for constraints.
- →Binary Representation and Bit Operations — Binary lifting relies on decomposing integers into sums of powers of two.
- →Disjoint Set Union (Union-Find) — Required to understand and implement Tarjan’s offline LCA algorithm efficiently.
- →Range Minimum Query (RMQ) and Sparse Table — Euler tour LCA reduces to RMQ over depths; sparse tables give O(1) queries.
- →Recursion and Iteration Trade-offs — Building Euler tours often uses DFS; large inputs may require iterative alternatives.
Detailed Explanation
Tap terms for definitions01Overview
The Lowest Common Ancestor (LCA) problem asks: given a rooted tree and two nodes u and v, which node is the deepest (closest to the leaves) that is an ancestor of both u and v? LCA is a backbone tool in tree algorithms. Once you can find LCAs efficiently, you can quickly compute distances between nodes, find the k-th node along a path, and accelerate a variety of path queries. Two mainstream techniques dominate competitive programming: binary lifting and Euler tour + RMQ. Binary lifting precomputes, for each node, its 2^k-th ancestor for k = 0, 1, ..., allowing us to jump upward in powers of two. With this table, each LCA query can be answered in O(log n) time by equalizing depths and then jumping both nodes up until their ancestors match. Euler tour + RMQ performs a depth-first traversal that records nodes every time they are visited; LCA(u, v) becomes the node with the minimum depth between the first occurrences of u and v in this tour. With a proper RMQ structure, queries become O(1). These methods are well-suited for static trees, and are widely used in problems in the 1500–1900 Codeforces rating range.
02Intuition & Analogies
Imagine a family tree: given two people, their LCA is their closest shared ancestor—maybe a parent, grandparent, or great-grandparent. If two cousins share a grandparent, that grandparent is their LCA. Now, how do we find it quickly? Binary lifting is like taking an elevator that only stops at floors that are powers of two away. Suppose you need to go up 13 floors: you can hop 8 floors, then 4, then 1. If you can precompute where each elevator jump lands you, you can climb very fast. For LCA, we first move the deeper person up until both are at the same level. Then, we try to move both up together, attempting the biggest equal jumps that keep them as different people. The moment before they become the same, their parents match—the LCA. The Euler tour trick is like recording every step of a museum tour through rooms (nodes) and noting how many stairs deep we are. If you write down the depth every time you enter or leave a room, then the first time you wrote person A and person B, somewhere between those two notes is the shallowest depth—the last common room on the path from A up to the meeting point and down to B. Turning "find the shallowest depth between two indices" into a Range Minimum Query gives instant LCA with an appropriate RMQ structure. Tarjan’s offline method is like coordinating a set of prewritten questions during a single tour; using union-find, we can answer each pair’s LCA right when both ends have been visited.
03Formal Definition
04When to Use
Use LCA whenever a problem involves queries on tree paths or relationships between pairs of nodes: computing distances, checking ancestry, answering k-th node on a path, or as a subroutine in path queries (sum, min, max) by decomposing along u → lca(u, v) and v → lca(u, v). Binary lifting is ideal for static trees with up to about 2e5 nodes: preprocess once in O(n log n) and answer each query in O(log n). Euler tour + sparse table gives O(1) queries with slightly higher memory; it’s helpful when you have very many queries and need the fastest per-query time. Tarjan’s offline DSU approach is great when all queries are known beforehand and you want near-linear processing with low constants, especially when building other heavy structures is overkill. If your tree changes dynamically (links/cuts), you need more advanced structures like Heavy-Light Decomposition (HLD) for path queries or Link-Cut Trees; standard LCA methods assume static structure. For weighted trees, LCA remains the same; you simply compute distances or path aggregates using depths with edge weights or prefix sums, combined with the LCA result.
⚠️Common Mistakes
Common pitfalls include: (1) Forgetting to root the tree or accidentally rooting at a leaf that causes deep recursion stack overflows; prefer iterative BFS/DFS or set a custom stack size when n is large. (2) Mixing 0-based and 1-based indices for nodes or LOG; ensure consistent indexing, especially when filling up[v][k]. (3) Not equalizing depths before trying to lift both nodes; this can lead to incorrect answers because binary lifting assumes both nodes are at the same depth before the synchronized jump phase. (4) Miscomputing LOG; ensure LOG is the smallest integer with 2^{LOG} > n to cover all heights. (5) Using Euler tour but forgetting to record the first occurrence of each node; RMQ must be taken over the first occurrences’ range. (6) Building a sparse table over nodes instead of over the Euler tour pair (depth, node); the RMQ must minimize depth, not node index. (7) For Tarjan’s offline method, forgetting to add symmetric query edges or to check if the counterpart has been visited; answers may remain uninitialized. (8) Memory blowups: up table uses O(n log n) integers; for n = 2e5, LOG ≈ 19–20, so plan memory accordingly. (9) Using recursion for DFS with large n on systems with small default stack can cause runtime errors; consider iterative traversals. (10) For weighted distances, forgetting to subtract twice the LCA’s contribution (depth weights) yields wrong path lengths.
Key Formulas
LCA Definition
Explanation: Among all common ancestors of u and v, pick the one with the largest depth value. This captures the closest shared ancestor to the leaves.
Binary Lifting Recurrence
Explanation: The 2^k-th ancestor of v is obtained by jumping twice the 2^{k-1}-th ancestor. This enables logarithmic jumps upwards in the tree.
Depth Equalization by Lifting
Explanation: To move a node up by d steps, decompose d into powers of two and apply jumps accordingly. This runs in O(log d) time.
Ancestor Check via Times
Explanation: During DFS, entry and exit times nest intervals. A node u is an ancestor of v if u’s time interval fully contains v’s interval.
Euler Tour + RMQ
Explanation: Map the problem to finding the minimum depth between the first appearances of u and v in the Euler tour. The node at that minimum is the LCA.
Distance via LCA
Explanation: The path from u to v goes up from u to their LCA and down to v. Subtracting twice the LCA depth removes the overlapping segment.
Binary Lifting Complexities
Explanation: Precomputation and memory both scale with the number of ancestor levels per node. Each query performs O(log n) jumps.
Euler Tour + Sparse Table
Explanation: Building the sparse table over the Euler tour takes O(n log n) time and space; each RMQ (and hence LCA) is answered in O(1).
Tarjan Offline LCA Complexity
Explanation: Using DSU with path compression, processing all queries during a single DFS is near-linear. The inverse Ackermann function (n) is effectively constant.
Levels Needed
Explanation: To cover heights up to n, you need about log2(n) doubling levels. This sets the dimension of the lifting table.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct LCA { 5 int n, LOG, root; 6 vector<vector<int>> up; // up[v][k] = 2^k-th ancestor of v 7 vector<int> depth; 8 vector<vector<int>> adj; 9 10 LCA(int n=0): n(n), LOG(0), root(1) {} 11 12 void init(int _n, int _root=1) { 13 n = _n; root = _root; 14 LOG = 1; while ((1 << LOG) <= n) ++LOG; 15 up.assign(n + 1, vector<int>(LOG, 0)); 16 depth.assign(n + 1, 0); 17 adj.assign(n + 1, {}); 18 } 19 20 void add_edge(int u, int v) { 21 adj[u].push_back(v); 22 adj[v].push_back(u); 23 } 24 25 void preprocess() { 26 // BFS to compute depth and 2^0-th ancestor 27 queue<int> q; 28 vector<int> vis(n + 1, 0); 29 q.push(root); vis[root] = 1; up[root][0] = 0; depth[root] = 0; 30 while (!q.empty()) { 31 int u = q.front(); q.pop(); 32 for (int v : adj[u]) if (!vis[v]) { 33 vis[v] = 1; depth[v] = depth[u] + 1; up[v][0] = u; q.push(v); 34 } 35 } 36 // Build up-table: up[v][k] = up[ up[v][k-1] ][k-1] 37 for (int k = 1; k < LOG; ++k) { 38 for (int v = 1; v <= n; ++v) { 39 int mid = up[v][k-1]; 40 up[v][k] = (mid ? up[mid][k-1] : 0); 41 } 42 } 43 } 44 45 int lift(int v, int k) const { 46 // Move v up by k steps using binary decomposition; returns 0 if beyond root 47 for (int i = 0; i < LOG && v; ++i) if (k & (1 << i)) v = up[v][i]; 48 return v; 49 } 50 51 int lca(int a, int b) const { 52 if (depth[a] < depth[b]) swap(a, b); 53 // 1) Equalize depths 54 a = lift(a, depth[a] - depth[b]); 55 if (a == b) return a; 56 // 2) Lift both up while their ancestors differ 57 for (int k = LOG - 1; k >= 0; --k) { 58 if (up[a][k] != up[b][k]) { 59 a = up[a][k]; 60 b = up[b][k]; 61 } 62 } 63 // Now a and b are distinct children of LCA 64 return up[a][0]; 65 } 66 67 int dist(int a, int b) const { 68 int c = lca(a, b); 69 return depth[a] + depth[b] - 2 * depth[c]; 70 } 71 72 int kth_on_path(int a, int b, int k) const { 73 // 1-indexed k: k=1 returns a; k=dist+1 returns b 74 int c = lca(a, b); 75 int d1 = depth[a] - depth[c]; 76 if (k - 1 <= d1) { 77 return lift(a, k - 1); 78 } else { 79 int d2 = depth[b] - depth[c]; 80 int rem = k - (d1 + 1); 81 // Move from c down to b by d2 steps -> from b up by (d2 - rem) 82 return lift(b, d2 - rem); 83 } 84 } 85 }; 86 87 int main() { 88 ios::sync_with_stdio(false); 89 cin.tie(nullptr); 90 91 int n, q; cin >> n >> q; 92 LCA solver; solver.init(n, 1); // root at 1 by default 93 for (int i = 0; i < n - 1; ++i) { 94 int u, v; cin >> u >> v; 95 solver.add_edge(u, v); 96 } 97 solver.preprocess(); 98 99 // Example queries: type 1 u v => LCA; type 2 u v => distance; type 3 u v k => k-th node on u->v 100 // Input format (example): 101 // q lines: t u v [k] 102 while (q--) { 103 int t; cin >> t; 104 if (t == 1) { 105 int u, v; cin >> u >> v; 106 cout << solver.lca(u, v) << "\n"; 107 } else if (t == 2) { 108 int u, v; cin >> u >> v; 109 cout << solver.dist(u, v) << "\n"; 110 } else if (t == 3) { 111 int u, v, k; cin >> u >> v >> k; 112 cout << solver.kth_on_path(u, v, k) << "\n"; 113 } 114 } 115 return 0; 116 } 117
This program builds a rooted tree and precomputes a binary lifting table. It supports LCA queries, distances, and k-th node on the path using lifting and depth arithmetic. BFS avoids recursion depth issues, and the lifting logic ensures O(log n) query time.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct LCA_EulerST { 5 int n, root; 6 vector<vector<int>> adj; 7 vector<int> first; // first occurrence in Euler tour 8 vector<int> euler; // nodes in Euler tour 9 vector<int> depthEuler; // depth at each Euler index 10 vector<int> depth; // depth per node 11 vector<int> lg2; // precomputed logs 12 vector<vector<int>> st; // sparse table over indices of euler 13 14 LCA_EulerST(int n=0): n(n), root(1) {} 15 16 void init(int _n, int _root=1) { 17 n = _n; root = _root; 18 adj.assign(n + 1, {}); 19 first.assign(n + 1, -1); 20 depth.assign(n + 1, 0); 21 euler.clear(); depthEuler.clear(); 22 } 23 24 void add_edge(int u, int v) { 25 adj[u].push_back(v); 26 adj[v].push_back(u); 27 } 28 29 void dfs(int u, int p, int d) { 30 depth[u] = d; 31 first[u] = (int)euler.size(); 32 euler.push_back(u); 33 depthEuler.push_back(d); 34 for (int v : adj[u]) if (v != p) { 35 dfs(v, u, d + 1); 36 // add u again when returning 37 euler.push_back(u); 38 depthEuler.push_back(d); 39 } 40 } 41 42 void build() { 43 dfs(root, 0, 0); 44 int m = (int)euler.size(); 45 lg2.assign(m + 1, 0); 46 for (int i = 2; i <= m; ++i) lg2[i] = lg2[i/2] + 1; 47 int K = lg2[m] + 1; 48 st.assign(K, vector<int>(m)); 49 // st[0][i] stores index i (we'll compare by depthEuler) 50 iota(st[0].begin(), st[0].end(), 0); 51 for (int k = 1; k < K; ++k) { 52 int len = 1 << k; 53 int half = len >> 1; 54 for (int i = 0; i + len <= m; ++i) { 55 int i1 = st[k-1][i]; 56 int i2 = st[k-1][i + half]; 57 st[k][i] = (depthEuler[i1] < depthEuler[i2] ? i1 : i2); 58 } 59 } 60 } 61 62 int rmq_index(int L, int R) const { 63 if (L > R) swap(L, R); 64 int len = R - L + 1; 65 int k = lg2[len]; 66 int i1 = st[k][L]; 67 int i2 = st[k][R - (1 << k) + 1]; 68 return (depthEuler[i1] < depthEuler[i2] ? i1 : i2); 69 } 70 71 int lca(int u, int v) const { 72 int L = first[u]; 73 int R = first[v]; 74 int idx = rmq_index(L, R); 75 return euler[idx]; 76 } 77 78 int dist(int u, int v) const { 79 int c = lca(u, v); 80 return depth[u] + depth[v] - 2 * depth[c]; 81 } 82 }; 83 84 int main() { 85 ios::sync_with_stdio(false); 86 cin.tie(nullptr); 87 88 int n, q; cin >> n >> q; 89 LCA_EulerST solver; solver.init(n, 1); 90 for (int i = 0; i < n - 1; ++i) { 91 int u, v; cin >> u >> v; 92 solver.add_edge(u, v); 93 } 94 solver.build(); 95 96 while (q--) { 97 int u, v; cin >> u >> v; 98 cout << solver.lca(u, v) << " " << solver.dist(u, v) << "\n"; 99 } 100 return 0; 101 } 102
This implementation converts LCA to RMQ on the Euler tour. The DFS records the Euler sequence and depths; a sparse table answers RMQ in O(1), returning the node with minimum depth between first occurrences. Distances use the standard LCA depth formula.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct DSU { 5 vector<int> p, sz; 6 DSU(int n=0) { init(n); } 7 void init(int n) { p.resize(n+1); sz.assign(n+1,1); iota(p.begin(), p.end(), 0); } 8 int find(int x){ return p[x]==x? x: p[x]=find(p[x]); } 9 void unite(int a, int b){ a=find(a); b=find(b); if(a==b) return; if(sz[a]<sz[b]) swap(a,b); p[b]=a; sz[a]+=sz[b]; } 10 }; 11 12 struct TarjanLCA { 13 int n, root; 14 vector<vector<int>> adj; 15 vector<vector<pair<int,int>>> qs; // qs[u]: (v, id) 16 vector<int> ans; 17 vector<int> ancestor; 18 vector<int> vis; 19 DSU dsu; 20 21 TarjanLCA(int n=0): n(n), root(1) {} 22 23 void init(int _n, int _root=1) { 24 n = _n; root = _root; 25 adj.assign(n+1, {}); 26 qs.assign(n+1, {}); 27 ans.clear(); 28 ancestor.assign(n+1, 0); 29 vis.assign(n+1, 0); 30 dsu.init(n); 31 } 32 33 void add_edge(int u, int v){ adj[u].push_back(v); adj[v].push_back(u); } 34 35 void add_query(int u, int v, int id){ 36 if ((int)ans.size() <= id) ans.resize(id+1, -1); 37 qs[u].push_back({v, id}); 38 qs[v].push_back({u, id}); 39 } 40 41 void dfs(int u, int p){ 42 dsu.p[u] = u; ancestor[u] = u; vis[u] = 1; 43 for(int v: adj[u]) if(v!=p){ 44 dfs(v, u); 45 dsu.unite(u, v); 46 ancestor[ dsu.find(u) ] = u; 47 } 48 // After visiting children, answer queries where the other endpoint is already visited 49 for (auto [v, id] : qs[u]) if (vis[v]) { 50 int w = ancestor[ dsu.find(v) ]; 51 ans[id] = w; 52 } 53 } 54 55 vector<int> solve(){ 56 dfs(root, 0); 57 return ans; 58 } 59 }; 60 61 int main(){ 62 ios::sync_with_stdio(false); 63 cin.tie(nullptr); 64 65 int n, q; cin >> n >> q; 66 TarjanLCA solver; solver.init(n, 1); 67 for(int i=0;i<n-1;++i){ int u,v; cin>>u>>v; solver.add_edge(u,v); } 68 for(int i=0;i<q;++i){ int u,v; cin>>u>>v; solver.add_query(u,v,i); } 69 vector<int> ans = solver.solve(); 70 for(int i=0;i<q;++i) cout << ans[i] << "\n"; 71 return 0; 72 } 73
Tarjan’s offline algorithm answers all LCAs in one DFS using a DSU. Each node becomes its own set when first visited, and after processing a child subtree, the child is unioned into the parent. When both endpoints of a query have been visited, the DSU’s representative gives an ancestor whose stored value is the LCA.