Heavy-Light Decomposition
Key Points
- Heavy-Light Decomposition (HLD) breaks a tree into O(n) disjoint chains so that any root-to-node path crosses only O(log n) chains.
- We choose a heavy edge from each node to its largest-subtree child; all other edges are light, and each light edge at least halves the remaining subtree size.
- Mapping each chain to a contiguous segment in an array lets us use a global segment tree or Fenwick tree to support fast path and subtree queries.
- Standard HLD path queries take O(log^2 n): O(log n) chain switches times O(log n) segment tree operations per segment.
- Subtree queries are especially simple: a node's subtree corresponds to a single contiguous array interval [pos[u], pos[u] + size[u] - 1].
- Edge-weighted problems store an edge's value at the position of its child endpoint to make path queries consistent.
- With specialized segment tree "walk" operations (order statistics / first-above-threshold) or different data structures (e.g., Link-Cut Trees), some path queries can be reduced to O(log n).
- HLD is a go-to tool for competitive programming when you need dynamic path updates/queries on trees with near-logarithmic performance.
Prerequisites
- Trees and rooted trees – HLD is defined on rooted trees and relies on parent/child, depth, and subtree concepts.
- Depth-First Search (DFS) – Both computing subtree sizes and assigning base array positions use DFS traversals.
- Segment Tree / Fenwick Tree – HLD converts tree operations into range operations on a base array handled by these structures.
- Lowest Common Ancestor (LCA) / Binary Lifting – While HLD can query without explicit LCA, understanding LCA clarifies path splitting and edge/node inclusion.
- Asymptotic analysis (Big-O) – To reason about O(log n) light edges and O(log^2 n) path query complexity.
- Recursion and stack management – Both DFS passes are often implemented recursively; knowing this helps avoid stack overflows or convert to iterative code.
- Lazy propagation – Needed for efficient range path updates and subtree range operations.
- C++ STL and implementation basics – Efficiently implementing adjacency lists, vectors, and fast I/O is crucial in CP settings.
Detailed Explanation
Overview
Heavy-Light Decomposition (HLD) is a way to restructure a tree so that hard "path" operations become easy. The key trick is to group many nodes into a small number of chains (paths) called heavy chains, and ensure that any path between two nodes intersects only O(log n) of these chains. Once the tree is decomposed, we flatten the chains into a single base array so that each chain becomes a contiguous array segment. This lets us reuse powerful 1D data structures, like segment trees or Fenwick trees, on top of the flattened representation. The upshot: path queries (sum, max, min, xor, etc.) and updates (point or range) can be implemented in polylogarithmic time. Standard HLD gives O(log^2 n) per path query/update because you make O(log n) climbs and do O(log n) work per climb with the segment tree. Subtree queries become even simpler: every node's subtree is a single contiguous slice in the base array, so subtree updates/queries are O(log n) each. HLD is widely used in competitive programming (Codeforces, AtCoder) for dynamic tree problems, especially when offline tricks like Mo's algorithm on trees or DSU-on-tree aren't applicable.
Intuition & Analogies
Imagine a city where intersections are tree nodes and roads are edges. Some roads (heavy) are the main highways that carry most traffic from a junction to the biggest cluster of neighborhoods; the rest (light) are side streets. If you drive from any intersection to the city center, you'll mostly stay on highways and only take a few side streets. In HLD, from any node, we pick the "busiest" child (the one leading to the largest subtree) and mark the edge to it as heavy. Chaining heavy edges builds long highways; light edges are the occasional exits. A remarkable fact follows: every time you take a side street (a light edge) toward the root, the number of remaining intersections you could still visit at least halves. You can't halve more than about log2(n) times before you run out of nodes, so you pass only O(log n) side streets. Next, we paint each highway with a different color and lay them straight end to end in a long strip (the base array). Now any trip along the city's roads becomes a series of short, straight stretches on the strip. Standard highway stretches are contiguous subarrays, so a segment tree (like a traffic counter) can quickly sum or maximize values over those stretches. Subtrees are even nicer: the set of intersections below a junction is a single block in the strip, so one range query handles it all. This mental picture explains why HLD converts tangled tree paths into a small number of simple array intervals.
Formal Definition
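The body of this section did not survive extraction; a standard formulation, reconstructed here for completeness (not verbatim from the original), is:

```latex
% Standard definition of heavy-light decomposition (reconstructed).
\paragraph{Heavy child.} In a rooted tree $T$ with $n$ nodes, the \emph{heavy child}
of an internal node $u$ is a child of maximum subtree size,
\[
  h(u) = \operatorname*{arg\,max}_{v \in \mathrm{children}(u)} \mathrm{size}(v),
\]
with ties broken arbitrarily. The edge $(u, h(u))$ is \emph{heavy}; all other child
edges are \emph{light}.

\paragraph{Heavy chains.} The heavy edges link nodes into vertex-disjoint paths
called \emph{heavy chains}; every node lies on exactly one chain (a node with no
heavy edge forms a singleton chain). If $(u, v)$ is light with $v$ a child of $u$,
then $\mathrm{size}(v) \le \mathrm{size}(u)/2$, so any root-to-node path crosses at
most $\lfloor \log_2 n \rfloor$ light edges and therefore intersects $O(\log n)$
chains.
```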
When to Use
Use HLD when you need online (interleaved) updates and queries along paths or subtrees in a static tree (edges and topology do not change). Common tasks include: (1) path sum/max/min/xor queries with point or range updates; (2) edge-weighted queries by storing each edge's value at its child node's position; (3) subtree add/sum queries via lazy segment trees since a subtree is one contiguous range; (4) path increment and path query combinations for problems like dynamic distances (when edge weights change), controlling values along paths, or counting marked nodes. HLD shines when the operation is associative and invertible enough to plug into a segment tree or Fenwick tree. It's the default hammer for CF 2000-2400 tasks that say "answer queries on paths in a tree" unless the problem hints at offline processing (Mo on trees / DSU-on-tree) or dynamic topology (then prefer Link-Cut Trees or Euler Tour Trees). If queries are purely subtree-based, a simple Euler tour + Fenwick might suffice; but when both path and subtree operations coexist, HLD gives a unified, robust approach.
⚠️ Common Mistakes
- Mixing node and edge values: When the problem gives weights on edges, you must store the weight at the child node of that edge (the deeper endpoint). Forgetting this misaligns queries and includes/excludes the LCA incorrectly. Fix: convert each edge (u, v) to the deeper node x = (depth[u] > depth[v] ? u : v) and store its value at pos[x]. Query [pos[head], pos[u]] segments excluding pos[LCA] for edge-only sums.
- Wrong base array order: The second DFS must visit the heavy child first and keep nodes of a heavy chain contiguous. If you visit a light child before the heavy one, chains won't be contiguous and queries will break.
- LCA off-by-one: Be careful whether the query is node-based (include LCA) or edge-based (exclude LCA). For edge-based queries, after you get u and v on the same chain, query from pos[lca]+1 to pos[deeper].
- 0/1 indexing inconsistency: Segment trees are simpler with 0-based or 1-based consistently. Mixing causes silent bugs.
- Forgetting subtree ranges: Subtree(u) is exactly [pos[u], pos[u]+size[u]-1]. Recomputing sets of nodes is unnecessary and slow.
- Overusing recursion depth: Deep trees can overflow recursion (e.g., skewed). Use iterative DFS or increase stack size in environments where needed.
- Expecting O(log n) for all operations: Plain HLD + segment tree gives O(log^2 n) for general path queries. Getting O(log n) often requires a different data structure (Link-Cut Trees) or specialized segment-tree "walk" queries (like kth-one / first-above-threshold) that traverse only O(log n) nodes overall.
Key Formulas
Subtree Size Recurrence
size(u) = 1 + Σ size(v) over all children v of u
Explanation: The size of a node's subtree equals one (the node itself) plus the sizes of all its children's subtrees. This is computed in the first DFS.
Heavy-Light Lemma
(number of light edges on any root-to-node path) ≤ floor(log2 n)
Explanation: Each time you cross a light edge, the subtree size at least halves, so you can cross at most floor(log2 n) light edges. This bounds the number of chain switches.
Subtree Interval
subtree(u) = [pos[u], pos[u] + size[u] - 1]
Explanation: In the base array, the nodes of u's subtree form one contiguous range. This enables O(log n) subtree queries/updates with a segment tree or BIT.
Standard HLD Complexity
O(log n) segments × O(log n) per segment = O(log^2 n) per path query/update
Explanation: A path splits into O(log n) chain segments, and each segment query on a segment tree or BIT costs O(log n), giving O(log^2 n).
Edge-to-Node Mapping Rule
weight(u, v) is stored at pos[x], where x is the deeper endpoint of edge (u, v)
Explanation: Store each edge's weight at its child node's position. When querying a path, aggregate over the path's node positions excluding the LCA to collect exactly the path's edge weights.
LCA Definition
lca(u, v) = the deepest node lying on both the root-to-u and root-to-v paths
Explanation: The lowest common ancestor is the deepest node that lies on both root-to-u and root-to-v paths. HLD often uses chain heads to synchronize nodes before finishing within one chain.
Arithmetic Series (for checks)
1 + 2 + ... + n = n(n + 1) / 2
Explanation: Used as a sanity check in examples when summing 1..n or similar sequences over subtrees/paths during testing.
Overall Cost
O(n) preprocessing; O(log^2 n) per path query/update
Explanation: Two DFS passes and building one global segment tree cost linear time; each query/update typically costs O(log^2 n). Certain specialized queries or alternative data structures achieve O(log n).
Complexity Analysis
Preprocessing is linear: two DFS passes compute subtree sizes, heavy children, chain heads, and base-array positions, and the global segment tree is built once in O(n). Each path query or update then decomposes into O(log n) chain segments at O(log n) segment-tree cost each, for O(log^2 n) total; subtree operations touch a single contiguous range and cost O(log n). Memory is O(n).
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

struct SegTree {
    int n;
    vector<long long> st;
    SegTree() {}
    SegTree(int n): n(n), st(4*n, 0) {}
    void build(const vector<long long>& a, int p, int l, int r){
        if(l==r){ st[p]=a[l]; return; }
        int m=(l+r)>>1; build(a, p<<1, l, m); build(a, p<<1|1, m+1, r);
        st[p]=st[p<<1]+st[p<<1|1];
    }
    void build(const vector<long long>& a){ build(a,1,0,n-1); }
    void pointUpdate(int p, int l, int r, int idx, long long val){
        if(l==r){ st[p]=val; return; }
        int m=(l+r)>>1;
        if(idx<=m) pointUpdate(p<<1,l,m,idx,val); else pointUpdate(p<<1|1,m+1,r,idx,val);
        st[p]=st[p<<1]+st[p<<1|1];
    }
    void pointUpdate(int idx, long long val){ pointUpdate(1,0,n-1,idx,val); }
    long long query(int p, int l, int r, int ql, int qr){
        if(qr<l || r<ql) return 0;
        if(ql<=l && r<=qr) return st[p];
        int m=(l+r)>>1; return query(p<<1,l,m,ql,qr)+query(p<<1|1,m+1,r,ql,qr);
    }
    long long query(int l, int r){ if(l>r) return 0; return query(1,0,n-1,l,r); }
};

struct HLD {
    int n, root;
    vector<vector<int>> g;
    vector<int> parent, depth, heavy, head, pos, sz;
    int cur_pos;
    vector<long long> base; // values in base (chain-flattened) order
    SegTree st;

    HLD(int n): n(n), root(0), g(n), parent(n,-1), depth(n,0), heavy(n,-1), head(n,0), pos(n,0), sz(n,0), cur_pos(0) {}

    void addEdge(int u, int v){ g[u].push_back(v); g[v].push_back(u); }

    // First DFS: parents, depths, subtree sizes, heavy children.
    int dfs1(int u, int p){
        parent[u]=p; sz[u]=1; int maxSub=0;
        for(int v: g[u]) if(v!=p){
            depth[v]=depth[u]+1;
            int s=dfs1(v,u); sz[u]+=s;
            if(s>maxSub){ maxSub=s; heavy[u]=v; }
        }
        return sz[u];
    }
    // Second DFS: chain heads and base positions; heavy child first keeps chains contiguous.
    void dfs2(int u, int h, const vector<long long>& val){
        head[u]=h; pos[u]=cur_pos++;
        base[pos[u]] = val[u];
        if(heavy[u]!=-1) dfs2(heavy[u], h, val);
        for(int v: g[u]) if(v!=parent[u] && v!=heavy[u]) dfs2(v, v, val);
    }

    void build(const vector<long long>& val, int r=0){
        root=r;
        base.assign(n,0);
        dfs1(root,-1);
        dfs2(root, root, val);
        st = SegTree(n); st.build(base);
    }

    // Sum on path u-v (node weights; includes both endpoints)
    long long queryPath(int u, int v){
        long long res=0;
        while(head[u]!=head[v]){
            if(depth[head[u]]<depth[head[v]]) swap(u,v);
            int hu=head[u];
            res += st.query(pos[hu], pos[u]);
            u = parent[hu];
        }
        if(depth[u]>depth[v]) swap(u,v);
        res += st.query(pos[u], pos[v]);
        return res;
    }

    // Set node u's value to x
    void updateNode(int u, long long x){ st.pointUpdate(pos[u], x); }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: build a tree, assign values, do queries
    int n = 9;
    HLD hld(n);
    vector<pair<int,int>> edges={{0,1},{0,2},{1,3},{1,4},{2,5},{5,6},{5,7},{7,8}};
    for(auto [u,v]: edges) hld.addEdge(u,v);
    vector<long long> val={5,1,4,3,2,7,6,9,8}; // node weights
    hld.build(val, 0);

    // Path sum 3-8: nodes 3,1,0,2,5,7,8 -> 3+1+5+4+7+9+8
    cout << hld.queryPath(3,8) << "\n"; // prints 37

    // Update node 5 to 10
    hld.updateNode(5,10);

    // Path sum 6-4: nodes 6,5,2,0,1,4 -> 6+10+4+5+1+2
    cout << hld.queryPath(6,4) << "\n"; // prints 28

    return 0;
}
```
This program builds HLD over a static tree and stores node weights in a segment tree over the base array. Path queries decompose into O(log n) chain segments, each answered by a segment tree range sum. Point updates replace a single node's value. The example prints the path sums before and after a point update.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct SegTreeMax {
    int n; vector<long long> st;
    SegTreeMax(){}
    SegTreeMax(int n): n(n), st(4*n, LLONG_MIN){}
    void build(const vector<long long>& a, int p, int l, int r){
        if(l==r){ st[p]=a[l]; return; }
        int m=(l+r)>>1; build(a,p<<1,l,m); build(a,p<<1|1,m+1,r);
        st[p]=max(st[p<<1], st[p<<1|1]);
    }
    void build(const vector<long long>& a){ build(a,1,0,n-1); }
    void pointUpdate(int p, int l, int r, int idx, long long val){
        if(l==r){ st[p]=val; return; }
        int m=(l+r)>>1;
        if(idx<=m) pointUpdate(p<<1,l,m,idx,val); else pointUpdate(p<<1|1,m+1,r,idx,val);
        st[p]=max(st[p<<1], st[p<<1|1]);
    }
    void pointUpdate(int idx, long long val){ pointUpdate(1,0,n-1,idx,val); }
    long long query(int p, int l, int r, int ql, int qr){
        if(qr<l || r<ql) return LLONG_MIN;
        if(ql<=l && r<=qr) return st[p];
        int m=(l+r)>>1; return max(query(p<<1,l,m,ql,qr), query(p<<1|1,m+1,r,ql,qr));
    }
    long long query(int l, int r){ if(l>r) return LLONG_MIN; return query(1,0,n-1,l,r); }
};

struct HLD {
    int n, root;
    vector<vector<pair<int,int>>> g; // (to, edgeId)
    vector<int> parent, depth, heavy, head, pos, sz;
    int cur_pos;
    vector<long long> base;
    SegTreeMax st;
    vector<pair<int,int>> edges; // (u, v) by edge id

    HLD(int n): n(n), root(0), g(n), parent(n,-1), depth(n,0), heavy(n,-1), head(n,0), pos(n,0), sz(n,0), cur_pos(0) {}

    void addEdge(int id, int u, int v){
        if((int)edges.size()<=id) edges.resize(id+1);
        edges[id]={u,v};
        g[u].push_back({v,id}); g[v].push_back({u,id});
    }

    int dfs1(int u, int p){
        parent[u]=p; sz[u]=1; int maxSub=0;
        for(auto [v,id]: g[u]) if(v!=p){
            depth[v]=depth[u]+1;
            int s=dfs1(v,u); sz[u]+=s;
            if(s>maxSub){ maxSub=s; heavy[u]=v; }
        }
        return sz[u];
    }

    void dfs2(int u, int h){
        head[u]=h; pos[u]=cur_pos++;
        if(heavy[u]!=-1) dfs2(heavy[u], h);
        for(auto [v,id]: g[u])
            if(v!=parent[u] && v!=heavy[u]) dfs2(v, v);
    }

    void build(const vector<long long>& edgeW, int r=0){
        root=r; dfs1(root,-1);
        dfs2(root, root);
        // Store each edge's value at its deeper endpoint's position.
        base.assign(n, LLONG_MIN); // base[pos[root]] never contributes (root has no parent edge)
        for(size_t id=0; id<edges.size(); ++id){
            int u=edges[id].first, v=edges[id].second;
            int child = (parent[u]==v ? u : v);
            base[pos[child]] = edgeW[id];
        }
        st = SegTreeMax(n); st.build(base);
    }

    // Max over edge weights on path u-v (LCA's position is excluded)
    long long queryPathMax(int u, int v){
        long long res = LLONG_MIN;
        while(head[u]!=head[v]){
            if(depth[head[u]]<depth[head[v]]) swap(u,v);
            int hu=head[u];
            // [pos[hu], pos[u]] covers the edges from u up to and including hu's parent edge
            res = max(res, st.query(pos[hu], pos[u]));
            u = parent[hu];
        }
        if(depth[u]>depth[v]) swap(u,v);
        // Same chain now; skip pos[u] because it stores the edge to parent(u), which is off the path
        res = max(res, st.query(pos[u]+1, pos[v]));
        return res;
    }

    // Update an edge's weight to x
    void updateEdge(int id, long long x){
        int u=edges[id].first, v=edges[id].second;
        int child = (parent[u]==v ? u : v);
        st.pointUpdate(pos[child], x);
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Build a tree with 8 nodes and 7 edges (ids 0..6): {id, u, v, w}
    int n=8; HLD hld(n);
    vector<tuple<int,int,int,long long>> E={{0,0,1,5},{1,1,2,3},{2,1,3,4},{3,2,4,7},{4,2,5,6},{5,5,6,8},{6,5,7,2}};
    for(auto [id,u,v,wt]: E) hld.addEdge(id,u,v);
    // edge weights array by id
    vector<long long> w(7);
    for(auto [id,u,v,wt]: E) w[id]=wt;

    hld.build(w, 0);

    cout << hld.queryPathMax(6,3) << "\n"; // max edge on path 6-3: prints 8
    hld.updateEdge(3, 10);                 // set edge (2,4) to 10
    cout << hld.queryPathMax(4,7) << "\n"; // max edge on path 4-7: prints 10
    return 0;
}
```
This example handles edge-weighted queries by storing each edge's weight at the position of its deeper endpoint. Path maximum queries exclude the LCA's position on the final same-chain segment. Point edge updates become point updates in the segment tree.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct SegTreeLazy {
    int n;
    struct Node{ long long sum=0, lazy=0; };
    vector<Node> st;
    SegTreeLazy(){}
    SegTreeLazy(int n): n(n), st(4*n){}
    void build(const vector<long long>& a, int p, int l, int r){
        if(l==r){ st[p].sum=a[l]; return; }
        int m=(l+r)>>1; build(a,p<<1,l,m); build(a,p<<1|1,m+1,r);
        st[p].sum = st[p<<1].sum + st[p<<1|1].sum;
    }
    void build(const vector<long long>& a){ build(a,1,0,n-1); }
    void push(int p, int l, int r){
        long long &lz=st[p].lazy;
        if(lz!=0 && l!=r){
            int lc=p<<1, rc=p<<1|1; int m=(l+r)>>1;
            st[lc].lazy += lz; st[rc].lazy += lz;
            st[lc].sum += lz * (m-l+1);
            st[rc].sum += lz * (r-m);
            lz=0;
        } else if(lz!=0 && l==r){ st[p].lazy=0; }
    }
    void rangeAdd(int p, int l, int r, int ql, int qr, long long v){
        if(qr<l || r<ql) return;
        if(ql<=l && r<=qr){ st[p].lazy += v; st[p].sum += v*(r-l+1); return; }
        push(p,l,r);
        int m=(l+r)>>1;
        rangeAdd(p<<1,l,m,ql,qr,v); rangeAdd(p<<1|1,m+1,r,ql,qr,v);
        st[p].sum = st[p<<1].sum + st[p<<1|1].sum;
    }
    void rangeAdd(int l, int r, long long v){ if(l>r) return; rangeAdd(1,0,n-1,l,r,v); }
    long long rangeSum(int p, int l, int r, int ql, int qr){
        if(qr<l || r<ql) return 0;
        if(ql<=l && r<=qr) return st[p].sum;
        push(p,l,r);
        int m=(l+r)>>1; return rangeSum(p<<1,l,m,ql,qr)+rangeSum(p<<1|1,m+1,r,ql,qr);
    }
    long long rangeSum(int l, int r){ if(l>r) return 0; return rangeSum(1,0,n-1,l,r); }
};

struct HLD {
    int n, root;
    vector<vector<int>> g;
    vector<int> parent, depth, heavy, head, pos, sz;
    int cur_pos;
    vector<long long> base;
    SegTreeLazy st;

    HLD(int n): n(n), root(0), g(n), parent(n,-1), depth(n,0), heavy(n,-1), head(n,0), pos(n,0), sz(n,0), cur_pos(0) {}
    void addEdge(int u,int v){ g[u].push_back(v); g[v].push_back(u); }

    int dfs1(int u, int p){
        parent[u]=p; sz[u]=1; int maxSub=0;
        for(int v: g[u]) if(v!=p){
            depth[v]=depth[u]+1;
            int s=dfs1(v,u); sz[u]+=s;
            if(s>maxSub){ maxSub=s; heavy[u]=v; }
        }
        return sz[u];
    }
    void dfs2(int u, int h, const vector<long long>& val){
        head[u]=h; pos[u]=cur_pos++; base[pos[u]]=val[u];
        if(heavy[u]!=-1) dfs2(heavy[u], h, val);
        for(int v: g[u]) if(v!=parent[u] && v!=heavy[u]) dfs2(v, v, val);
    }

    void build(const vector<long long>& val, int r=0){
        root=r; base.assign(n,0); dfs1(root,-1); dfs2(root,root,val);
        st = SegTreeLazy(n); st.build(base);
    }

    // Add delta to all nodes along path u-v (node weights)
    void pathAdd(int u, int v, long long delta){
        while(head[u]!=head[v]){
            if(depth[head[u]]<depth[head[v]]) swap(u,v);
            int hu=head[u]; st.rangeAdd(pos[hu], pos[u], delta);
            u = parent[hu];
        }
        if(depth[u]>depth[v]) swap(u,v);
        st.rangeAdd(pos[u], pos[v], delta);
    }

    // Subtree sum of node u: a single contiguous range
    long long subtreeSum(int u){ return st.rangeSum(pos[u], pos[u]+sz[u]-1); }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n=7; HLD hld(n);
    vector<pair<int,int>> edges={{0,1},{0,2},{1,3},{1,4},{2,5},{5,6}};
    for(auto [u,v]: edges) hld.addEdge(u,v);
    vector<long long> val={1,2,3,4,5,6,7};
    hld.build(val, 0);

    // Add +10 to all nodes on path 4-6 (nodes 4,1,0,2,5,6)
    hld.pathAdd(4,6,10);

    cout << hld.subtreeSum(1) << "\n"; // subtree {1,3,4}: 2+4+5, +10 on 1 and 4 -> prints 31
    cout << hld.subtreeSum(2) << "\n"; // subtree {2,5,6}: 3+6+7, +10 on each -> prints 46

    return 0;
}
```
This program supports range-add along any path and retrieves subtree sums. Subtrees are contiguous intervals; path updates decompose into O(log n) intervals and are applied with lazy propagation in O(log n) each. This pattern generalizes to many range-update/range-query combinations.