Virtual Tree (Auxiliary Tree)
Key Points
- •A Virtual Tree (Auxiliary Tree) compresses a large tree into a much smaller tree that contains only the k important nodes and the LCAs needed to keep them connected.
- •You build it in O(k k) time by sorting the important nodes by DFS entry time and inserting LCAs between consecutive nodes.
- •The resulting virtual tree has at most 2k-1 nodes and O(k) edges, so you can run tree DP on it in near-linear time in k instead of n.
- •LCA preprocessing on the original tree costs O(n n), after which each query over a subset can be answered efficiently with a virtual tree.
- •Distances and contributions on the original tree are preserved on the virtual tree by using edge lengths equal to depth differences (or accumulated weights for weighted trees).
- •Common tasks like total Steiner tree length, pairwise distance sums, and custom DPs over marked subsets become straightforward on the virtual tree.
- •Be careful to deduplicate marked nodes, always include necessary LCAs, and only clear data structures for nodes touched by the current virtual tree.
- •This technique is a staple for multi-query competitive programming problems on trees with subset constraints (CF 2000–2400 level).
Prerequisites
- →Depth-First Search (DFS) and tree rooting — Necessary to compute tin/tout times, depths, and to define ancestor relationships clearly.
- →Lowest Common Ancestor (LCA) — Core operation for inserting required junctions between marked nodes.
- →Binary lifting or Euler tour + RMQ — Efficient LCA implementations that make virtual tree construction fast.
- →Big-O notation — To understand the per-query cost improvements from O(n) to O(k log k).
- →Tree distance (unweighted and weighted) — Virtual tree preserves distances; many DPs rely on accurate distance computations.
- →Stacks and sorting — The construction algorithm is based on sorting by tin and maintaining a stack of ancestors.
- →Tree DP patterns — Once VT is built, typical solutions require DP over the compressed structure.
- →Set/deduplication techniques — Marked nodes must be deduplicated to avoid double-counting or structural errors.
Detailed Explanation
Tap terms for definitions01Overview
A Virtual Tree (also called an Auxiliary Tree) is a powerful technique to handle problems on trees where each query involves only a subset of k important nodes (k ≪ n). Instead of solving the problem on the full tree of size n for each query, we compress the tree to include only the important nodes and the least number of extra nodes (their Lowest Common Ancestors, LCAs) required to keep them connected. The crucial insight is that when you sort the marked nodes by their DFS entry time and connect them appropriately using LCAs, you obtain a much smaller tree that preserves the connectivity and path structure among the important nodes. This compressed structure is the virtual tree. Building it takes O(k log k) time (dominated by sorting), and the number of nodes and edges becomes O(k), enabling fast dynamic programming or greedy computations per query. Preprocessing for LCA on the original tree (via binary lifting or Euler tour + RMQ) is done once in O(n log n) time. The virtual tree is especially beneficial in multi-query settings: the amortized per-query cost drops dramatically from potentially O(n) to O(k log k) or O(k).
02Intuition & Analogies
Imagine a massive road network (the tree), but today you only care about deliveries to k specific cities (the important nodes). If you highlight just those cities and the minimal set of roads that connects them, you get a much smaller subnetwork. However, some junctions (LCAs) that are not on your delivery list still need to appear because they are the necessary intersections where paths between your cities meet. The virtual tree is exactly that: the minimal backbone connecting your cities (important nodes) through required junctions (LCAs). To construct it quickly, think of driving through the cities in the order you would naturally visit them if you always go as deep as possible before backtracking—this is the DFS order. If you sort the cities by this order, then any two consecutive cities are connected by a path that meets at their LCA. By pushing these LCAs in between, you can stitch together a neat route that never makes unnecessary detours. The stack construction mimics a courier’s notebook: you keep a stack of current junctions; when a new city lies outside the region of the top junction, you close that region and draw an edge to its parent in the notebook—this draws exactly the edges of the virtual tree. Because every city and junction is touched only a constant number of times, the total extra work stays linear after sorting the cities. Now your delivery optimization (DP) runs on this tiny notebook graph instead of the giant map.
03Formal Definition
04When to Use
Use a virtual tree when many queries each involve a small subset S of nodes and you need to compute something about paths or connectivity among S, e.g., total length of the minimal connecting subtree (Steiner tree on trees), sum of pairwise distances among marked nodes, or custom DPs where transitions depend on ancestor/descendant relations. Classic competitive programming scenarios include: (1) Processing Q queries, each with k marked nodes, where k is small relative to n; (2) Counting or minimizing some cost that accumulates along unique paths between marked nodes; (3) Weighted or unweighted trees where distance-based computations are required; (4) Situations where per-query O(n) is too slow but O(k log k) is feasible. If you already have LCA preprocessing, virtual trees allow near-linear per-query time in k. They are less useful when each query’s subset is large (k close to n) or when the problem depends on global properties unrelated to the marked subset (e.g., requires traversing the entire tree regardless of S). For single-query settings with large k, constructing a virtual tree is still often helpful because it provides a clean structure for DPs and proofs.
⚠️Common Mistakes
- Forgetting to include LCAs: Sorting marked nodes by DFS order but not inserting LCAs between consecutive nodes yields a forest missing necessary junctions; paths among marked nodes are then broken. Always insert LCAs or use the stack algorithm that generates them implicitly.\n- Not deduplicating: If marked nodes contain duplicates, the stack logic or counts in DP can become incorrect. Deduplicate after sorting by tin.\n- Incorrect ancestor checks: Use the invariant tin[u] \le tin[v] \le tout[u] to test if u is an ancestor of v. Off-by-one or misordered comparisons create wrong edges.\n- Clearing whole arrays each query: Only clear data for nodes present in the current virtual tree (\le 2k-1 nodes). Clearing size-n arrays per query ruins performance. Maintain a "touched" list.\n- Depth vs. distance: In weighted trees, depth differences do not equal distances. Maintain an accumulated distance-from-root array to compute dist(u, v) correctly.\n- Stack linking bugs: When popping from the stack, connect the popped node to the new stack top (its closest ancestor in C). Forgetting to flush the stack at the end leaves edges missing.\n- Recursion limits: Deep recursion can overflow the system stack on adversarial trees. Consider iterative DFS or increase stack size if allowed.
Key Formulas
Unweighted Tree Distance
Explanation: The distance between two nodes in an unweighted tree is the sum of their depths minus twice the depth of their LCA. It counts how many edges you go up from u to the LCA and then down to v.
Weighted Tree Distance
Explanation: With D[x] as distance from root to x using edge weights, the weighted distance is computed by subtracting twice the root-to-LCA distance. This works because paths are unique in trees.
Virtual Tree Size Bound
Explanation: The virtual tree contains at most the k marked nodes plus at most k−1 LCAs, since each insertion of an LCA reduces the number of connected components. Hence the total is bounded by 2k−1.
Build Complexity
Explanation: Construction sorts the k nodes (O(k log k)) and performs O(k) ancestor/LCA operations; with binary lifting = O( n) and with RMQ t_).
Ancestor Test
Explanation: Using DFS entry/exit times, u is an ancestor of v if v’s entry time is within u’s interval. This is a standard invariant from Euler tour timing.
Unweighted VT Edge Length
Explanation: In an unweighted tree, the number of original edges on the path from p to v equals their depth difference. Summing these over VT edges gives the Steiner subtree size.
Pairwise Distance Sum via VT
Explanation: For each VT edge, let \# be the number of marked nodes in the child v’s component after removing (p,v), and the path length from p to v. Each pair split by this edge contributes to the distance sum.
Combinatorial Counting Refresher
Explanation: Often used when converting counts of pairs to sums and vice versa (e.g., choosing pairs across components). It reminds that summing first n integers yields a quadratic expression.
Logarithm Change of Base
Explanation: Relevant when comparing complexities like O(k k) with different log bases; the base only changes complexity by a constant factor.
Steiner Subtree Length on Trees
Explanation: The minimal subtree connecting S has total length equal to the sum of lengths of all edges in the virtual tree, because VT exactly decomposes the union of required paths.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 /* 5 Virtual Tree (Auxiliary Tree) construction on an unweighted tree. 6 - Preprocess LCA with binary lifting. 7 - Given k marked nodes, build the virtual tree in O(k log k). 8 - Compute the total number of original edges in the minimal subtree connecting them 9 (Steiner subtree length) as sum of depth differences across VT edges. 10 */ 11 12 struct LCA { 13 int n, LOG, root; 14 vector<vector<int>> up; // up[v][j] = 2^j-th ancestor of v 15 vector<int> depth, tin, tout; 16 vector<vector<int>> g; 17 int timer = 0; 18 19 LCA(int n=0): n(n) {} 20 21 void init(int _n, int _root, const vector<vector<int>>& _g) { 22 n = _n; root = _root; g = _g; 23 LOG = 1; while ((1<<LOG) <= n) ++LOG; 24 up.assign(n+1, vector<int>(LOG)); 25 depth.assign(n+1, 0); 26 tin.assign(n+1, 0); 27 tout.assign(n+1, 0); 28 timer = 0; 29 dfs(root, root); 30 } 31 32 void dfs(int v, int p) { 33 tin[v] = ++timer; 34 up[v][0] = p; 35 for (int j = 1; j < LOG; ++j) up[v][j] = up[ up[v][j-1] ][j-1]; 36 for (int to : g[v]) if (to != p) { 37 depth[to] = depth[v] + 1; // unweighted: each edge has length 1 38 dfs(to, v); 39 } 40 tout[v] = timer; 41 } 42 43 bool is_ancestor(int u, int v) const { // u is ancestor of v? 44 return tin[u] <= tin[v] && tout[v] <= tout[u]; 45 } 46 47 int lca(int u, int v) const { 48 if (is_ancestor(u, v)) return u; 49 if (is_ancestor(v, u)) return v; 50 int x = u; 51 for (int j = LOG-1; j >= 0; --j) { 52 int a = up[x][j]; 53 if (!is_ancestor(a, v)) x = a; 54 } 55 return up[x][0]; 56 } 57 }; 58 59 int main() { 60 ios::sync_with_stdio(false); 61 cin.tie(nullptr); 62 63 int n; cin >> n; 64 vector<vector<int>> g(n+1); 65 for (int i = 0; i < n-1; ++i) { 66 int u, v; cin >> u >> v; 67 g[u].push_back(v); 68 g[v].push_back(u); 69 } 70 71 int root = 1; 72 LCA solver; solver.init(n, root, g); 73 74 int k; cin >> k; 75 vector<int> a(k); 76 for (int i = 0; i < k; ++i) cin >> a[i]; 77 78 // Step 1: sort by tin and deduplicate 79 sort(a.begin(), a.end(), [&](int x, int y){ return solver.tin[x] < solver.tin[y]; }); 80 a.erase(unique(a.begin(), a.end()), a.end()); 81 82 // Step 2: insert LCAs of consecutive nodes 83 vector<int> nodes = a; 84 for (int i = 0; i+1 < (int)a.size(); ++i) nodes.push_back(solver.lca(a[i], a[i+1])); 85 sort(nodes.begin(), nodes.end(), [&](int x, int y){ return solver.tin[x] < solver.tin[y]; }); 86 nodes.erase(unique(nodes.begin(), nodes.end()), nodes.end()); 87 88 // Step 3: build Virtual Tree adjacency (directed parent->child) 89 auto is_anc = [&](int u, int v){ return solver.is_ancestor(u, v); }; 90 vector<vector<int>> vt(n+1); // allocate big but we'll only touch `nodes` 91 92 vector<int> st; 93 st.reserve(nodes.size()); 94 st.push_back(nodes[0]); 95 for (int i = 1; i < (int)nodes.size(); ++i) { 96 int v = nodes[i]; 97 while ((int)st.size() >= 1 && !is_anc(st.back(), v)) { 98 int u = st.back(); st.pop_back(); 99 // connect u to its closest ancestor now at st.back() 100 int p = st.back(); 101 vt[p].push_back(u); 102 } 103 st.push_back(v); 104 } 105 while ((int)st.size() > 1) { 106 int u = st.back(); st.pop_back(); 107 int p = st.back(); 108 vt[p].push_back(u); 109 } 110 111 // Step 4: Compute Steiner subtree length = sum over edges depth[v]-depth[p] 112 long long steiner_len = 0; 113 for (int p : nodes) { 114 for (int v : vt[p]) { 115 steiner_len += (long long)(solver.depth[v] - solver.depth[p]); 116 } 117 } 118 119 cout << steiner_len << "\n"; 120 return 0; 121 } 122
This program preprocesses LCA using binary lifting, then constructs the virtual tree for a set of marked nodes. It inserts necessary LCAs, sorts by tin, and uses a stack to connect each node to its closest ancestor in the virtual set. The Steiner subtree length in an unweighted tree is the sum of depth differences across virtual edges, which equals the number of original edges in the minimal connecting subtree.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 /* 5 Compute sum of pairwise distances among k marked nodes. 6 Key identity on Virtual Tree: 7 Sum_{pairs (x,y)} dist(x,y) = sum over VT edges (p->v): s_v * (k - s_v) * len(p,v) 8 where s_v is the number of marked nodes in v's component below the edge (p,v), 9 and len(p,v) = depth[v] - depth[p] (unweighted). 10 */ 11 12 struct LCA { 13 int n, LOG, root, timer = 0; 14 vector<vector<int>> up, g; 15 vector<int> tin, tout, depth; 16 void init(int _n, int _root, const vector<vector<int>>& _g){ 17 n=_n; root=_root; g=_g; 18 LOG=1; while((1<<LOG)<=n) ++LOG; 19 up.assign(n+1, vector<int>(LOG)); 20 tin.assign(n+1,0); tout.assign(n+1,0); depth.assign(n+1,0); 21 dfs(root, root); 22 } 23 void dfs(int v, int p){ 24 tin[v]=++timer; up[v][0]=p; 25 for(int j=1;j<LOG;++j) up[v][j]=up[ up[v][j-1] ][j-1]; 26 for(int to: g[v]) if(to!=p){ depth[to]=depth[v]+1; dfs(to,v);} 27 tout[v]=timer; 28 } 29 bool is_ancestor(int u, int v) const { return tin[u]<=tin[v] && tout[v]<=tout[u]; } 30 int lca(int u,int v) const{ 31 if(is_ancestor(u,v)) return u; if(is_ancestor(v,u)) return v; 32 int x=u; for(int j=LOG-1;j>=0;--j){int a=up[x][j]; if(!is_ancestor(a,v)) x=a;} return up[x][0]; 33 } 34 }; 35 36 int main(){ 37 ios::sync_with_stdio(false); 38 cin.tie(nullptr); 39 40 int n; cin>>n; 41 vector<vector<int>> g(n+1); 42 for(int i=0;i<n-1;++i){int u,v;cin>>u>>v; g[u].push_back(v); g[v].push_back(u);} 43 44 LCA lca; lca.init(n, 1, g); 45 46 int k; cin>>k; vector<int> a(k); for(int i=0;i<k;++i) cin>>a[i]; 47 48 // sort & unique by tin 49 sort(a.begin(), a.end(), [&](int x,int y){return lca.tin[x]<lca.tin[y];}); 50 a.erase(unique(a.begin(), a.end()), a.end()); 51 52 // insert LCAs of consecutive 53 vector<int> nodes=a; 54 for(int i=0;i+1<(int)a.size();++i) nodes.push_back(lca.lca(a[i], a[i+1])); 55 sort(nodes.begin(), nodes.end(), [&](int x,int y){return lca.tin[x]<lca.tin[y];}); 56 nodes.erase(unique(nodes.begin(), nodes.end()), nodes.end()); 57 58 // build VT 59 vector<vector<int>> vt(n+1); 60 vector<int> st; st.push_back(nodes[0]); 61 auto is_anc = [&](int u,int v){return lca.is_ancestor(u,v);} ; 62 for(int i=1;i<(int)nodes.size();++i){ 63 int v=nodes[i]; 64 while(!is_anc(st.back(), v)){ 65 int u=st.back(); st.pop_back(); 66 int p=st.back(); vt[p].push_back(u); 67 } 68 st.push_back(v); 69 } 70 while((int)st.size()>1){int u=st.back(); st.pop_back(); int p=st.back(); vt[p].push_back(u);} 71 72 // mark important nodes 73 unordered_set<int> S(a.begin(), a.end()); 74 75 // DFS over VT to compute s_v and accumulate answer 76 long long ans=0; 77 function<int(int)> dfs = [&](int u){ 78 int cnt = S.count(u) ? 1 : 0; 79 for(int v: vt[u]){ 80 int sub = dfs(v); 81 long long len = (long long)(lca.depth[v]-lca.depth[u]); 82 ans += len * (long long)sub * (long long)( (int)a.size() - sub ); 83 cnt += sub; 84 } 85 return cnt; 86 }; 87 88 dfs(nodes[0]); 89 cout<<ans<<"\n"; 90 return 0; 91 } 92
This program computes the sum of pairwise distances among the marked nodes. After building the virtual tree, it runs a single DFS to count how many marked nodes lie in each child subtree. For each virtual edge (p→v), the number of pairs split by that edge is s_v·(k−s_v), and each such pair contributes the edge length to the total distance. Summing over edges yields the total pairwise distance.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 /* 5 Supports multiple queries on a weighted tree using a Virtual Tree per query. 6 - Preprocess LCA with distances-to-root D[v] (weighted depths). 7 - For each query's marked set S: 8 * Build VT. 9 * Compute Steiner subtree total weight. 10 * Compute sum of pairwise weighted distances among S. 11 Efficient clearing: only touch nodes present in VT. 12 */ 13 14 struct LCA { 15 int n, LOG, root, timer = 0; 16 vector<vector<int>> up; 17 vector<int> tin, tout, depth; 18 vector<long long> D; // distance to root (weighted) 19 vector<vector<pair<int,long long>>> g; 20 21 void init(int _n, int _root, const vector<vector<pair<int,long long>>>& _g){ 22 n=_n; root=_root; g=_g; 23 LOG=1; while((1<<LOG)<=n) ++LOG; 24 up.assign(n+1, vector<int>(LOG)); 25 tin.assign(n+1,0); tout.assign(n+1,0); depth.assign(n+1,0); 26 D.assign(n+1,0); 27 dfs(root, root); 28 } 29 void dfs(int v,int p){ 30 tin[v]=++timer; up[v][0]=p; 31 for(int j=1;j<LOG;++j) up[v][j]=up[ up[v][j-1] ][j-1]; 32 for(auto [to,w]: g[v]) if(to!=p){ 33 depth[to]=depth[v]+1; D[to]=D[v]+w; dfs(to,v); 34 } 35 tout[v]=timer; 36 } 37 bool is_ancestor(int u,int v) const {return tin[u]<=tin[v] && tout[v]<=tout[u];} 38 int lca(int u,int v) const{ 39 if(is_ancestor(u,v)) return u; if(is_ancestor(v,u)) return v; 40 int x=u; for(int j=LOG-1;j>=0;--j){int a=up[x][j]; if(!is_ancestor(a,v)) x=a;} return up[x][0]; 41 } 42 long long dist(int u,int v) const { return D[u]+D[v]-2LL*D[lca(u,v)]; } 43 }; 44 45 int main(){ 46 ios::sync_with_stdio(false); 47 cin.tie(nullptr); 48 49 int n; cin>>n; 50 vector<vector<pair<int,long long>>> g(n+1); 51 for(int i=0;i<n-1;++i){ 52 int u,v; long long w; cin>>u>>v>>w; 53 g[u].push_back({v,w}); g[v].push_back({u,w}); 54 } 55 LCA lca; lca.init(n, 1, g); 56 57 int Q; cin>>Q; 58 59 // We'll reuse these per query to avoid reallocation overhead 60 vector<vector<pair<int,long long>>> vt(n+1); 61 vector<int> touched; touched.reserve(2e5); 62 63 while(Q--){ 64 int k; cin>>k; vector<int> a(k); 65 for(int i=0;i<k;++i) cin>>a[i]; 66 sort(a.begin(), a.end(), [&](int x,int y){return lca.tin[x]<lca.tin[y];}); 67 a.erase(unique(a.begin(), a.end()), a.end()); 68 69 vector<int> nodes=a; 70 for(int i=0;i+1<(int)a.size();++i) nodes.push_back(lca.lca(a[i], a[i+1])); 71 sort(nodes.begin(), nodes.end(), [&](int x,int y){return lca.tin[x]<lca.tin[y];}); 72 nodes.erase(unique(nodes.begin(), nodes.end()), nodes.end()); 73 74 // Clean only touched from previous query 75 for(int v: touched) vt[v].clear(); 76 touched.clear(); 77 for(int v: nodes) touched.push_back(v); 78 79 // Build weighted VT: edge weight is dist(parent, child) in original tree 80 vector<int> st; st.push_back(nodes[0]); 81 auto is_anc = [&](int u,int v){return lca.is_ancestor(u,v);} ; 82 for(int i=1;i<(int)nodes.size();++i){ 83 int v=nodes[i]; 84 while(!is_anc(st.back(), v)){ 85 int u=st.back(); st.pop_back(); 86 int p=st.back(); long long w = lca.dist(p,u); 87 vt[p].push_back({u,w}); 88 } 89 st.push_back(v); 90 } 91 while((int)st.size()>1){int u=st.back(); st.pop_back(); int p=st.back(); long long w=lca.dist(p,u); vt[p].push_back({u,w});} 92 93 // Marked set lookup 94 unordered_set<int> S(a.begin(), a.end()); 95 int K = (int)S.size(); 96 97 // Compute Steiner length (weighted) and pairwise sum (weighted) 98 long long steiner_len = 0; 99 long long pair_sum = 0; 100 101 function<int(int)> dfs = [&](int u){ 102 int cnt = S.count(u) ? 1 : 0; 103 for(auto [v,w]: vt[u]){ 104 int sub = dfs(v); 105 steiner_len += w; // each VT edge counted once 106 pair_sum += w * 1LL * sub * 1LL * (K - sub); // contribution formula 107 cnt += sub; 108 } 109 return cnt; 110 }; 111 dfs(nodes[0]); 112 113 cout << steiner_len << " " << pair_sum << "\n"; 114 } 115 return 0; 116 } 117
This multi-query solution on a weighted tree constructs a virtual tree per query and computes two quantities: the total weight of the minimal subtree connecting the marked nodes and the weighted sum of pairwise distances. It cleans only the adjacency lists of touched VT nodes per query, ensuring O(k) memory reset. Weighted distances are handled via an accumulated distance-to-root array D[v].