Tree Isomorphism
Key Points
- •Tree isomorphism asks whether two trees have exactly the same shape, ignoring vertex names.
- •For rooted trees, compute a canonical form bottom-up: encode each node from the sorted encodings of its children; equal canonical forms imply isomorphism.
- •For unrooted trees, first find the center(s) of each tree, root at the center(s), and compare their canonical forms; trees are isomorphic if any pair of center-rooted forms match.
- •The AHU algorithm provides a collision-free canonical string using parentheses; hashing can speed it up but needs care to avoid collisions.
- •Typical implementations are O(n log n) because we sort children encodings at each node; with careful bucket/radix techniques, it can be O(n).
- •Applications include counting distinct tree shapes, subtree pattern checks, and symmetry detection via canonical rooting at centers.
- •Beware of pitfalls: forgetting to sort child labels, hashing collisions, not excluding the parent in recursion, and mishandling the 1-or-2 centers case.
- •In competitive programming, the center+AHU pattern is a robust, easy-to-code solution for n up to around 2e5 with careful recursion or iterative DFS.
Prerequisites
- →Tree basics (nodes, edges, degrees) — Understanding what a tree is and its properties is essential to reason about isomorphism.
- →DFS/BFS — We need traversals to compute distances, centers, and to perform bottom-up encodings.
- →Sorting — Child encodings must be sorted to remove sibling order; sorting dominates runtime.
- →Asymptotic analysis (Big-O) — To analyze O(n log n) vs O(n) variants and memory usage.
- →Hashing and collisions — Hash-based encodings trade small collision risk for speed and memory benefits.
- →Recursion and call stack — AHU encoding is naturally recursive and can hit stack limits on deep trees.
- →Graph distances and diameter — Centers are found via diameter endpoints using BFS distances.
- →String construction and interning — AHU canonical forms use strings; efficient construction prevents overhead.
- →Maps and dictionaries — Mapping child-label multisets to integers enables compact canonical labels.
- →Basic probability — To reason about and bound hash collision probabilities, especially with double hashing.
Detailed Explanation
Tap terms for definitions01Overview
Tree isomorphism is the problem of determining whether two trees have the same structure when vertex names are ignored. Imagine two trees drawn with different labels or orderings of children; if you can redraw one to look like the other by just relabeling nodes and reordering children, they are isomorphic. The core idea to test this efficiently is to assign each subtree a canonical representation that is independent of node IDs and child order. For rooted trees, we do a bottom-up procedure: compute a representation for each node by combining the (sorted) representations of its children. For unrooted trees, trees may have one or two natural centers (nodes minimizing the maximum distance to any leaf). Rooting both trees at their center(s) and comparing canonical forms yields an isomorphism test that is independent of how the tree is drawn. In practice, we use two flavors of canonical representation. The classic AHU algorithm uses strings like '(()(()))' built by sorting children strings and wrapping them in parentheses—this is collision-free and conceptually simple. Alternatively, polynomial hashing maps subtree multisets to integers to reduce memory and speed comparisons, but one must handle collision risks (often by double hashing). The approach is widely used in contest problems for deduplicating trees, matching patterns, and recognizing symmetry. Its runtime is typically O(n log n) due to sorting children; with careful global bucket sorting of labels across the entire tree level-by-level, linear time O(n) is achievable.
02Intuition & Analogies
Think of a tree like a family tree built from Lego blocks. Each node is a Lego piece, and each subtree is a shape made from pieces connected under that node. Two trees are isomorphic if you can rearrange sibling blocks (children) without changing the overall stacked shape. So, instead of comparing raw IDs, we compare the shapes of their subtrees. To do this, give every leaf the same simple tag—say, a small block labeled '()'. For a parent, collect the tags of its children. Since the order of children does not matter (you can reorder Lego pieces on the same level without changing the shape), sort those tags. Then glue them together and wrap the result with a new pair of parentheses to make the parent’s tag. If two parents have the same multiset of child shapes, they will get exactly the same tag. Climb up the tree until the root: its tag describes the entire tree. Two rooted trees are isomorphic if and only if their root tags match. For unrooted trees, choose a neutral place to start—its balance point(s). If you hang the tree by a node, the center is where the tree balances best (minimizes the farthest leaf distance). A tree has one or two centers. Root the tree there and compute the tags. If either center-rooted tag of tree A matches a center-rooted tag of tree B, the two trees have the same shape. The AHU encoding uses literal parentheses so that equal shapes get equal strings; hashing is like compressing the tag into numbers for speed, with a tiny risk of collision unless you take precautions.
03Formal Definition
04When to Use
Use tree isomorphism when node identities are irrelevant, and only structure matters. Typical scenarios: deduplicating generated trees (counting distinct shapes), verifying that two differently-labeled trees represent the same hierarchy, or checking for symmetry by comparing a tree to itself under different rootings. In competitive programming, common tasks include: determining if two given trees are isomorphic, grouping many trees by shape, or verifying whether a tree contains a subtree isomorphic to a pattern (with adaptations). Choose the rooted version when roots are fixed and meaningful (e.g., comparing two organizational charts rooted at CEOs). Choose the unrooted (center-rooted) version when the trees are unlabeled and unrooted (e.g., comparing chemical molecule trees ignoring the choice of starting atom). Use AHU string canonicalization when you need a deterministic, collision-free representation and can afford O(n log n) time and O(n) space. Use hashing (possibly double hashing) when performance and memory are tight and a negligible collision risk is acceptable; this is often fine in contests with 64-bit or double-mod hashing and careful seed choices. For very large trees or strict guarantees, consider linear-time variants (bucket/radix sorting child labels globally) or iterative implementations to avoid recursion depth limits.
⚠️Common Mistakes
- Forgetting to sort child encodings: sibling order is irrelevant; failing to sort makes the representation order-dependent and incorrect.
- Not excluding the parent in DFS: you must pass the parent to avoid walking back up and creating cycles in recursion or double-counting children.
- Mishandling centers: unrooted trees can have 1 or 2 centers. You must consider both centers; comparing only one may miss isomorphic cases (e.g., even-diameter paths).
- Hash collisions: polynomial hashing is fast but can collide. Use large moduli, 64-bit mixing, or double hashing. If you need absolute correctness, use AHU parentheses or map-to-integers with deterministic canonicalization.
- Deep recursion limits: in very deep trees (like paths), recursion depth can reach O(n). In C++, either raise stack limits, use iterative DFS, or enable tail recursion transformations cautiously.
- Memory blow-up with strings: naive string concatenation at every node may create large intermediate strings. Mitigate by reserving capacity, interning labels to integers, or hashing. For guaranteed correctness with less memory, compress each sorted child label vector to an integer label via a global dictionary.
- Assuming it works on general graphs: these methods exploit acyclicity. For graphs with cycles, tree isomorphism algorithms don’t apply directly; you need graph isomorphism techniques.
Key Formulas
Graph Isomorphism Definition
Explanation: Two trees are isomorphic if there is a one-to-one relabeling of vertices that preserves edges. This formalizes the idea of same shape ignoring labels.
Rooted Isomorphism Constraint
Explanation: For rooted trees, the isomorphism must map the root to the root and respect parent-child relationships. This fixes the orientation of edges from the root.
Eccentricity and Center
Explanation: Eccentricity measures how far the farthest node is from v. Centers are nodes with minimal eccentricity; a tree has one or two centers.
Canonical Form Recurrence
Explanation: The canonical form of a node depends only on the multiset of its children’s canonical forms. The encoding function sorts and wraps to eliminate order.
Typical Time Complexity
Explanation: We sort the child encodings for each node v; sorting deg(v) items costs deg(v) log deg(v). Summing over all nodes yields O(n log n) in the worst case.
Polynomial Hash Combine
Explanation: After sorting child hashes , combine them with a polynomial using base p, offset b, and modulus M. This produces an order-independent numeric label.
Double Hash Collision Bound
Explanation: When using two independent moduli, the chance that two different structures collide in both is roughly the sum of their inverse moduli, typically negligible with large primes.
Tree Diameter
Explanation: The diameter is the longest shortest-path length in the tree. Centers lie on any diameter path and occupy the middle one or two positions.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Compute AHU canonical form for a rooted tree. 5 // Representation: "(" + sorted(concat(children)) + ")" 6 7 string canonical_form_rooted(const vector<vector<int>>& g, int v, int parent) { 8 vector<string> child_forms; 9 child_forms.reserve(g[v].size()); 10 for (int u : g[v]) if (u != parent) { 11 child_forms.push_back(canonical_form_rooted(g, u, v)); 12 } 13 sort(child_forms.begin(), child_forms.end()); // order-invariance among siblings 14 // Concatenate with parentheses wrapper 15 string res; 16 size_t total_len = 2; // for '(' and ')' 17 for (auto &s : child_forms) total_len += s.size(); 18 res.reserve(total_len); 19 res.push_back('('); 20 for (auto &s : child_forms) res += s; 21 res.push_back(')'); 22 return res; 23 } 24 25 int main() { 26 ios::sync_with_stdio(false); 27 cin.tie(nullptr); 28 29 // Example: two rooted trees with n=7 nodes each (0-indexed) 30 // Tree A rooted at 0 31 int n = 7; 32 vector<vector<int>> A(n), B(n); 33 auto add_edge = [&](vector<vector<int>>& G, int x, int y){ G[x].push_back(y); G[y].push_back(x); }; 34 35 // Tree A edges 36 add_edge(A,0,1); add_edge(A,0,2); add_edge(A,1,3); add_edge(A,1,4); add_edge(A,2,5); add_edge(A,2,6); 37 // Tree B edges (same shape but different labeling and child order) 38 add_edge(B,6,5); add_edge(B,5,2); add_edge(B,2,0); add_edge(B,0,1); add_edge(B,1,3); add_edge(B,1,4); 39 40 int rootA = 0; 41 int rootB = 2; // arbitrary; rooted iso requires fixed roots 42 43 string fa = canonical_form_rooted(A, rootA, -1); 44 string fb = canonical_form_rooted(B, rootB, -1); 45 46 cout << (fa == fb ? "Rooted-isomorphic\n" : "Not rooted-isomorphic\n"); 47 // Prints "Rooted-isomorphic" 48 return 0; 49 } 50
This program computes a collision-free canonical string for each rooted tree using the AHU method. It recursively encodes each node from its children’s encodings, sorted to remove sibling-order effects, then wraps them in parentheses. Two rooted trees are isomorphic if their root encodings match exactly.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // BFS to find farthest node and parent array 5 pair<int, vector<int>> bfs_far(const vector<vector<int>>& g, int s) { 6 int n = (int)g.size(); 7 vector<int> dist(n, -1), par(n, -1); 8 queue<int> q; 9 q.push(s); dist[s] = 0; 10 while (!q.empty()) { 11 int v = q.front(); q.pop(); 12 for (int u : g[v]) if (dist[u] == -1) { 13 dist[u] = dist[v] + 1; par[u] = v; q.push(u); 14 } 15 } 16 int far = s; 17 for (int i = 0; i < n; ++i) if (dist[i] > dist[far]) far = i; 18 return {far, par}; 19 } 20 21 vector<int> tree_centers(const vector<vector<int>>& g) { 22 int n = (int)g.size(); 23 auto [a, pa] = bfs_far(g, 0); 24 auto [b, pb] = bfs_far(g, a); 25 // Reconstruct diameter path from b back to a 26 vector<int> path; int cur = b; 27 while (cur != -1) { path.push_back(cur); if (cur == a) break; cur = pb[cur]; } 28 int m = (int)path.size(); 29 if (m % 2 == 1) return { path[m/2] }; 30 else return { path[m/2 - 1], path[m/2] }; 31 } 32 33 string canonical_form_rooted(const vector<vector<int>>& g, int v, int parent) { 34 vector<string> child_forms; 35 child_forms.reserve(g[v].size()); 36 for (int u : g[v]) if (u != parent) child_forms.push_back(canonical_form_rooted(g, u, v)); 37 sort(child_forms.begin(), child_forms.end()); 38 string res; size_t total_len = 2; for (auto &s : child_forms) total_len += s.size(); 39 res.reserve(total_len); res.push_back('('); 40 for (auto &s : child_forms) res += s; res.push_back(')'); 41 return res; 42 } 43 44 string canonical_form_unrooted(const vector<vector<int>>& g) { 45 vector<int> centers = tree_centers(g); 46 vector<string> forms; 47 for (int c : centers) forms.push_back(canonical_form_rooted(g, c, -1)); 48 // Unique canonical form: min over possible center roots 49 if (forms.size() == 1) return forms[0]; 50 return min(forms[0], forms[1]); 51 } 52 53 int main(){ 54 ios::sync_with_stdio(false); 55 cin.tie(nullptr); 56 57 // Two unrooted trees that are isomorphic 58 int n = 8; 59 vector<vector<int>> T1(n), T2(n); 60 auto add = [&](vector<vector<int>>& G, int x, int y){ G[x].push_back(y); G[y].push_back(x); }; 61 62 // T1: a balanced binary-like shape 63 add(T1,0,1); add(T1,0,2); add(T1,1,3); add(T1,1,4); add(T1,2,5); add(T1,2,6); add(T1,6,7); 64 65 // T2: same shape, permuted labels 66 add(T2,7,6); add(T2,6,2); add(T2,2,0); add(T2,0,1); add(T2,1,3); add(T2,1,4); add(T2,2,5); 67 68 string f1 = canonical_form_unrooted(T1); 69 string f2 = canonical_form_unrooted(T2); 70 cout << (f1 == f2 ? "Isomorphic\n" : "Not isomorphic\n"); 71 72 // A non-isomorphic example 73 vector<vector<int>> T3(5); 74 add(T3,0,1); add(T3,1,2); add(T3,2,3); add(T3,3,4); // a path of 5 75 cout << (canonical_form_unrooted(T1) == canonical_form_unrooted(T3) ? "Isomorphic\n" : "Not isomorphic\n"); 76 return 0; 77 } 78
We locate the center(s) of each unrooted tree using two BFS passes along the diameter. Then we root the tree at each center, compute the AHU canonical form, and pick the lexicographically smallest as the tree’s canonical representation. Two unrooted trees are isomorphic if these canonical representations match.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 pair<int, vector<int>> bfs_far(const vector<vector<int>>& g, int s) { 5 int n = (int)g.size(); 6 vector<int> dist(n, -1), par(n, -1); 7 queue<int> q; q.push(s); dist[s] = 0; 8 while (!q.empty()) { 9 int v = q.front(); q.pop(); 10 for (int u : g[v]) if (dist[u] == -1) { dist[u] = dist[v] + 1; par[u] = v; q.push(u);} } 11 int far = s; for (int i = 0; i < n; ++i) if (dist[i] > dist[far]) far = i; return {far, par}; 12 } 13 14 vector<int> centers(const vector<vector<int>>& g){ 15 auto [a, pa] = bfs_far(g, 0); 16 auto [b, pb] = bfs_far(g, a); 17 vector<int> path; int cur = b; while (cur != -1){ path.push_back(cur); if (cur == a) break; cur = pb[cur]; } 18 int m = (int)path.size(); 19 if (m % 2) return {path[m/2]}; 20 return {path[m/2 - 1], path[m/2]}; 21 } 22 23 string canon_rooted(const vector<vector<int>>& g, int v, int p){ 24 vector<string> cs; cs.reserve(g[v].size()); 25 for (int u : g[v]) if (u != p) cs.push_back(canon_rooted(g, u, v)); 26 sort(cs.begin(), cs.end()); 27 string r; size_t L = 2; for (auto &s: cs) L += s.size(); r.reserve(L); 28 r.push_back('('); for (auto &s: cs) r += s; r.push_back(')'); return r; 29 } 30 31 string canon_unrooted(const vector<vector<int>>& g){ 32 auto c = centers(g); vector<string> forms; forms.reserve(c.size()); 33 for (int x : c) forms.push_back(canon_rooted(g, x, -1)); 34 return forms.size() == 1 ? forms[0] : min(forms[0], forms[1]); 35 } 36 37 int main(){ 38 ios::sync_with_stdio(false); 39 cin.tie(nullptr); 40 41 // Build a batch of trees and count distinct shapes 42 vector<vector<vector<int>>> batch; 43 44 // Tree 1: star with 4 leaves 45 { 46 int n = 5; vector<vector<int>> G(n); 47 auto add=[&](int x,int y){G[x].push_back(y);G[y].push_back(x);} ; 48 add(0,1); add(0,2); add(0,3); add(0,4); batch.push_back(G); 49 } 50 // Tree 2: another star with 4 leaves (different labels) 51 { 52 int n = 5; vector<vector<int>> G(n); 53 auto add=[&](int x,int y){G[x].push_back(y);G[y].push_back(x);} ; 54 add(3,1); add(3,0); add(3,2); add(3,4); batch.push_back(G); 55 } 56 // Tree 3: path of 5 57 { 58 int n = 5; vector<vector<int>> G(n); 59 auto add=[&](int x,int y){G[x].push_back(y);G[y].push_back(x);} ; 60 add(0,1); add(1,2); add(2,3); add(3,4); batch.push_back(G); 61 } 62 63 unordered_set<string> shapes; 64 for (auto &G : batch) shapes.insert(canon_unrooted(G)); 65 66 cout << "Distinct shapes: " << shapes.size() << "\n"; // Expected 2 67 return 0; 68 } 69
This example groups multiple input trees by structure. We compute the center-rooted canonical form for each and insert it into an unordered_set of strings. Equal strings collapse to one representative, so the set’s size equals the number of distinct unlabeled shapes.