⚙️AlgorithmIntermediate

Tree Isomorphism

Key Points

  • Tree isomorphism asks whether two trees have exactly the same shape, ignoring vertex names.
  • For rooted trees, compute a canonical form bottom-up: encode each node from the sorted encodings of its children; equal canonical forms imply isomorphism.
  • For unrooted trees, first find the center(s) of each tree, root at the center(s), and compare their canonical forms; trees are isomorphic if any pair of center-rooted forms match.
  • The AHU algorithm provides a collision-free canonical string using parentheses; hashing can speed it up but needs care to avoid collisions.
  • Typical implementations are O(n log n) because we sort children encodings at each node; with careful bucket/radix techniques, it can be O(n).
  • Applications include counting distinct tree shapes, subtree pattern checks, and symmetry detection via canonical rooting at centers.
  • Beware of pitfalls: forgetting to sort child labels, hashing collisions, not excluding the parent in recursion, and mishandling the 1-or-2 centers case.
  • In competitive programming, the center+AHU pattern is a robust, easy-to-code solution for n up to around 2e5 with careful recursion or iterative DFS.

Prerequisites

  • Tree basics (nodes, edges, degrees)Understanding what a tree is and its properties is essential to reason about isomorphism.
  • DFS/BFSWe need traversals to compute distances, centers, and to perform bottom-up encodings.
  • SortingChild encodings must be sorted to remove sibling order; sorting dominates runtime.
  • Asymptotic analysis (Big-O)To analyze O(n log n) vs O(n) variants and memory usage.
  • Hashing and collisionsHash-based encodings trade small collision risk for speed and memory benefits.
  • Recursion and call stackAHU encoding is naturally recursive and can hit stack limits on deep trees.
  • Graph distances and diameterCenters are found via diameter endpoints using BFS distances.
  • String construction and interningAHU canonical forms use strings; efficient construction prevents overhead.
  • Maps and dictionariesMapping child-label multisets to integers enables compact canonical labels.
  • Basic probabilityTo reason about and bound hash collision probabilities, especially with double hashing.

Detailed Explanation

Tap terms for definitions

01Overview

Tree isomorphism is the problem of determining whether two trees have the same structure when vertex names are ignored. Imagine two trees drawn with different labels or orderings of children; if you can redraw one to look like the other by just relabeling nodes and reordering children, they are isomorphic. The core idea to test this efficiently is to assign each subtree a canonical representation that is independent of node IDs and child order. For rooted trees, we do a bottom-up procedure: compute a representation for each node by combining the (sorted) representations of its children. For unrooted trees, trees may have one or two natural centers (nodes minimizing the maximum distance to any leaf). Rooting both trees at their center(s) and comparing canonical forms yields an isomorphism test that is independent of how the tree is drawn. In practice, we use two flavors of canonical representation. The classic AHU algorithm uses strings like '(()(()))' built by sorting children strings and wrapping them in parentheses—this is collision-free and conceptually simple. Alternatively, polynomial hashing maps subtree multisets to integers to reduce memory and speed comparisons, but one must handle collision risks (often by double hashing). The approach is widely used in contest problems for deduplicating trees, matching patterns, and recognizing symmetry. Its runtime is typically O(n log n) due to sorting children; with careful global bucket sorting of labels across the entire tree level-by-level, linear time O(n) is achievable.

02Intuition & Analogies

Think of a tree like a family tree built from Lego blocks. Each node is a Lego piece, and each subtree is a shape made from pieces connected under that node. Two trees are isomorphic if you can rearrange sibling blocks (children) without changing the overall stacked shape. So, instead of comparing raw IDs, we compare the shapes of their subtrees. To do this, give every leaf the same simple tag—say, a small block labeled '()'. For a parent, collect the tags of its children. Since the order of children does not matter (you can reorder Lego pieces on the same level without changing the shape), sort those tags. Then glue them together and wrap the result with a new pair of parentheses to make the parent’s tag. If two parents have the same multiset of child shapes, they will get exactly the same tag. Climb up the tree until the root: its tag describes the entire tree. Two rooted trees are isomorphic if and only if their root tags match. For unrooted trees, choose a neutral place to start—its balance point(s). If you hang the tree by a node, the center is where the tree balances best (minimizes the farthest leaf distance). A tree has one or two centers. Root the tree there and compute the tags. If either center-rooted tag of tree A matches a center-rooted tag of tree B, the two trees have the same shape. The AHU encoding uses literal parentheses so that equal shapes get equal strings; hashing is like compressing the tag into numbers for speed, with a tiny risk of collision unless you take precautions.

03Formal Definition

Let T1 = (V1, E1) and T2 = (V2, E2) be trees. They are isomorphic if there exists a bijection \(: \) such that \((u, v) ((u), (v)) \). For rooted trees with roots r1 and r2, we further require \(() = \) and that parent-child adjacency is preserved. A canonical form F(T, r) for a rooted tree (T, r) is a mapping from nodes to strings or integers such that for any nodes u, v, F(u) = F(v) if and only if the rooted subtrees at u and v are isomorphic. The AHU canonical form is defined recursively: for a leaf, F = "()"; for an internal node, F = "(" + concatenation of the sorted list of its children’s F + ")". The sorting ensures order-invariance among siblings. For unrooted trees, let \((v) = (v,u)\) be the eccentricity, and let the center set be \(C = (v)\); it is known that {1, 2}. We define the canonical form of an unrooted tree T as \(\{ F(T, c) : c C\}\) under lexicographic order, or as the set of forms at the centers. Two unrooted trees are isomorphic if and only if these canonical forms match.

04When to Use

Use tree isomorphism when node identities are irrelevant, and only structure matters. Typical scenarios: deduplicating generated trees (counting distinct shapes), verifying that two differently-labeled trees represent the same hierarchy, or checking for symmetry by comparing a tree to itself under different rootings. In competitive programming, common tasks include: determining if two given trees are isomorphic, grouping many trees by shape, or verifying whether a tree contains a subtree isomorphic to a pattern (with adaptations). Choose the rooted version when roots are fixed and meaningful (e.g., comparing two organizational charts rooted at CEOs). Choose the unrooted (center-rooted) version when the trees are unlabeled and unrooted (e.g., comparing chemical molecule trees ignoring the choice of starting atom). Use AHU string canonicalization when you need a deterministic, collision-free representation and can afford O(n log n) time and O(n) space. Use hashing (possibly double hashing) when performance and memory are tight and a negligible collision risk is acceptable; this is often fine in contests with 64-bit or double-mod hashing and careful seed choices. For very large trees or strict guarantees, consider linear-time variants (bucket/radix sorting child labels globally) or iterative implementations to avoid recursion depth limits.

⚠️Common Mistakes

  • Forgetting to sort child encodings: sibling order is irrelevant; failing to sort makes the representation order-dependent and incorrect.
  • Not excluding the parent in DFS: you must pass the parent to avoid walking back up and creating cycles in recursion or double-counting children.
  • Mishandling centers: unrooted trees can have 1 or 2 centers. You must consider both centers; comparing only one may miss isomorphic cases (e.g., even-diameter paths).
  • Hash collisions: polynomial hashing is fast but can collide. Use large moduli, 64-bit mixing, or double hashing. If you need absolute correctness, use AHU parentheses or map-to-integers with deterministic canonicalization.
  • Deep recursion limits: in very deep trees (like paths), recursion depth can reach O(n). In C++, either raise stack limits, use iterative DFS, or enable tail recursion transformations cautiously.
  • Memory blow-up with strings: naive string concatenation at every node may create large intermediate strings. Mitigate by reserving capacity, interning labels to integers, or hashing. For guaranteed correctness with less memory, compress each sorted child label vector to an integer label via a global dictionary.
  • Assuming it works on general graphs: these methods exploit acyclicity. For graphs with cycles, tree isomorphism algorithms don’t apply directly; you need graph isomorphism techniques.

Key Formulas

Graph Isomorphism Definition

Explanation: Two trees are isomorphic if there is a one-to-one relabeling of vertices that preserves edges. This formalizes the idea of same shape ignoring labels.

Rooted Isomorphism Constraint

Explanation: For rooted trees, the isomorphism must map the root to the root and respect parent-child relationships. This fixes the orientation of edges from the root.

Eccentricity and Center

Explanation: Eccentricity measures how far the farthest node is from v. Centers are nodes with minimal eccentricity; a tree has one or two centers.

Canonical Form Recurrence

Explanation: The canonical form of a node depends only on the multiset of its children’s canonical forms. The encoding function sorts and wraps to eliminate order.

Typical Time Complexity

Explanation: We sort the child encodings for each node v; sorting deg(v) items costs deg(v) log deg(v). Summing over all nodes yields O(n log n) in the worst case.

Polynomial Hash Combine

Explanation: After sorting child hashes , combine them with a polynomial using base p, offset b, and modulus M. This produces an order-independent numeric label.

Double Hash Collision Bound

Explanation: When using two independent moduli, the chance that two different structures collide in both is roughly the sum of their inverse moduli, typically negligible with large primes.

Tree Diameter

Explanation: The diameter is the longest shortest-path length in the tree. Centers lie on any diameter path and occupy the middle one or two positions.

Complexity Analysis

For rooted tree isomorphism via AHU parentheses, each node builds its encoding from its children’s encodings. The dominant cost is sorting the list of child encodings at each node v. If a node has degree deg(v), sorting costs O(deg(v) log deg(v)). Summed over all nodes, the total cost is O(∑ deg(v) log deg(v)) = O(n log n) in the worst case (e.g., a star where one node sorts n−1 items). String concatenations, if done naively, can add overhead; however, since each node contributes a constant number of parentheses and appears exactly once in a parent’s concatenation, total output size is O(n). With careful allocation (e.g., building strings bottom-up and avoiding repeated copying), the total still fits within O(n log n) time and O(n) space. Using integer labels (canonical labeling) instead of strings can speed up comparisons and save memory. We map each sorted vector of child labels to a new integer ID via a global unordere or ordered map; constructing the vector still requires sorting. If we globally bucket-sort vectors of labels by length and content (i.e., radix/bucket sort where labels are small integers that increase monotonically per level), we can avoid log factors and achieve O(n) time overall, as in the linear-time AHU variant. For unrooted trees, finding centers by two BFS runs costs O(n). Then we run the rooted isomorphism from 1 or 2 centers. Thus, the total time is O(n log n) typically, or O(n) with linear-time labeling. Space complexity is O(n) for storing the adjacency lists, recursion/stack, and the labels/strings. When using double hashing, space stays O(n), storing a pair of 64-bit integers per node and possibly a dictionary of seen child-multisets, whose total size is linear in n.

Code Examples

Rooted tree isomorphism via AHU parentheses encoding
1#include <bits/stdc++.h>
2using namespace std;
3
4// Compute AHU canonical form for a rooted tree.
5// Representation: "(" + sorted(concat(children)) + ")"
6
7string canonical_form_rooted(const vector<vector<int>>& g, int v, int parent) {
8 vector<string> child_forms;
9 child_forms.reserve(g[v].size());
10 for (int u : g[v]) if (u != parent) {
11 child_forms.push_back(canonical_form_rooted(g, u, v));
12 }
13 sort(child_forms.begin(), child_forms.end()); // order-invariance among siblings
14 // Concatenate with parentheses wrapper
15 string res;
16 size_t total_len = 2; // for '(' and ')'
17 for (auto &s : child_forms) total_len += s.size();
18 res.reserve(total_len);
19 res.push_back('(');
20 for (auto &s : child_forms) res += s;
21 res.push_back(')');
22 return res;
23}
24
25int main() {
26 ios::sync_with_stdio(false);
27 cin.tie(nullptr);
28
29 // Example: two rooted trees with n=7 nodes each (0-indexed)
30 // Tree A rooted at 0
31 int n = 7;
32 vector<vector<int>> A(n), B(n);
33 auto add_edge = [&](vector<vector<int>>& G, int x, int y){ G[x].push_back(y); G[y].push_back(x); };
34
35 // Tree A edges
36 add_edge(A,0,1); add_edge(A,0,2); add_edge(A,1,3); add_edge(A,1,4); add_edge(A,2,5); add_edge(A,2,6);
37 // Tree B edges (same shape but different labeling and child order)
38 add_edge(B,6,5); add_edge(B,5,2); add_edge(B,2,0); add_edge(B,0,1); add_edge(B,1,3); add_edge(B,1,4);
39
40 int rootA = 0;
41 int rootB = 2; // arbitrary; rooted iso requires fixed roots
42
43 string fa = canonical_form_rooted(A, rootA, -1);
44 string fb = canonical_form_rooted(B, rootB, -1);
45
46 cout << (fa == fb ? "Rooted-isomorphic\n" : "Not rooted-isomorphic\n");
47 // Prints "Rooted-isomorphic"
48 return 0;
49}
50

This program computes a collision-free canonical string for each rooted tree using the AHU method. It recursively encodes each node from its children’s encodings, sorted to remove sibling-order effects, then wraps them in parentheses. Two rooted trees are isomorphic if their root encodings match exactly.

Time: O(n log n) due to sorting child encodings at each nodeSpace: O(n) for adjacency, recursion stack, and strings
Unrooted tree isomorphism using centers + AHU encoding
1#include <bits/stdc++.h>
2using namespace std;
3
4// BFS to find farthest node and parent array
5pair<int, vector<int>> bfs_far(const vector<vector<int>>& g, int s) {
6 int n = (int)g.size();
7 vector<int> dist(n, -1), par(n, -1);
8 queue<int> q;
9 q.push(s); dist[s] = 0;
10 while (!q.empty()) {
11 int v = q.front(); q.pop();
12 for (int u : g[v]) if (dist[u] == -1) {
13 dist[u] = dist[v] + 1; par[u] = v; q.push(u);
14 }
15 }
16 int far = s;
17 for (int i = 0; i < n; ++i) if (dist[i] > dist[far]) far = i;
18 return {far, par};
19}
20
21vector<int> tree_centers(const vector<vector<int>>& g) {
22 int n = (int)g.size();
23 auto [a, pa] = bfs_far(g, 0);
24 auto [b, pb] = bfs_far(g, a);
25 // Reconstruct diameter path from b back to a
26 vector<int> path; int cur = b;
27 while (cur != -1) { path.push_back(cur); if (cur == a) break; cur = pb[cur]; }
28 int m = (int)path.size();
29 if (m % 2 == 1) return { path[m/2] };
30 else return { path[m/2 - 1], path[m/2] };
31}
32
33string canonical_form_rooted(const vector<vector<int>>& g, int v, int parent) {
34 vector<string> child_forms;
35 child_forms.reserve(g[v].size());
36 for (int u : g[v]) if (u != parent) child_forms.push_back(canonical_form_rooted(g, u, v));
37 sort(child_forms.begin(), child_forms.end());
38 string res; size_t total_len = 2; for (auto &s : child_forms) total_len += s.size();
39 res.reserve(total_len); res.push_back('(');
40 for (auto &s : child_forms) res += s; res.push_back(')');
41 return res;
42}
43
44string canonical_form_unrooted(const vector<vector<int>>& g) {
45 vector<int> centers = tree_centers(g);
46 vector<string> forms;
47 for (int c : centers) forms.push_back(canonical_form_rooted(g, c, -1));
48 // Unique canonical form: min over possible center roots
49 if (forms.size() == 1) return forms[0];
50 return min(forms[0], forms[1]);
51}
52
53int main(){
54 ios::sync_with_stdio(false);
55 cin.tie(nullptr);
56
57 // Two unrooted trees that are isomorphic
58 int n = 8;
59 vector<vector<int>> T1(n), T2(n);
60 auto add = [&](vector<vector<int>>& G, int x, int y){ G[x].push_back(y); G[y].push_back(x); };
61
62 // T1: a balanced binary-like shape
63 add(T1,0,1); add(T1,0,2); add(T1,1,3); add(T1,1,4); add(T1,2,5); add(T1,2,6); add(T1,6,7);
64
65 // T2: same shape, permuted labels
66 add(T2,7,6); add(T2,6,2); add(T2,2,0); add(T2,0,1); add(T2,1,3); add(T2,1,4); add(T2,2,5);
67
68 string f1 = canonical_form_unrooted(T1);
69 string f2 = canonical_form_unrooted(T2);
70 cout << (f1 == f2 ? "Isomorphic\n" : "Not isomorphic\n");
71
72 // A non-isomorphic example
73 vector<vector<int>> T3(5);
74 add(T3,0,1); add(T3,1,2); add(T3,2,3); add(T3,3,4); // a path of 5
75 cout << (canonical_form_unrooted(T1) == canonical_form_unrooted(T3) ? "Isomorphic\n" : "Not isomorphic\n");
76 return 0;
77}
78

We locate the center(s) of each unrooted tree using two BFS passes along the diameter. Then we root the tree at each center, compute the AHU canonical form, and pick the lexicographically smallest as the tree’s canonical representation. Two unrooted trees are isomorphic if these canonical representations match.

Time: O(n log n) for AHU encoding plus O(n) for finding centers; overall O(n log n)Space: O(n) for adjacency, BFS arrays, recursion, and strings
Counting distinct unlabeled trees using center-rooted canonical forms
1#include <bits/stdc++.h>
2using namespace std;
3
4pair<int, vector<int>> bfs_far(const vector<vector<int>>& g, int s) {
5 int n = (int)g.size();
6 vector<int> dist(n, -1), par(n, -1);
7 queue<int> q; q.push(s); dist[s] = 0;
8 while (!q.empty()) {
9 int v = q.front(); q.pop();
10 for (int u : g[v]) if (dist[u] == -1) { dist[u] = dist[v] + 1; par[u] = v; q.push(u);} }
11 int far = s; for (int i = 0; i < n; ++i) if (dist[i] > dist[far]) far = i; return {far, par};
12}
13
14vector<int> centers(const vector<vector<int>>& g){
15 auto [a, pa] = bfs_far(g, 0);
16 auto [b, pb] = bfs_far(g, a);
17 vector<int> path; int cur = b; while (cur != -1){ path.push_back(cur); if (cur == a) break; cur = pb[cur]; }
18 int m = (int)path.size();
19 if (m % 2) return {path[m/2]};
20 return {path[m/2 - 1], path[m/2]};
21}
22
23string canon_rooted(const vector<vector<int>>& g, int v, int p){
24 vector<string> cs; cs.reserve(g[v].size());
25 for (int u : g[v]) if (u != p) cs.push_back(canon_rooted(g, u, v));
26 sort(cs.begin(), cs.end());
27 string r; size_t L = 2; for (auto &s: cs) L += s.size(); r.reserve(L);
28 r.push_back('('); for (auto &s: cs) r += s; r.push_back(')'); return r;
29}
30
31string canon_unrooted(const vector<vector<int>>& g){
32 auto c = centers(g); vector<string> forms; forms.reserve(c.size());
33 for (int x : c) forms.push_back(canon_rooted(g, x, -1));
34 return forms.size() == 1 ? forms[0] : min(forms[0], forms[1]);
35}
36
37int main(){
38 ios::sync_with_stdio(false);
39 cin.tie(nullptr);
40
41 // Build a batch of trees and count distinct shapes
42 vector<vector<vector<int>>> batch;
43
44 // Tree 1: star with 4 leaves
45 {
46 int n = 5; vector<vector<int>> G(n);
47 auto add=[&](int x,int y){G[x].push_back(y);G[y].push_back(x);} ;
48 add(0,1); add(0,2); add(0,3); add(0,4); batch.push_back(G);
49 }
50 // Tree 2: another star with 4 leaves (different labels)
51 {
52 int n = 5; vector<vector<int>> G(n);
53 auto add=[&](int x,int y){G[x].push_back(y);G[y].push_back(x);} ;
54 add(3,1); add(3,0); add(3,2); add(3,4); batch.push_back(G);
55 }
56 // Tree 3: path of 5
57 {
58 int n = 5; vector<vector<int>> G(n);
59 auto add=[&](int x,int y){G[x].push_back(y);G[y].push_back(x);} ;
60 add(0,1); add(1,2); add(2,3); add(3,4); batch.push_back(G);
61 }
62
63 unordered_set<string> shapes;
64 for (auto &G : batch) shapes.insert(canon_unrooted(G));
65
66 cout << "Distinct shapes: " << shapes.size() << "\n"; // Expected 2
67 return 0;
68}
69

This example groups multiple input trees by structure. We compute the center-rooted canonical form for each and insert it into an unordered_set of strings. Equal strings collapse to one representative, so the set’s size equals the number of distinct unlabeled shapes.

Time: O(total_n log total_n) across all trees, where total_n is the sum of nodesSpace: O(total_n) for adjacency, recursion, and canonical strings