Disjoint Set Union (Union-Find)
Key Points
- •Disjoint Set Union (Union-Find) maintains a collection of non-overlapping sets and supports fast merging and membership queries.
- •Path compression flattens trees during finds by pointing nodes directly to their set's root, greatly speeding up future queries.
- •Union by size/rank attaches the smaller tree under the larger, keeping trees shallow and operations efficient.
- •With both heuristics, m operations on n elements run in O(m (n)), which is effectively constant time per operation in practice.
- •DSU powers Kruskal’s Minimum Spanning Tree algorithm, cycle detection, and connected components in undirected graphs.
- •Rollback (undoable) DSU supports offline dynamic connectivity by reverting unions; it disables path compression to allow undo.
- •DSU is simple to implement with arrays and works well up to millions of elements when optimized.
- •Common pitfalls include forgetting to initialize parents, mixing 0/1-based indices, or combining rollback with path compression.
Prerequisites
- →Arrays and Indexing — DSU stores parent and size/rank in arrays or vectors; correct indexing is essential.
- →Recursion and Iteration — Find with path compression is often recursive; understanding loops enables iterative variants.
- →Asymptotic Notation — To interpret O(m \alpha(n)) and compare heuristics’ effects on performance.
- →Graphs: Undirected Connectivity — DSU is widely used for connected components, cycle detection, and MSTs.
- →Sorting — Kruskal’s algorithm requires sorting edges by weight.
- →Stacks — Rollback DSU uses a stack of changes to support undo operations.
- →Amortized Analysis — Explains why DSU operations are effectively constant time on average.
Detailed Explanation
Tap terms for definitions01Overview
Disjoint Set Union (DSU), also called Union-Find, is a data structure that manages a partition of elements into disjoint (non-overlapping) sets. It supports two core operations: Find, which returns a representative (root) of the set containing an element, and Union, which merges the sets of two elements. Internally, DSU typically represents each set as a rooted tree in a forest, where each node stores a parent pointer and the root points to itself. Two powerful heuristics—path compression and union by size/rank—keep these trees extremely shallow. Path compression makes every node on a Find path point directly to the root, while union by size/rank always attaches the smaller (or lower-rank) tree under the larger one. Together, they give near-constant amortized performance: a sequence of m operations on n elements runs in O(m \alpha(n)), where \alpha is the inverse Ackermann function that grows slower than any practical function. DSU underpins many graph algorithms, including Kruskal’s algorithm for computing a minimum spanning tree, online and offline connectivity queries, and cycle detection. It is compact, straightforward to code in C++, and exceptionally fast for competitive programming and systems tasks involving dynamic grouping.
02Intuition & Analogies
Imagine a school with many clubs. Initially, every student forms their own single-person club. When two clubs decide to collaborate, they merge into one bigger club. If you want to know whether two students are in the same club, you simply check who their club leader is. In DSU terms, each club is a tree, and the club leader is the root. The Find operation asks: who is this student’s leader? The Union operation says: combine the clubs of student A and student B under a single leader. Path compression is like distributing updated membership cards whenever you ask who the leader is. Suppose you stop a student in the hallway and ask, “Who’s your club leader?” If they don’t know directly, they ask their mentor, who asks their mentor, and so on, until someone knows. After you finally learn the true leader, you go back and update everyone’s card along the path to point straight to the leader. Next time, no one in this chain will need to ask multiple people—the answer is immediate. Union by size/rank is a smart merging policy: when two clubs merge, keep the bigger club’s leader as the new leader. That way, the smaller club’s members only need to update to point to a leader who already manages more people, and we avoid making tall, skinny hierarchies that slow down queries. Together, these ideas ensure that most membership checks and merges are nearly instant, even after performing millions of operations. The structure stays flat and fast, similar to how an efficiently managed organization reduces red tape by shortening reporting lines and minimizing unnecessary handoffs.
03Formal Definition
04When to Use
Use DSU whenever you need to maintain connectivity or grouping information under merges, with no requirement to split sets online. Classic applications include:
- Kruskal’s Minimum Spanning Tree: Sort edges by weight and add an edge if its endpoints are in different components (checked by DSU). This yields O(m \log m + m \alpha(n)) time.
- Connected Components in static or incrementally built undirected graphs: As you add edges, union endpoints; Find answers “same component?” queries.
- Cycle Detection in undirected graphs: Processing edges; a new edge (u, v) forms a cycle if Find(u) == Find(v).
- Grid/Percolation connectivity: Merge adjacent open cells to answer region/area queries.
- Offline Dynamic Connectivity or Undo operations: With Rollback DSU, you can add edges in certain time intervals and undo them via divide-and-conquer on time without supporting online splits. Avoid DSU if you need to delete edges online or split sets frequently; standard DSU cannot handle splits efficiently. Also, DSU does not directly handle directed connectivity or path queries with weights; other structures (e.g., dynamic trees, link-cut trees) or graph algorithms are more suitable there.
⚠️Common Mistakes
- Skipping heuristics: Implementing DSU without path compression or union by size/rank leads to tall trees and O(n) worst-case per operation, which can TLE on large inputs. Always include at least one heuristic; both is best.
- Incorrect union logic: Forgetting to union by root (i.e., linking arbitrary nodes rather than their roots) or failing to update size/rank correctly causes corrupted sets. Always compute r_x = Find(x), r_y = Find(y), check r_x != r_y, then link.
- Indexing errors: Mixing 0-based and 1-based indices or failing to initialize Make-Set for all elements leads to out-of-bounds or undefined parents. Initialize parent[i] = i and size[i] = 1 for all i in [0..n-1] or [1..n].
- Recursion depth worries: Find with recursion is safe in practice due to path compression keeping trees shallow; but in specialized environments, an iterative Find avoids any stack concerns.
- Using path compression with rollback: Rollback DSU relies on undoing changes; path compression makes many small parent changes that are expensive to track. For rollback, disable full path compression and record only union operations (and any size/rank changes) on a stack.
- Misapplying DSU: DSU answers connectivity and set membership, not shortest paths, directed reachability, or weighted aggregates along paths. Choose algorithms like BFS/DFS, Dijkstra, or segment trees when appropriate.
Key Formulas
DSU Total Time
Explanation: With path compression and union by rank/size, a sequence of m operations on n elements runs in O(m alpha(n)) time. The function alpha(n) is the inverse Ackermann function and is less than 5 for all practical n.
Inverse Ackermann (one definition)
Explanation: This defines alpha(n) in terms of the (very fast-growing) Ackermann function A(k, 4). Because A grows so quickly, alpha grows extremely slowly, making DSU effectively constant-time per operation.
Tree Height with Union-by-Size Only
Explanation: If we use union by size without path compression, the height h(n) of any tree is bounded by floor(log2 n). Each time a node’s height increases, the size of its set at least doubles.
Operation Count Decomposition
Explanation: In analyses and implementations, the total number of operations m is the sum of Find and Union calls. Complexity statements like O(m alpha(n)) apply to the whole sequence.
Space Complexity
Explanation: DSU stores a constant number of arrays (parent, size or rank) of length n, so its space use scales linearly with the number of elements.
Component Count Update
Explanation: When uniting u and v from different components, the number of connected components decreases by one; otherwise, it stays the same. The Iverson bracket [P] is 1 if P is true, 0 otherwise.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct DSU { 5 int n; 6 vector<int> parent, sz; // parent[i] is parent of i; sz[i] is size of root i 7 DSU(int n): n(n), parent(n), sz(n, 1) { 8 iota(parent.begin(), parent.end(), 0); // parent[i] = i 9 } 10 // Find with path compression 11 int find(int x) { 12 if (parent[x] == x) return x; 13 return parent[x] = find(parent[x]); // compress path 14 } 15 // Union by size; returns true if merged, false if already in same set 16 bool unite(int a, int b) { 17 a = find(a); b = find(b); 18 if (a == b) return false; 19 if (sz[a] < sz[b]) swap(a, b); // ensure a is larger root 20 parent[b] = a; 21 sz[a] += sz[b]; 22 return true; 23 } 24 int size(int x) { return sz[find(x)]; } 25 bool same(int a, int b) { return find(a) == find(b); } 26 }; 27 28 int main() { 29 ios::sync_with_stdio(false); 30 cin.tie(nullptr); 31 32 DSU dsu(10); // elements 0..9 33 34 // Merge some sets 35 dsu.unite(0, 1); 36 dsu.unite(2, 3); 37 dsu.unite(1, 2); // merges {0,1} with {2,3} -> {0,1,2,3} 38 39 cout << boolalpha; 40 cout << "same(0,3) = " << dsu.same(0,3) << "\n"; // true 41 cout << "same(4,5) = " << dsu.same(4,5) << "\n"; // false 42 43 // Sizes 44 cout << "size(0) = " << dsu.size(0) << "\n"; // 4 45 cout << "size(4) = " << dsu.size(4) << "\n"; // 1 46 47 // Further unions 48 dsu.unite(3, 4); 49 cout << "same(0,4) = " << dsu.same(0,4) << "\n"; // true 50 51 return 0; 52 } 53
This implementation maintains an array parent and a size array. Find uses path compression to flatten the tree, and unite attaches the smaller set under the larger one. The example merges a few sets, queries connectivity, and checks component sizes.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct DSU { 5 vector<int> p, r; // parent and rank 6 DSU(int n): p(n), r(n, 0) { iota(p.begin(), p.end(), 0); } 7 int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); } 8 bool unite(int a, int b) { 9 a = find(a); b = find(b); 10 if (a == b) return false; 11 if (r[a] < r[b]) swap(a, b); 12 p[b] = a; 13 if (r[a] == r[b]) r[a]++; 14 return true; 15 } 16 }; 17 18 struct Edge { int u, v; long long w; }; 19 20 int main() { 21 ios::sync_with_stdio(false); 22 cin.tie(nullptr); 23 24 // Example graph: 5 nodes, 7 edges 25 int n = 5; 26 vector<Edge> edges = { 27 {0,1,10}, {0,2,5}, {1,2,6}, {1,3,15}, {2,3,4}, {2,4,11}, {3,4,9} 28 }; 29 30 // Sort edges by weight 31 sort(edges.begin(), edges.end(), [](const Edge& a, const Edge& b){ return a.w < b.w; }); 32 33 DSU dsu(n); 34 long long total = 0; 35 vector<Edge> mst; 36 37 for (const auto& e : edges) { 38 if (dsu.unite(e.u, e.v)) { // if endpoints are in different components 39 total += e.w; 40 mst.push_back(e); 41 } 42 } 43 44 cout << "MST total weight = " << total << "\n"; 45 cout << "Edges in MST:" << "\n"; 46 for (auto &e : mst) cout << e.u << " - " << e.v << " (w=" << e.w << ")\n"; 47 48 return 0; 49 } 50
Kruskal’s algorithm sorts edges by weight and repeatedly takes the next lightest edge that connects two different components. DSU ensures we only add edges that do not create cycles. The sample builds an MST and prints its total weight and edges.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct DSU { 5 vector<int> p, sz; 6 DSU(int n): p(n), sz(n,1) { iota(p.begin(), p.end(), 0); } 7 int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); } 8 bool unite(int a, int b) { 9 a = find(a); b = find(b); 10 if (a == b) return false; 11 if (sz[a] < sz[b]) swap(a, b); 12 p[b] = a; sz[a] += sz[b]; 13 return true; 14 } 15 }; 16 17 int main(){ 18 ios::sync_with_stdio(false); 19 cin.tie(nullptr); 20 21 // Example: 4 nodes with edges; the last edge creates a cycle 22 int n = 4; 23 vector<pair<int,int>> edges = {{0,1}, {1,2}, {2,3}, {3,0}}; 24 25 DSU dsu(n); 26 bool hasCycle = false; 27 for (auto [u,v] : edges) { 28 if (!dsu.unite(u, v)) { // already in same component 29 hasCycle = true; 30 cout << "Cycle found when adding edge (" << u << ", " << v << ")\n"; 31 break; 32 } 33 } 34 35 cout << (hasCycle ? "Graph has a cycle" : "Graph is acyclic") << "\n"; 36 return 0; 37 } 38
When processing undirected edges, an edge (u, v) creates a cycle iff u and v are already connected. DSU tracks components and quickly detects this condition.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Rollback DSU: no path compression; union by size; supports undo() 5 struct RollbackDSU { 6 int n, comps; // number of components 7 vector<int> p, sz; 8 // Stack of changes to undo: (child, old_parent, parent, old_size_of_parent) 9 struct Change { int child, oldp, par, oldsz; }; 10 vector<Change> history; 11 12 RollbackDSU(int n): n(n), comps(n), p(n), sz(n,1) { iota(p.begin(), p.end(), 0); } 13 14 int find(int x) { 15 // No path compression: preserves structure for rollback 16 while (p[x] != x) x = p[x]; 17 return x; 18 } 19 20 bool unite(int a, int b) { 21 a = find(a); b = find(b); 22 if (a == b) { 23 // record a no-op marker? Not needed; simply return false 24 return false; 25 } 26 if (sz[a] < sz[b]) swap(a, b); 27 // Record change to undo later 28 history.push_back({b, p[b], a, sz[a]}); 29 p[b] = a; 30 sz[a] += sz[b]; 31 comps--; 32 return true; 33 } 34 35 // Roll back to a previous history size (checkpoint) 36 void rollback(int checkpoint) { 37 while ((int)history.size() > checkpoint) { 38 auto c = history.back(); history.pop_back(); 39 // Undo parent and size changes 40 p[c.child] = c.oldp; // child was root; oldp == child 41 sz[c.par] = c.oldsz; 42 comps++; 43 } 44 } 45 46 int checkpoint() const { return (int)history.size(); } 47 }; 48 49 int main(){ 50 ios::sync_with_stdio(false); 51 cin.tie(nullptr); 52 53 int n = 6; 54 RollbackDSU dsu(n); 55 56 // Phase 1 unions 57 int cp1 = dsu.checkpoint(); 58 dsu.unite(0,1); 59 dsu.unite(2,3); 60 cout << "Components after phase 1: " << dsu.comps << "\n"; // expect 4 61 62 // Phase 2 unions 63 int cp2 = dsu.checkpoint(); 64 dsu.unite(1,2); // merges {0,1} with {2,3} 65 dsu.unite(4,5); 66 cout << "Components after phase 2: " << dsu.comps << "\n"; // expect 2 67 68 // Roll back phase 2 69 dsu.rollback(cp2); 70 cout << "Components after rollback to cp2: " << dsu.comps << "\n"; // back to 4 71 72 // Further unions and rollback to beginning of phase 1 73 dsu.unite(3,4); 74 cout << "Components before full rollback: " << dsu.comps << "\n"; // expect 3 75 dsu.rollback(cp1); 76 cout << "Components after rollback to cp1: " << dsu.comps << "\n"; // back to 6 77 78 return 0; 79 } 80
Rollback DSU disables path compression and records each union’s reversible changes on a stack. Checkpoints capture the current stack size; rollback pops changes to restore a previous state. This is useful for offline dynamic connectivity (e.g., divide-and-conquer on time intervals) where you need to add edges in subproblems and then undo them when returning.