🗂️ Data Structure · Intermediate

Disjoint Set Union (Union-Find)

Key Points

  • Disjoint Set Union (Union-Find) maintains a collection of non-overlapping sets and supports fast merging and membership queries.
  • Path compression flattens trees during finds by pointing nodes directly to their set's root, greatly speeding up future queries.
  • Union by size/rank attaches the smaller tree under the larger, keeping trees shallow and operations efficient.
  • With both heuristics, m operations on n elements run in O(m \alpha(n)), which is effectively constant time per operation in practice.
  • DSU powers Kruskal’s Minimum Spanning Tree algorithm, cycle detection, and connected components in undirected graphs.
  • Rollback (undoable) DSU supports offline dynamic connectivity by reverting unions; it disables path compression to allow undo.
  • DSU is simple to implement with arrays and works well up to millions of elements when optimized.
  • Common pitfalls include forgetting to initialize parents, mixing 0/1-based indices, or combining rollback with path compression.

Prerequisites

  • Arrays and Indexing: DSU stores parent and size/rank in arrays or vectors; correct indexing is essential.
  • Recursion and Iteration: Find with path compression is often recursive; understanding loops enables iterative variants.
  • Asymptotic Notation: Needed to interpret O(m \alpha(n)) and compare the heuristics’ effects on performance.
  • Graphs (Undirected Connectivity): DSU is widely used for connected components, cycle detection, and MSTs.
  • Sorting: Kruskal’s algorithm requires sorting edges by weight.
  • Stacks: Rollback DSU uses a stack of changes to support undo operations.
  • Amortized Analysis: Explains why DSU operations are effectively constant time on average.

Detailed Explanation


01 Overview

Disjoint Set Union (DSU), also called Union-Find, is a data structure that manages a partition of elements into disjoint (non-overlapping) sets. It supports two core operations: Find, which returns a representative (root) of the set containing an element, and Union, which merges the sets of two elements. Internally, DSU typically represents each set as a rooted tree in a forest, where each node stores a parent pointer and the root points to itself.

Two powerful heuristics, path compression and union by size/rank, keep these trees extremely shallow. Path compression makes every node on a Find path point directly to the root, while union by size/rank always attaches the smaller (or lower-rank) tree under the larger one. Together, they give near-constant amortized performance: a sequence of m operations on n elements runs in O(m \alpha(n)), where \alpha is the inverse Ackermann function, which grows more slowly than any function arising in practice.

DSU underpins many graph algorithms, including Kruskal’s algorithm for computing a minimum spanning tree, online and offline connectivity queries, and cycle detection. It is compact, straightforward to code in C++, and exceptionally fast for competitive programming and systems tasks involving dynamic grouping.

02 Intuition & Analogies

Imagine a school with many clubs. Initially, every student forms their own single-person club. When two clubs decide to collaborate, they merge into one bigger club. If you want to know whether two students are in the same club, you simply check who their club leader is. In DSU terms, each club is a tree, and the club leader is the root. The Find operation asks: who is this student’s leader? The Union operation says: combine the clubs of student A and student B under a single leader.

Path compression is like distributing updated membership cards whenever you ask who the leader is. Suppose you stop a student in the hallway and ask, “Who’s your club leader?” If they don’t know directly, they ask their mentor, who asks their mentor, and so on, until someone knows. After you finally learn the true leader, you go back and update everyone’s card along the path to point straight to the leader. Next time, no one in this chain will need to ask multiple people; the answer is immediate.

Union by size/rank is a smart merging policy: when two clubs merge, keep the bigger club’s leader as the new leader. That way, the smaller club’s members only need to update to point to a leader who already manages more people, and we avoid making tall, skinny hierarchies that slow down queries. Together, these ideas ensure that most membership checks and merges are nearly instant, even after performing millions of operations. The structure stays flat and fast, similar to how an efficiently managed organization reduces red tape by shortening reporting lines and minimizing unnecessary handoffs.

03 Formal Definition

Let U be a finite universe of n elements. A disjoint set data structure maintains a partition P = {S_1, S_2, ..., S_k} of U such that S_i ∩ S_j = ∅ for i ≠ j and S_1 ∪ S_2 ∪ ... ∪ S_k = U. It supports the following operations:

  • Make-Set(x): Create a singleton set {x} with representative x.
  • Find(x): Return an identifier representative(S) for the unique set S ∈ P such that x ∈ S. In tree-based DSU, this is the root of x’s tree.
  • Union(x, y): Merge the sets containing x and y, i.e., replace S_x and S_y by S_x ∪ S_y when S_x ≠ S_y. The implementation uses Link(r_x, r_y) on the roots r_x = Find(x), r_y = Find(y).

We store a parent array parent[1..n] (with parent[r] = r for roots) and optionally size[.] or rank[.]. Union-by-size ensures that when linking roots r_x and r_y with sizes s_x and s_y, we attach the smaller root under the larger: if s_x ≥ s_y, then parent[r_y] := r_x and we update size[r_x] := s_x + s_y. Path compression sets parent[v] := root for every node v visited during Find, so that subsequent queries are faster. Tarjan (1975) proved that with both heuristics, any sequence of m ≥ n operations runs in O(m \alpha(n)) time, where \alpha(n) is the inverse Ackermann function, which is less than 5 for all practical n. Space usage is O(n) for the parent and auxiliary arrays.

04 When to Use

Use DSU whenever you need to maintain connectivity or grouping information under merges, with no requirement to split sets online. Classic applications include:

  • Kruskal’s Minimum Spanning Tree: Sort edges by weight and add an edge if its endpoints are in different components (checked by DSU). This yields O(m \log m + m \alpha(n)) time.
  • Connected Components in static or incrementally built undirected graphs: As you add edges, union endpoints; Find answers “same component?” queries.
  • Cycle Detection in undirected graphs: Process edges one at a time; a new edge (u, v) forms a cycle if Find(u) == Find(v) before the union.
  • Grid/Percolation connectivity: Merge adjacent open cells to answer region/area queries.
  • Offline Dynamic Connectivity or Undo operations: With Rollback DSU, you can add edges during certain time intervals and undo them via divide-and-conquer on time, without supporting online splits.

Avoid DSU if you need to delete edges online or split sets frequently; standard DSU cannot handle splits efficiently. DSU also does not directly handle directed connectivity or path queries with weights; other structures (e.g., dynamic trees such as link-cut trees) or different graph algorithms are more suitable there.

⚠️Common Mistakes

  1. Skipping heuristics: Implementing DSU without path compression or union by size/rank leads to tall trees and O(n) worst-case time per operation, which can TLE on large inputs. Always include at least one heuristic; including both is best.
  2. Incorrect union logic: Forgetting to union by root (i.e., linking arbitrary nodes rather than their roots) or failing to update size/rank correctly causes corrupted sets. Always compute r_x = Find(x), r_y = Find(y), check r_x != r_y, then link.
  3. Indexing errors: Mixing 0-based and 1-based indices or failing to initialize Make-Set for all elements leads to out-of-bounds or undefined parents. Initialize parent[i] = i and size[i] = 1 for all i in [0..n-1] or [1..n].
  4. Recursion depth worries: Recursive Find is safe in practice because path compression keeps trees shallow, but in specialized environments an iterative Find avoids any stack concerns.
  5. Using path compression with rollback: Rollback DSU relies on undoing changes; path compression makes many small parent changes that are expensive to track. For rollback, disable full path compression and record only union operations (and any size/rank changes) on a stack.
  6. Misapplying DSU: DSU answers connectivity and set membership, not shortest paths, directed reachability, or weighted aggregates along paths. Choose algorithms like BFS/DFS, Dijkstra, or segment trees when appropriate.

Key Formulas

DSU Total Time

T(m, n) = O(m \alpha(n))

Explanation: With path compression and union by rank/size, a sequence of m operations on n elements runs in O(m \alpha(n)) time. The function \alpha(n) is the inverse Ackermann function and is less than 5 for all practical n.

Inverse Ackermann (one definition)

\alpha(n) = \min \{ k \ge 1 : A(k, 4) \ge n \}

Explanation: This defines \alpha(n) in terms of the (very fast-growing) Ackermann function A(k, 4). Because A grows so quickly, \alpha grows extremely slowly, making DSU effectively constant-time per operation.

Tree Height with Union-by-Size Only

h(n) \le \lfloor \log_2 n \rfloor

Explanation: If we use union by size without path compression, the height h(n) of any tree is bounded by floor(log2 n). Each time a node’s depth increases, the size of its set at least doubles.

Operation Count Decomposition

m = m_find + m_union

Explanation: In analyses and implementations, the total number of operations m is the sum of Find and Union calls. Complexity statements like O(m \alpha(n)) apply to the whole sequence.

Space Complexity

S(n) = O(n)

Explanation: DSU stores a constant number of arrays (parent, size or rank) of length n, so its space use scales linearly with the number of elements.

Component Count Update

c' = c − [Find(u) ≠ Find(v)]

Explanation: When uniting u and v from different components, the number of connected components decreases by one; otherwise, it stays the same. The Iverson bracket [P] is 1 if P is true, 0 otherwise.

Complexity Analysis

Let n be the number of elements and m the total number of operations (Find and Union). Using union by size or rank guarantees that trees remain shallow: without path compression, the height is O(\log n), because a node’s depth increases only when its set at least doubles in size. Adding path compression dramatically tightens the bound: Tarjan showed that with both path compression and union by rank/size, the total time for any sequence of m operations (with m ≥ n) is O(m \alpha(n)), where \alpha(n) is the inverse Ackermann function, which is less than 5 for any n in practice. This means the amortized time per operation is effectively constant.

Space consumption is O(n), as we store a parent for each element and optionally size or rank arrays. If implementing rollback DSU, we also maintain a stack of changes; each successful union records O(1) information (previous parent and size/rank), so additional space is O(number of unions), which ranges from O(n) to O(m) depending on workload. Note that rollback DSU disables path compression to keep undo information manageable and predictable; its per-operation time thus becomes O(\log n) with union-by-size (or O(1) amortized if used in structured offline divide-and-conquer where each element’s parent changes are limited). In all cases, DSU’s constant factors are small, making it highly competitive. Beware the worst-case O(n) per operation if both heuristics are omitted or incorrectly implemented.

Code Examples

Basic DSU with Path Compression and Union by Size
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    int n;
    vector<int> parent, sz; // parent[i] is parent of i; sz[i] is size of root i
    DSU(int n): n(n), parent(n), sz(n, 1) {
        iota(parent.begin(), parent.end(), 0); // parent[i] = i
    }
    // Find with path compression
    int find(int x) {
        if (parent[x] == x) return x;
        return parent[x] = find(parent[x]); // compress path
    }
    // Union by size; returns true if merged, false if already in same set
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (sz[a] < sz[b]) swap(a, b); // ensure a is larger root
        parent[b] = a;
        sz[a] += sz[b];
        return true;
    }
    int size(int x) { return sz[find(x)]; }
    bool same(int a, int b) { return find(a) == find(b); }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    DSU dsu(10); // elements 0..9

    // Merge some sets
    dsu.unite(0, 1);
    dsu.unite(2, 3);
    dsu.unite(1, 2); // merges {0,1} with {2,3} -> {0,1,2,3}

    cout << boolalpha;
    cout << "same(0,3) = " << dsu.same(0,3) << "\n"; // true
    cout << "same(4,5) = " << dsu.same(4,5) << "\n"; // false

    // Sizes
    cout << "size(0) = " << dsu.size(0) << "\n"; // 4
    cout << "size(4) = " << dsu.size(4) << "\n"; // 1

    // Further unions
    dsu.unite(3, 4);
    cout << "same(0,4) = " << dsu.same(0,4) << "\n"; // true

    return 0;
}

This implementation maintains an array parent and a size array. Find uses path compression to flatten the tree, and unite attaches the smaller set under the larger one. The example merges a few sets, queries connectivity, and checks component sizes.

Time: O(\alpha(n)) amortized per operation; O(m \alpha(n)) for m operations. Space: O(n)

Kruskal's Minimum Spanning Tree using DSU
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> p, r; // parent and rank
    DSU(int n): p(n), r(n, 0) { iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (r[a] < r[b]) swap(a, b);
        p[b] = a;
        if (r[a] == r[b]) r[a]++;
        return true;
    }
};

struct Edge { int u, v; long long w; };

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example graph: 5 nodes, 7 edges
    int n = 5;
    vector<Edge> edges = {
        {0,1,10}, {0,2,5}, {1,2,6}, {1,3,15}, {2,3,4}, {2,4,11}, {3,4,9}
    };

    // Sort edges by weight
    sort(edges.begin(), edges.end(), [](const Edge& a, const Edge& b){ return a.w < b.w; });

    DSU dsu(n);
    long long total = 0;
    vector<Edge> mst;

    for (const auto& e : edges) {
        if (dsu.unite(e.u, e.v)) { // if endpoints are in different components
            total += e.w;
            mst.push_back(e);
        }
    }

    cout << "MST total weight = " << total << "\n";
    cout << "Edges in MST:" << "\n";
    for (auto &e : mst) cout << e.u << " - " << e.v << " (w=" << e.w << ")\n";

    return 0;
}

Kruskal’s algorithm sorts edges by weight and repeatedly takes the next lightest edge that connects two different components. DSU ensures we only add edges that do not create cycles. The sample builds an MST and prints its total weight and edges.

Time: O(m \log m + m \alpha(n)). Space: O(n + m)

Cycle Detection in an Undirected Graph using DSU
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> p, sz;
    DSU(int n): p(n), sz(n,1) { iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (sz[a] < sz[b]) swap(a, b);
        p[b] = a; sz[a] += sz[b];
        return true;
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: 4 nodes with edges; the last edge creates a cycle
    int n = 4;
    vector<pair<int,int>> edges = {{0,1}, {1,2}, {2,3}, {3,0}};

    DSU dsu(n);
    bool hasCycle = false;
    for (auto [u,v] : edges) {
        if (!dsu.unite(u, v)) { // already in same component
            hasCycle = true;
            cout << "Cycle found when adding edge (" << u << ", " << v << ")\n";
            break;
        }
    }

    cout << (hasCycle ? "Graph has a cycle" : "Graph is acyclic") << "\n";
    return 0;
}

When processing undirected edges, an edge (u, v) creates a cycle iff u and v are already connected. DSU tracks components and quickly detects this condition.

Time: O(m \alpha(n)). Space: O(n)

Rollback (Undoable) DSU for Offline Dynamic Connectivity
#include <bits/stdc++.h>
using namespace std;

// Rollback DSU: no path compression; union by size; supports undo
struct RollbackDSU {
    int n, comps; // number of components
    vector<int> p, sz;
    // Stack of changes to undo: (child, old parent of child, new parent, old size of new parent)
    struct Change { int child, oldp, par, oldsz; };
    vector<Change> history;

    RollbackDSU(int n): n(n), comps(n), p(n), sz(n,1) { iota(p.begin(), p.end(), 0); }

    int find(int x) {
        // No path compression: preserves structure for rollback
        while (p[x] != x) x = p[x];
        return x;
    }

    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false; // nothing changed, so nothing to record
        if (sz[a] < sz[b]) swap(a, b);
        // Record change to undo later
        history.push_back({b, p[b], a, sz[a]});
        p[b] = a;
        sz[a] += sz[b];
        comps--;
        return true;
    }

    // Roll back to a previous history size (checkpoint)
    void rollback(int checkpoint) {
        while ((int)history.size() > checkpoint) {
            auto c = history.back(); history.pop_back();
            // Undo parent and size changes
            p[c.child] = c.oldp; // child was a root; oldp == child
            sz[c.par] = c.oldsz;
            comps++;
        }
    }

    int checkpoint() const { return (int)history.size(); }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 6;
    RollbackDSU dsu(n);

    // Phase 1 unions
    int cp1 = dsu.checkpoint();
    dsu.unite(0,1);
    dsu.unite(2,3);
    cout << "Components after phase 1: " << dsu.comps << "\n"; // expect 4

    // Phase 2 unions
    int cp2 = dsu.checkpoint();
    dsu.unite(1,2); // merges {0,1} with {2,3}
    dsu.unite(4,5);
    cout << "Components after phase 2: " << dsu.comps << "\n"; // expect 2

    // Roll back phase 2
    dsu.rollback(cp2);
    cout << "Components after rollback to cp2: " << dsu.comps << "\n"; // back to 4

    // Further unions and rollback to beginning of phase 1
    dsu.unite(3,4);
    cout << "Components before full rollback: " << dsu.comps << "\n"; // expect 3
    dsu.rollback(cp1);
    cout << "Components after rollback to cp1: " << dsu.comps << "\n"; // back to 6

    return 0;
}

Rollback DSU disables path compression and records each union’s reversible changes on a stack. Checkpoints capture the current stack size; rollback pops changes to restore a previous state. This is useful for offline dynamic connectivity (e.g., divide-and-conquer on time intervals) where you need to add edges in subproblems and then undo them when returning.

Time: Unite and Find are O(\log n) with union-by-size (no path compression); rollback is O(1) per undone union. Space: O(n + u), where u is the number of successful unions recorded