Persistent DSU (Fully Persistent Union-Find)
Key Points
- •A persistent DSU (Union-Find) keeps all historical versions so you can query connectivity at any past version and even branch new futures from old states.
- •To make DSU persistent, disable path compression and use union by rank/size to keep trees shallow, then store parent/rank in a persistent array (e.g., persistent segment tree).
- •Each union creates a new version by changing only O(1) cells (the parent of one root and maybe the rank/size of the other), which costs O(log n) writes in a persistent array.
- •Find at version t climbs parent pointers using version-t reads; with union by rank the height is O(log n), so the naive persistent-array approach costs O((log n)^2) per find (often treated as O(log n) in practice).
- •Space usage is O(n + q log n) for q unions/updates due to path copying in the persistent array.
- •Rollback DSU is simpler but only partially persistent; it answers offline queries over time by adding/removing edges with a segment-tree-over-time and rolling back state.
- •For fully dynamic online connectivity with edge insertions/removals, use Link-Cut Trees or Euler Tour Trees; DSU cannot handle deletions online without offline tricks.
- •Applications include time-travel connectivity queries, branching simulations (like git), and answering graph connectivity at any time snapshot.
Prerequisites
- →Static and persistent arrays — Persistent DSU relies on representing parent/rank as persistent arrays with path copying or fat nodes.
- →Segment trees and persistence — A persistent segment tree provides O(log n) point updates/queries and structural sharing among versions.
- →Classic DSU with union by rank and path compression — Understanding the standard DSU helps motivate why we disable path compression in the persistent setting.
- →Asymptotic analysis — You must reason about the logarithmic factors in time and memory across versions.
- →Offline processing and divide-and-conquer on time — Rollback DSU uses a segment-tree-over-time approach to answer time-indexed queries offline.
- →Balanced BSTs and path copying — Alternative persistent array implementations use balanced BSTs with path copying or fat nodes.
Detailed Explanation
Tap terms for definitions01Overview
A persistent DSU (also called fully persistent Union-Find) is a versioned data structure that supports make-set, find, and union while preserving all past states. Instead of mutating the single parent[] and rank[] arrays, each union produces a new version that shares most of its memory with previous versions. Queries can be issued against any version, enabling time-travel queries like “were u and v connected after the 57th update?” or branching new sequences of unions from an older version. The core challenge is that the usual DSU optimization, path compression, mutates many parents along a path, which contradicts persistence. The fix is to disable path compression and rely on union by rank/size to keep tree height logarithmic. Technically, we implement parent[] and rank[] as persistent arrays—commonly via a persistent segment tree or a fat-node structure—so that changing O(1) cells per union creates a new root pointer representing the new version. Reads are done at a chosen version, and because union by rank ensures logarithmic height, find remains fast. In competitive programming, this technique enables online queries on arbitrary past versions and branching histories while staying within O(log n)–O((log n)^2) per operation and O(n + q log n) memory for q updates.
02Intuition & Analogies
Imagine maintaining friend groups in a school over the year. Each time two groups merge (a union), you want to remember exactly how the groups looked at that moment, because later you might need to answer questions like “were Alice and Bob in the same group after the midterm?” or even ask what would have happened if you continued from that midterm state but made different merges. That’s persistence: you never overwrite history; you just create a new snapshot that shares most of the unchanged information with the past. Think of a persistent DSU like a collaborative document with version control (like git). Each union is a commit. The content (parent and rank arrays) mostly stays the same, but a few lines change. Instead of copying the whole file, the system stores just the diffs and a pointer to the previous version, so creating a new commit is cheap and all past commits remain readable. Now, why turn off path compression? Path compression is like auto-rewriting many lines across the document to speed up future reads; that’s great for a single evolving document but terrible when you must preserve every historical version exactly as it was. If we avoid that global rewrite and only attach one tree under another using union by rank (always attaching the smaller height under the larger), the tallest tree remains modest in height (logarithmic). When we later ask “who is the leader of student X at version t?”, we just walk parent pointers at version t until we reach the representative. Each parent lookup reads the versioned array at that time, just like reading a file at a particular commit.
03Formal Definition
04When to Use
Use a persistent DSU when you must support connectivity queries at arbitrary historical states and possibly branch new updates from any past state. Typical scenarios include: (1) Time-travel queries: given a sequence of edge insertions in a graph, answer whether u and v are connected at time t. (2) Snapshot analytics: maintain snapshots of a clustering process as thresholds change and query any snapshot. (3) Branching simulations: explore multiple “what-if” union sequences from older checkpoints (git-like branching). (4) Interactive problems where the next query can reference any previous version. If queries are offline and only involve additions/removals over time ranges, DSU with rollback plus a segment-tree-over-time is simpler and often faster. If you need online fully dynamic connectivity with edge deletions and insertions intermixed and queries on the current state only, consider Link-Cut Trees or Euler Tour Trees instead of DSU. If you only need partial persistence (query old versions but modify only the newest), rollback DSU or fat-node arrays may be enough. Finally, if memory is tight or n and q are huge, verify that O(n + q log n) space for persistence fits within limits.
⚠️Common Mistakes
• Using path compression in a fully persistent DSU. Path compression mutates many parent entries and breaks the guarantee that older versions remain unchanged. Disable it and rely on union by rank/size. • Assuming O(1) parent/rank access in a persistent array. If parent[] is stored in a persistent segment tree or balanced BST, each read/write costs O(log n) (or O(log q) with fat nodes), so a find that climbs O(log n) parents costs O((log n)^2) in worst-case. Many sources round this to O(log n) in practice; be explicit in your analysis. • Forgetting to update rank/size when roots have equal rank. If you skip the rank increment, tree height can degrade, hurting performance. • Not normalizing edges (u,v) in rollback solutions. Without sorting/normalizing endpoints, add/remove matching fails and intervals are wrong. • Memory blow-ups by copying arrays per version. You must use structural sharing (persistent segment tree or fat nodes); never allocate O(n) per version. • Confusing fully persistent with partially persistent. Rollback DSU lets you query past states while exploring a recursion but does not let you modify arbitrary past versions online; for arbitrary branching and online queries you need full persistence. • Using recursion without tail checks in deep segment trees and blowing the stack. Prefer iterative traversals where possible or increase stack limits if the tree is deep.
Key Formulas
Union-by-rank height bound
Explanation: Without path compression, union by rank guarantees the height h of any tree is at most logarithmic in n. This keeps finds efficient even in persistent settings.
Persistent array point update
Explanation: A point update in a persistent segment tree performs path copying of O(log n) nodes. This bounds the time to write parent/rank when creating a new version.
Persistent-find (naive) cost
Explanation: A find follows O(log n) parent steps, and each parent read from a persistent segment tree is O(log n), yielding O((log n)^2) worst-case. In practice constants are small; some sources report it as roughly O(log n).
Space of persistent DSU
Explanation: The initial build costs O(n), and each of q unions touches O(1) cells which become O(log n) persistent nodes via path copying, hence total space is O(n + q log n).
Rollback DSU with segment-tree-over-time
Explanation: Each edge interval is inserted into O(log q) segment tree nodes; with O(log n) union cost and rollbacks, the total time across the DFS over time is near-linear with a logarithmic factor.
Inverse Ackermann reminder
Explanation: Classic DSU with path compression and union by rank has near-constant amortized time O((n)), but path compression is not compatible with full persistence.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Persistent segment tree storing pairs {parent, rank} 5 struct Node { 6 Node *l, *r; 7 int parent; 8 int rnk; // rank (or size-log) 9 Node(int p=0, int rk=0) : l(nullptr), r(nullptr), parent(p), rnk(rk) {} 10 }; 11 12 struct PersistentArray { 13 int n; 14 vector<Node*> roots; // roots[version] 15 16 PersistentArray(int n): n(n) { 17 roots.clear(); 18 roots.push_back(build(1, n)); // version 0 19 } 20 21 Node* build(int L, int R) { 22 Node* node = new Node(); 23 if (L == R) { 24 node->parent = L; // initially parent[i] = i 25 node->rnk = 0; // rank[i] = 0 26 return node; 27 } 28 int M = (L + R) >> 1; 29 node->l = build(L, M); 30 node->r = build(M+1, R); 31 return node; 32 } 33 34 // Read at index idx from version v 35 pair<int,int> get(Node* root, int L, int R, int idx) const { 36 if (L == R) return {root->parent, root->rnk}; 37 int M = (L + R) >> 1; 38 if (idx <= M) return get(root->l, L, M, idx); 39 else return get(root->r, M+1, R, idx); 40 } 41 42 pair<int,int> get(int version, int idx) const { 43 return get(roots[version], 1, n, idx); 44 } 45 46 // Point-update at idx: set parent or rank or both; returns new root 47 Node* set(Node* root, int L, int R, int idx, int newParent, int newRank) { 48 Node* node = new Node(); 49 if (L == R) { 50 node->parent = newParent; 51 node->rnk = newRank; 52 return node; 53 } 54 int M = (L + R) >> 1; 55 if (idx <= M) { 56 node->l = set(root->l, L, M, idx, newParent, newRank); 57 node->r = root->r; 58 } else { 59 node->l = root->l; 60 node->r = set(root->r, M+1, R, idx, newParent, newRank); 61 } 62 return node; 63 } 64 65 // Convenience: write only parent or only rank by first reading 66 Node* write_parent(Node* root, int idx, int L, int R, int newParent) { 67 auto cur = get(root, L, R, idx); 68 return set(root, L, R, idx, newParent, cur.second); 69 } 70 Node* write_rank(Node* root, int idx, int L, int R, int newRank) { 71 auto cur = get(root, L, R, idx); 72 return set(root, L, R, idx, cur.first, newRank); 73 } 74 75 // Find representative of x in given root version; no path compression 76 int find_rep(Node* root, int x) const { 77 while (true) { 78 auto pr = get(root, 1, n, x); 79 int p = pr.first; 80 if (p == x) return x; 81 x = p; 82 } 83 } 84 85 // Union by rank on baseVersion; returns index of new version 86 int unite(int baseVersion, int a, int b) { 87 Node* baseRoot = roots[baseVersion]; 88 int ra = find_rep(baseRoot, a); 89 int rb = find_rep(baseRoot, b); 90 if (ra == rb) { 91 // No change; just reuse the base root as a new version (branching) 92 roots.push_back(baseRoot); 93 return (int)roots.size() - 1; 94 } 95 auto pa = get(baseRoot, 1, n, ra); 96 auto pb = get(baseRoot, 1, n, rb); 97 int rka = pa.second; 98 int rkb = pb.second; 99 Node* newRoot = baseRoot; 100 if (rka < rkb) { 101 // Attach ra under rb 102 newRoot = write_parent(newRoot, ra, 1, n, rb); 103 } else if (rka > rkb) { 104 // Attach rb under ra 105 newRoot = write_parent(newRoot, rb, 1, n, ra); 106 } else { 107 // Equal rank: attach rb under ra, increment rank[ra] 108 newRoot = write_parent(newRoot, rb, 1, n, ra); 109 newRoot = write_rank(newRoot, ra, 1, n, rka + 1); 110 } 111 roots.push_back(newRoot); 112 return (int)roots.size() - 1; 113 } 114 115 // Connectivity query on a given version 116 bool connected(int version, int a, int b) const { 117 Node* root = roots[version]; 118 return find_rep(root, a) == find_rep(root, b); 119 } 120 }; 121 122 int main() { 123 ios::sync_with_stdio(false); 124 cin.tie(nullptr); 125 126 // Demo: n elements, build version 0; then create branches 127 int n = 8; 128 PersistentArray dsu(n); 129 130 // Version timeline: 131 // v0: initial 132 int v1 = dsu.unite(0, 1, 2); // v1 = v0 + union(1,2) 133 int v2 = dsu.unite(v1, 2, 3); // v2 = v1 + union(2,3) 134 int v3 = dsu.unite(0, 5, 6); // branch from v0: v3 = v0 + union(5,6) 135 int v4 = dsu.unite(v2, 1, 3); // union on already-connected nodes; version still created 136 137 cout << boolalpha; 138 cout << "v0: conn(1,3) = " << dsu.connected(0, 1, 3) << "\n"; // false 139 cout << "v1: conn(1,3) = " << dsu.connected(v1, 1, 3) << "\n"; // false 140 cout << "v2: conn(1,3) = " << dsu.connected(v2, 1, 3) << "\n"; // true 141 cout << "v3: conn(5,6) = " << dsu.connected(v3, 5, 6) << "\n"; // true 142 cout << "v4: conn(1,3) = " << dsu.connected(v4, 1, 3) << "\n"; // true 143 144 // You can continue branching from any version index stored in dsu.roots 145 return 0; 146 } 147
This program implements a fully persistent DSU by storing parent and rank in a persistent segment tree. Each union produces a new version (a new root pointer) by point-updating at most two positions: parent of one root and possibly rank of the other. The find operation climbs parent pointers at the target version without path compression. The demo creates multiple versions, including branching from an older version (v0 to v3), and queries connectivity at specific versions.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct DSURollback { 5 int n; 6 vector<int> parent, sz; 7 // Stack of changes: (v, old_parent, u, old_size). We use a small struct. 8 struct Change { int v, old_parent; int u, old_size; bool merged; }; 9 vector<Change> st; 10 11 DSURollback(int n=0): n(n), parent(n+1), sz(n+1) { 12 iota(parent.begin(), parent.end(), 0); 13 fill(sz.begin(), sz.end(), 1); 14 } 15 16 int find(int x) { 17 while (x != parent[x]) x = parent[x]; 18 return x; 19 } 20 21 int snapshot() const { return (int)st.size(); } 22 23 void rollback(int snap) { 24 while ((int)st.size() > snap) { 25 auto c = st.back(); st.pop_back(); 26 if (!c.merged) continue; // no-op union 27 parent[c.v] = c.old_parent; 28 sz[c.u] = c.old_size; 29 } 30 } 31 32 void unite(int a, int b) { 33 a = find(a); b = find(b); 34 if (a == b) { st.push_back({0,0,0,0,false}); return; } 35 if (sz[a] < sz[b]) swap(a,b); 36 // attach b under a 37 st.push_back({b, parent[b], a, sz[a], true}); 38 parent[b] = a; 39 sz[a] += sz[b]; 40 } 41 }; 42 43 // Offline dynamic connectivity: supports add/remove edges and connectivity queries 44 // Input format (example in comments): 45 // Q queries. Types: 46 // + u v : add edge (u,v) 47 // - u v : remove edge (u,v) 48 // ? u v : query connectivity of u and v at current time 49 // Assumption: Each edge added will be removed at most once (standard), unmatched adds persist to the end. 50 51 struct OfflineDC { 52 int n, Q; 53 vector<vector<pair<int,int>>> seg; // segment tree over time storing edges 54 vector<pair<char, pair<int,int>>> ops; // queries in order 55 map<pair<int,int>, int> open; // edge -> start time 56 vector<string> answers; 57 58 OfflineDC(int n, int Q): n(n), Q(Q) { 59 seg.assign(4*Q+5, {}); 60 ops.reserve(Q+1); 61 } 62 63 static pair<int,int> norm(int u, int v) { if (u>v) swap(u,v); return {u,v}; } 64 65 void add_op(char t, int u, int v) { ops.push_back({t, {u,v}}); } 66 67 void add_interval(int idx, int L, int R, int ql, int qr, pair<int,int> e) { 68 if (ql>R || qr<L) return; 69 if (ql<=L && R<=qr) { seg[idx].push_back(e); return; } 70 int M=(L+R)>>1; 71 add_interval(idx<<1, L, M, ql, qr, e); 72 add_interval(idx<<1|1, M+1, R, ql, qr, e); 73 } 74 75 void build_intervals() { 76 for (int t=1; t<=Q; ++t) { 77 char c = ops[t-1].first; 78 int u = ops[t-1].second.first; 79 int v = ops[t-1].second.second; 80 if (c == '+') { 81 open[norm(u,v)] = t; 82 } else if (c == '-') { 83 auto e = norm(u,v); 84 auto it = open.find(e); 85 if (it != open.end()) { 86 int s = it->second; 87 add_interval(1, 1, Q, s, t-1, e); 88 open.erase(it); 89 } 90 } 91 } 92 for (auto &kv : open) { 93 add_interval(1, 1, Q, kv.second, Q, kv.first); 94 } 95 open.clear(); 96 } 97 98 void dfs(int idx, int L, int R, DSURollback &dsu) { 99 int snap = dsu.snapshot(); 100 for (auto &e : seg[idx]) dsu.unite(e.first, e.second); 101 if (L == R) { 102 char c = ops[L-1].first; 103 int u = ops[L-1].second.first; 104 int v = ops[L-1].second.second; 105 if (c == '?') { 106 bool ans = (dsu.find(u) == dsu.find(v)); 107 answers.push_back(ans ? "YES" : "NO"); 108 } 109 dsu.rollback(snap); 110 return; 111 } 112 int M=(L+R)>>1; 113 dfs(idx<<1, L, M, dsu); 114 dfs(idx<<1|1, M+1, R, dsu); 115 dsu.rollback(snap); 116 } 117 118 vector<string> solve() { 119 build_intervals(); 120 DSURollback dsu(n); 121 answers.clear(); 122 dfs(1, 1, Q, dsu); 123 return answers; 124 } 125 }; 126 127 int main(){ 128 ios::sync_with_stdio(false); 129 cin.tie(nullptr); 130 131 // Example usage: 132 // n=5, Q=7 133 // + 1 2 134 // + 2 3 135 // ? 1 3 -> YES 136 // - 2 3 137 // ? 1 3 -> NO 138 // + 4 5 139 // ? 4 5 -> YES 140 141 int n = 5, Q = 7; 142 OfflineDC solver(n, Q); 143 vector<tuple<char,int,int>> seq = { 144 {'+',1,2}, {'+',2,3}, {'?',1,3}, {'-',2,3}, {'?',1,3}, {'+',4,5}, {'?',4,5} 145 }; 146 for (auto [c,u,v] : seq) solver.add_op(c,u,v); 147 148 auto ans = solver.solve(); 149 for (auto &s : ans) cout << s << '\n'; 150 return 0; 151 } 152
This template solves offline dynamic connectivity using DSU with rollback. Edges are active over time intervals; we place each active interval into a segment tree over the query timeline. A DFS over the segment tree applies all edges relevant to the subtree, answers queries at leaves, and rolls back to the previous snapshot when returning. This achieves partial persistence suitable for offline queries with additions/removals.