🗂️Data StructureAdvanced

Persistent DSU (Fully Persistent Union-Find)

Key Points

  • •
    A persistent DSU (Union-Find) keeps all historical versions so you can query connectivity at any past version and even branch new futures from old states.
  • •
    To make DSU persistent, disable path compression and use union by rank/size to keep trees shallow, then store parent/rank in a persistent array (e.g., persistent segment tree).
  • •
    Each union creates a new version by changing only O(1) cells (the parent of one root and maybe the rank/size of the other), which costs O(log n) writes in a persistent array.
  • •
    Find at version t climbs parent pointers using version-t reads; with union by rank the height is O(log n), so the naive persistent-array approach costs O((log n)^2) per find (often treated as O(log n) in practice).
  • •
    Space usage is O(n + q log n) for q unions/updates due to path copying in the persistent array.
  • •
    Rollback DSU is simpler but only partially persistent; it answers offline queries over time by adding/removing edges with a segment-tree-over-time and rolling back state.
  • •
    For fully dynamic online connectivity with edge insertions/removals, use Link-Cut Trees or Euler Tour Trees; DSU cannot handle deletions online without offline tricks.
  • •
    Applications include time-travel connectivity queries, branching simulations (like git), and answering graph connectivity at any time snapshot.

Prerequisites

  • →Static and persistent arrays — Persistent DSU relies on representing parent/rank as persistent arrays with path copying or fat nodes.
  • →Segment trees and persistence — A persistent segment tree provides O(log n) point updates/queries and structural sharing among versions.
  • →Classic DSU with union by rank and path compression — Understanding the standard DSU helps motivate why we disable path compression in the persistent setting.
  • →Asymptotic analysis — You must reason about the logarithmic factors in time and memory across versions.
  • →Offline processing and divide-and-conquer on time — Rollback DSU uses a segment-tree-over-time approach to answer time-indexed queries offline.
  • →Balanced BSTs and path copying — Alternative persistent array implementations use balanced BSTs with path copying or fat nodes.

Detailed Explanation

Tap terms for definitions

01Overview

A persistent DSU (also called fully persistent Union-Find) is a versioned data structure that supports make-set, find, and union while preserving all past states. Instead of mutating the single parent[] and rank[] arrays, each union produces a new version that shares most of its memory with previous versions. Queries can be issued against any version, enabling time-travel queries like “were u and v connected after the 57th update?” or branching new sequences of unions from an older version. The core challenge is that the usual DSU optimization, path compression, mutates many parents along a path, which contradicts persistence. The fix is to disable path compression and rely on union by rank/size to keep tree height logarithmic. Technically, we implement parent[] and rank[] as persistent arrays—commonly via a persistent segment tree or a fat-node structure—so that changing O(1) cells per union creates a new root pointer representing the new version. Reads are done at a chosen version, and because union by rank ensures logarithmic height, find remains fast. In competitive programming, this technique enables online queries on arbitrary past versions and branching histories while staying within O(log n)–O((log n)^2) per operation and O(n + q log n) memory for q updates.

02Intuition & Analogies

Imagine maintaining friend groups in a school over the year. Each time two groups merge (a union), you want to remember exactly how the groups looked at that moment, because later you might need to answer questions like “were Alice and Bob in the same group after the midterm?” or even ask what would have happened if you continued from that midterm state but made different merges. That’s persistence: you never overwrite history; you just create a new snapshot that shares most of the unchanged information with the past. Think of a persistent DSU like a collaborative document with version control (like git). Each union is a commit. The content (parent and rank arrays) mostly stays the same, but a few lines change. Instead of copying the whole file, the system stores just the diffs and a pointer to the previous version, so creating a new commit is cheap and all past commits remain readable. Now, why turn off path compression? Path compression is like auto-rewriting many lines across the document to speed up future reads; that’s great for a single evolving document but terrible when you must preserve every historical version exactly as it was. If we avoid that global rewrite and only attach one tree under another using union by rank (always attaching the smaller height under the larger), the tallest tree remains modest in height (logarithmic). When we later ask “who is the leader of student X at version t?”, we just walk parent pointers at version t until we reach the representative. Each parent lookup reads the versioned array at that time, just like reading a file at a particular commit.

03Formal Definition

Let V be the set of items 1..n. A classic DSU maintains arrays parent[1..n] and rank[1..n]. Initially, parent[i] = i and rank[i] = 0 for all i. The find operation returns the representative of i’s set by following parent pointers to a root r with parent[r] = r. The union operation takes two representatives r1 and r2 and attaches one root under the other; with union by rank, if rank[r1] < rank[r2], set parent[r1] := r2; if rank[r1] > rank[r2], set parent[r2] := r1; otherwise set parent[r2] := r1 and rank[r1] := rank[r1] + 1. A fully persistent DSU is a family of versions indexed by an integer ,1,2,... such that for each t there exist arrays and and: (1) Version 0 is the initial state; (2) For any version t and any union on elements a,b, a new version t' is created with and identical to version t except possibly at O(1) indices (the two roots and one rank), and find queries can be answered with respect to any version t independently. A common implementation realizes and as persistent arrays via path copying (e.g., a persistent segment tree), so that writing to position i returns a new root pointer representing a new version while sharing unchanged structure with previous versions. Path compression is disallowed because it would require retroactively changing many values along a path, violating persistence.

04When to Use

Use a persistent DSU when you must support connectivity queries at arbitrary historical states and possibly branch new updates from any past state. Typical scenarios include: (1) Time-travel queries: given a sequence of edge insertions in a graph, answer whether u and v are connected at time t. (2) Snapshot analytics: maintain snapshots of a clustering process as thresholds change and query any snapshot. (3) Branching simulations: explore multiple “what-if” union sequences from older checkpoints (git-like branching). (4) Interactive problems where the next query can reference any previous version. If queries are offline and only involve additions/removals over time ranges, DSU with rollback plus a segment-tree-over-time is simpler and often faster. If you need online fully dynamic connectivity with edge deletions and insertions intermixed and queries on the current state only, consider Link-Cut Trees or Euler Tour Trees instead of DSU. If you only need partial persistence (query old versions but modify only the newest), rollback DSU or fat-node arrays may be enough. Finally, if memory is tight or n and q are huge, verify that O(n + q log n) space for persistence fits within limits.

⚠️Common Mistakes

• Using path compression in a fully persistent DSU. Path compression mutates many parent entries and breaks the guarantee that older versions remain unchanged. Disable it and rely on union by rank/size. • Assuming O(1) parent/rank access in a persistent array. If parent[] is stored in a persistent segment tree or balanced BST, each read/write costs O(log n) (or O(log q) with fat nodes), so a find that climbs O(log n) parents costs O((log n)^2) in worst-case. Many sources round this to O(log n) in practice; be explicit in your analysis. • Forgetting to update rank/size when roots have equal rank. If you skip the rank increment, tree height can degrade, hurting performance. • Not normalizing edges (u,v) in rollback solutions. Without sorting/normalizing endpoints, add/remove matching fails and intervals are wrong. • Memory blow-ups by copying arrays per version. You must use structural sharing (persistent segment tree or fat nodes); never allocate O(n) per version. • Confusing fully persistent with partially persistent. Rollback DSU lets you query past states while exploring a recursion but does not let you modify arbitrary past versions online; for arbitrary branching and online queries you need full persistence. • Using recursion without tail checks in deep segment trees and blowing the stack. Prefer iterative traversals where possible or increase stack limits if the tree is deep.

Key Formulas

Union-by-rank height bound

Explanation: Without path compression, union by rank guarantees the height h of any tree is at most logarithmic in n. This keeps finds efficient even in persistent settings.

Persistent array point update

Explanation: A point update in a persistent segment tree performs path copying of O(log n) nodes. This bounds the time to write parent/rank when creating a new version.

Persistent-find (naive) cost

Explanation: A find follows O(log n) parent steps, and each parent read from a persistent segment tree is O(log n), yielding O((log n)^2) worst-case. In practice constants are small; some sources report it as roughly O(log n).

Space of persistent DSU

Explanation: The initial build costs O(n), and each of q unions touches O(1) cells which become O(log n) persistent nodes via path copying, hence total space is O(n + q log n).

Rollback DSU with segment-tree-over-time

Explanation: Each edge interval is inserted into O(log q) segment tree nodes; with O(log n) union cost and rollbacks, the total time across the DFS over time is near-linear with a logarithmic factor.

Inverse Ackermann reminder

Explanation: Classic DSU with path compression and union by rank has near-constant amortized time O((n)), but path compression is not compatible with full persistence.

Complexity Analysis

For the fully persistent DSU implemented with a persistent segment tree (path copying), each version maintains two persistent arrays: parent[1..n] and rank/size[1..n]. The initial build is O(n). A union on version v finds the roots of a and b and performs up to two point updates: setting the parent of one root and possibly incrementing the rank/size of the other. Each point update is O(log n) time and space via path copying, so the update part is O(log n). The dominating cost is the two finds to locate the roots on the chosen version. Without path compression, union by rank bounds the tree height by O(log n). Each parent lookup is a persistent array read which costs O(log n), and we perform O(log n) such steps per find, so a straightforward implementation yields O((log n)^2) time per find and thus O((log n)^2) per connectivity query and O((log n)^2 + log n) per union. Many implementations informally state O(log n) because constants are small and practical heights are very low; this is acceptable in contests if justified. Space-wise, we allocate O(1) new leaves per union, each replicated across O(log n) nodes by path copying, so for q unions we use O(n + q log n) memory (including the initial build). Each version is represented by a root pointer, so storing k versions is O(k). For partially persistent (rollback) DSU with a segment-tree-over-time, each edge interval is added to O(log q) nodes, and each addition does a union costing O(log n). The DFS over time performs O(total edge insertions) unions and an equal number of rollbacks (O(1) per reverted change), giving O((n + q) log q) total time and O(n + q) auxiliary memory plus the stack of changes.

Code Examples

Fully Persistent DSU using a Persistent Segment Tree (union by rank, no path compression)
1#include <bits/stdc++.h>
2using namespace std;
3
4// Persistent segment tree storing pairs {parent, rank}
5struct Node {
6 Node *l, *r;
7 int parent;
8 int rnk; // rank (or size-log)
9 Node(int p=0, int rk=0) : l(nullptr), r(nullptr), parent(p), rnk(rk) {}
10};
11
12struct PersistentArray {
13 int n;
14 vector<Node*> roots; // roots[version]
15
16 PersistentArray(int n): n(n) {
17 roots.clear();
18 roots.push_back(build(1, n)); // version 0
19 }
20
21 Node* build(int L, int R) {
22 Node* node = new Node();
23 if (L == R) {
24 node->parent = L; // initially parent[i] = i
25 node->rnk = 0; // rank[i] = 0
26 return node;
27 }
28 int M = (L + R) >> 1;
29 node->l = build(L, M);
30 node->r = build(M+1, R);
31 return node;
32 }
33
34 // Read at index idx from version v
35 pair<int,int> get(Node* root, int L, int R, int idx) const {
36 if (L == R) return {root->parent, root->rnk};
37 int M = (L + R) >> 1;
38 if (idx <= M) return get(root->l, L, M, idx);
39 else return get(root->r, M+1, R, idx);
40 }
41
42 pair<int,int> get(int version, int idx) const {
43 return get(roots[version], 1, n, idx);
44 }
45
46 // Point-update at idx: set parent or rank or both; returns new root
47 Node* set(Node* root, int L, int R, int idx, int newParent, int newRank) {
48 Node* node = new Node();
49 if (L == R) {
50 node->parent = newParent;
51 node->rnk = newRank;
52 return node;
53 }
54 int M = (L + R) >> 1;
55 if (idx <= M) {
56 node->l = set(root->l, L, M, idx, newParent, newRank);
57 node->r = root->r;
58 } else {
59 node->l = root->l;
60 node->r = set(root->r, M+1, R, idx, newParent, newRank);
61 }
62 return node;
63 }
64
65 // Convenience: write only parent or only rank by first reading
66 Node* write_parent(Node* root, int idx, int L, int R, int newParent) {
67 auto cur = get(root, L, R, idx);
68 return set(root, L, R, idx, newParent, cur.second);
69 }
70 Node* write_rank(Node* root, int idx, int L, int R, int newRank) {
71 auto cur = get(root, L, R, idx);
72 return set(root, L, R, idx, cur.first, newRank);
73 }
74
75 // Find representative of x in given root version; no path compression
76 int find_rep(Node* root, int x) const {
77 while (true) {
78 auto pr = get(root, 1, n, x);
79 int p = pr.first;
80 if (p == x) return x;
81 x = p;
82 }
83 }
84
85 // Union by rank on baseVersion; returns index of new version
86 int unite(int baseVersion, int a, int b) {
87 Node* baseRoot = roots[baseVersion];
88 int ra = find_rep(baseRoot, a);
89 int rb = find_rep(baseRoot, b);
90 if (ra == rb) {
91 // No change; just reuse the base root as a new version (branching)
92 roots.push_back(baseRoot);
93 return (int)roots.size() - 1;
94 }
95 auto pa = get(baseRoot, 1, n, ra);
96 auto pb = get(baseRoot, 1, n, rb);
97 int rka = pa.second;
98 int rkb = pb.second;
99 Node* newRoot = baseRoot;
100 if (rka < rkb) {
101 // Attach ra under rb
102 newRoot = write_parent(newRoot, ra, 1, n, rb);
103 } else if (rka > rkb) {
104 // Attach rb under ra
105 newRoot = write_parent(newRoot, rb, 1, n, ra);
106 } else {
107 // Equal rank: attach rb under ra, increment rank[ra]
108 newRoot = write_parent(newRoot, rb, 1, n, ra);
109 newRoot = write_rank(newRoot, ra, 1, n, rka + 1);
110 }
111 roots.push_back(newRoot);
112 return (int)roots.size() - 1;
113 }
114
115 // Connectivity query on a given version
116 bool connected(int version, int a, int b) const {
117 Node* root = roots[version];
118 return find_rep(root, a) == find_rep(root, b);
119 }
120};
121
122int main() {
123 ios::sync_with_stdio(false);
124 cin.tie(nullptr);
125
126 // Demo: n elements, build version 0; then create branches
127 int n = 8;
128 PersistentArray dsu(n);
129
130 // Version timeline:
131 // v0: initial
132 int v1 = dsu.unite(0, 1, 2); // v1 = v0 + union(1,2)
133 int v2 = dsu.unite(v1, 2, 3); // v2 = v1 + union(2,3)
134 int v3 = dsu.unite(0, 5, 6); // branch from v0: v3 = v0 + union(5,6)
135 int v4 = dsu.unite(v2, 1, 3); // union on already-connected nodes; version still created
136
137 cout << boolalpha;
138 cout << "v0: conn(1,3) = " << dsu.connected(0, 1, 3) << "\n"; // false
139 cout << "v1: conn(1,3) = " << dsu.connected(v1, 1, 3) << "\n"; // false
140 cout << "v2: conn(1,3) = " << dsu.connected(v2, 1, 3) << "\n"; // true
141 cout << "v3: conn(5,6) = " << dsu.connected(v3, 5, 6) << "\n"; // true
142 cout << "v4: conn(1,3) = " << dsu.connected(v4, 1, 3) << "\n"; // true
143
144 // You can continue branching from any version index stored in dsu.roots
145 return 0;
146}
147

This program implements a fully persistent DSU by storing parent and rank in a persistent segment tree. Each union produces a new version (a new root pointer) by point-updating at most two positions: parent of one root and possibly rank of the other. The find operation climbs parent pointers at the target version without path compression. The demo creates multiple versions, including branching from an older version (v0 to v3), and queries connectivity at specific versions.

Time: Union: O((log n)^2) for two finds plus O(log n) updates; Connectivity query: O((log n)^2) for two finds. In practice often close to O(log n).Space: O(n + q log n) for q unions (versions), due to path copying in the persistent segment tree.
DSU with Rollback + Segment-Tree-Over-Time (offline partial persistence)
1#include <bits/stdc++.h>
2using namespace std;
3
4struct DSURollback {
5 int n;
6 vector<int> parent, sz;
7 // Stack of changes: (v, old_parent, u, old_size). We use a small struct.
8 struct Change { int v, old_parent; int u, old_size; bool merged; };
9 vector<Change> st;
10
11 DSURollback(int n=0): n(n), parent(n+1), sz(n+1) {
12 iota(parent.begin(), parent.end(), 0);
13 fill(sz.begin(), sz.end(), 1);
14 }
15
16 int find(int x) {
17 while (x != parent[x]) x = parent[x];
18 return x;
19 }
20
21 int snapshot() const { return (int)st.size(); }
22
23 void rollback(int snap) {
24 while ((int)st.size() > snap) {
25 auto c = st.back(); st.pop_back();
26 if (!c.merged) continue; // no-op union
27 parent[c.v] = c.old_parent;
28 sz[c.u] = c.old_size;
29 }
30 }
31
32 void unite(int a, int b) {
33 a = find(a); b = find(b);
34 if (a == b) { st.push_back({0,0,0,0,false}); return; }
35 if (sz[a] < sz[b]) swap(a,b);
36 // attach b under a
37 st.push_back({b, parent[b], a, sz[a], true});
38 parent[b] = a;
39 sz[a] += sz[b];
40 }
41};
42
43// Offline dynamic connectivity: supports add/remove edges and connectivity queries
44// Input format (example in comments):
45// Q queries. Types:
46// + u v : add edge (u,v)
47// - u v : remove edge (u,v)
48// ? u v : query connectivity of u and v at current time
49// Assumption: Each edge added will be removed at most once (standard), unmatched adds persist to the end.
50
51struct OfflineDC {
52 int n, Q;
53 vector<vector<pair<int,int>>> seg; // segment tree over time storing edges
54 vector<pair<char, pair<int,int>>> ops; // queries in order
55 map<pair<int,int>, int> open; // edge -> start time
56 vector<string> answers;
57
58 OfflineDC(int n, int Q): n(n), Q(Q) {
59 seg.assign(4*Q+5, {});
60 ops.reserve(Q+1);
61 }
62
63 static pair<int,int> norm(int u, int v) { if (u>v) swap(u,v); return {u,v}; }
64
65 void add_op(char t, int u, int v) { ops.push_back({t, {u,v}}); }
66
67 void add_interval(int idx, int L, int R, int ql, int qr, pair<int,int> e) {
68 if (ql>R || qr<L) return;
69 if (ql<=L && R<=qr) { seg[idx].push_back(e); return; }
70 int M=(L+R)>>1;
71 add_interval(idx<<1, L, M, ql, qr, e);
72 add_interval(idx<<1|1, M+1, R, ql, qr, e);
73 }
74
75 void build_intervals() {
76 for (int t=1; t<=Q; ++t) {
77 char c = ops[t-1].first;
78 int u = ops[t-1].second.first;
79 int v = ops[t-1].second.second;
80 if (c == '+') {
81 open[norm(u,v)] = t;
82 } else if (c == '-') {
83 auto e = norm(u,v);
84 auto it = open.find(e);
85 if (it != open.end()) {
86 int s = it->second;
87 add_interval(1, 1, Q, s, t-1, e);
88 open.erase(it);
89 }
90 }
91 }
92 for (auto &kv : open) {
93 add_interval(1, 1, Q, kv.second, Q, kv.first);
94 }
95 open.clear();
96 }
97
98 void dfs(int idx, int L, int R, DSURollback &dsu) {
99 int snap = dsu.snapshot();
100 for (auto &e : seg[idx]) dsu.unite(e.first, e.second);
101 if (L == R) {
102 char c = ops[L-1].first;
103 int u = ops[L-1].second.first;
104 int v = ops[L-1].second.second;
105 if (c == '?') {
106 bool ans = (dsu.find(u) == dsu.find(v));
107 answers.push_back(ans ? "YES" : "NO");
108 }
109 dsu.rollback(snap);
110 return;
111 }
112 int M=(L+R)>>1;
113 dfs(idx<<1, L, M, dsu);
114 dfs(idx<<1|1, M+1, R, dsu);
115 dsu.rollback(snap);
116 }
117
118 vector<string> solve() {
119 build_intervals();
120 DSURollback dsu(n);
121 answers.clear();
122 dfs(1, 1, Q, dsu);
123 return answers;
124 }
125};
126
127int main(){
128 ios::sync_with_stdio(false);
129 cin.tie(nullptr);
130
131 // Example usage:
132 // n=5, Q=7
133 // + 1 2
134 // + 2 3
135 // ? 1 3 -> YES
136 // - 2 3
137 // ? 1 3 -> NO
138 // + 4 5
139 // ? 4 5 -> YES
140
141 int n = 5, Q = 7;
142 OfflineDC solver(n, Q);
143 vector<tuple<char,int,int>> seq = {
144 {'+',1,2}, {'+',2,3}, {'?',1,3}, {'-',2,3}, {'?',1,3}, {'+',4,5}, {'?',4,5}
145 };
146 for (auto [c,u,v] : seq) solver.add_op(c,u,v);
147
148 auto ans = solver.solve();
149 for (auto &s : ans) cout << s << '\n';
150 return 0;
151}
152

This template solves offline dynamic connectivity using DSU with rollback. Edges are active over time intervals; we place each active interval into a segment tree over the query timeline. A DFS over the segment tree applies all edges relevant to the subtree, answers queries at leaves, and rolls back to the previous snapshot when returning. This achieves partial persistence suitable for offline queries with additions/removals.

Time: O((n + Q) log Q) total; each edge interval touches O(log Q) nodes, each union is O(log n), and rollbacks are O(1) per change.Space: O(n + Q) for DSU arrays, change stack, and segment tree storage of edges.