Persistent Array and Treap
Key Points
- β’Persistence lets you keep every past version of a data structure while making O(log n) updates and queries on any version.
- β’A persistent array can be implemented with a path-copy segment tree, giving O(log n) point updates and O(log n) access while sharing most nodes across versions.
- β’A persistent treap supports full binary-search-tree operations (insert, erase, find, k-th, orde_key) with expected O(log n) time per operation.
- β’Copy-on-write clones only nodes on the root-to-leaf path you touch, so each update adds O(log n) new nodes and reuses the rest.
- β’With q operations, total memory is roughly n + Ξ(q log n) nodes, where n is the initial size.
- β’Treaps maintain a BST by key and a heap by a random priority, yielding expected height O(log n).
- β’Version roots (one pointer per version) are the handles to time-travel: queries take a root and behave as if they run on that historical state.
- β’Memory management matters: use pools for speed, and consider reference counting or lifetime rules if you need to reclaim shared nodes safely.
Prerequisites
- βBinary search trees and segment trees β Persistent arrays rely on segment tree structure; treaps are randomized BSTs, so you need the basic traversal and update logic.
- βBig-O and logarithms β Understanding why updates and queries are O(log n) and how total memory grows with q requires comfort with asymptotic analysis and logs.
- βPointer semantics and recursion in C++ β Persistence is implemented by cloning nodes and passing pointers; operations are typically recursive.
- βRandomization β Treaps depend on random priorities to guarantee expected logarithmic height.
- βAugmented trees (subtree sizes, aggregates) β Order statistics in treaps and range sums in arrays come from maintaining augmented data in nodes.
- βMemory management strategies β Using pools for allocation speed or reference counting for reclamation helps keep performance and memory under control.
Detailed Explanation
Tap terms for definitions01Overview
Persistent data structures preserve history: instead of mutating a structure in-place, every update returns a new version that shares most of its memory with the old one. This is crucial in problems that require time-travel queries (e.g., answer using the array as it was after the 7th update) or undo/redo functionality. Two workhorses for persistence in competitive programming are the persistent array and the persistent treap. A persistent array can be realized as a functional segment tree where each point update clones O(log n) nodes along a path, leaving the rest shared. A persistent treap augments a binary search tree with random priorities (heap property) and performs updates via split/merge; cloning touched nodes yields persistence with expected O(log n) time per operation. These techniques scale well: per operation cost is logarithmic and memory is linear in the number of versions times the height touched. They also compose: you can store aggregates (like sums) in nodes for range queries, and order-statistics (like k-th) in treaps via subtree sizes. For Codeforces-level problems, these structures enable elegant solutions to offline range queries, rollback, and multi-version indexing while keeping code and complexity manageable.
02Intuition & Analogies
Imagine a library where each update to a book must be archived. Instead of reprinting the entire book for every change, you slip a new page into a binder that points to the unchanged old pages. The table of contents points to these pages, so each edition shares most pages with prior editions. That is copy-on-write persistence: you copy only what you modify and reuse everything else. A persistent array works like a decision tree that guides you to the correct page (leaf). To change a single entry, you only replace the guides you traverse (the nodes on your path), leaving the rest as-is. Accessing an old edition is simply using the old table of contents. A treap is like a shelf sorted by title (BST order) but also stacked by a random stickiness (priority) so that items tend to form balanced stacks. To insert or remove an item, you split or merge shelves, rearranging only a skinny column of stacks. With persistence, you donβt throw old stacks away; you make fresh copies of the few stacks you touch and leave the rest pointing to the original stacks. Version roots are the labels for each shelf configuration through time. Queries just say, "Start from this label," and everything behaves as if the world were frozen at that moment.
03Formal Definition
04When to Use
Use a persistent array when you need random access and point updates across time, particularly for: (1) offline queries over prefixes or versions (e.g., number of elements \le x in [l, r] via version differences), (2) rollback or undo functionality without auxiliary stacks, and (3) maintaining historical snapshots cheaply. Use a persistent treap when you need full BST operations with versioning: dynamic ordered sets supporting insert, erase, find, k-th smallest, and order_of_key across time. Typical applications include: (a) dynamic order statistics with historical queries, (b) keeping multiple branches of states in search/DP over subsets or timelines, (c) problems like 'k-th number in a subarray' using a value-compressed persistent array or treap, and (d) implementing ropes (sequence editors) with undo/redo by using an implicit persistent treap. Avoid persistence if constant factors and memory are too tight (e.g., extreme online constraints with very large q), or if simple rollback via a change stack is easier (and you do not need random access to arbitrary time points).
β οΈCommon Mistakes
Common pitfalls include: (1) Forgetting to clone on the path when updating and accidentally mutating shared nodes, which corrupts old versions; always treat nodes as immutable and allocate fresh nodes for any change. (2) Incorrect version bookkeeping: losing or overwriting a root pointer means that version is gone; always store roots in a vector. (3) Treap priority bugs: using deterministic priorities or small ranges can cause unbalanced trees; use a high-quality RNG (e.g., std::mt19937) and 32-bit or 64-bit priorities. (4) Mishandling duplicates in treaps: either augment with a count per key or disambiguate with (key, id) pairs; ensure order_of_key and k-th respect counts. (5) Stack overflows due to deep recursion when n is extremely large; consider tail-recursive patterns, iterative split/merge, or increasing stack size when required. (6) Memory blowups: each update costs O(log n) nodes; with millions of operations, pre-allocate a pool, and consider reference counting only if you truly need to reclaim memory for discarded versions. (7) Off-by-one errors from mixing 0-based and 1-based indexing, especially in implicit treaps (split by position) and in segment tree ranges; consistently document and assert your conventions.
Key Formulas
Tree Height
Explanation: The height of a complete binary tree over n elements is the ceiling of log base 2 of n. This bounds the path length for updates and queries in persistent arrays.
Segment Tree Build
Explanation: Building a (persistent) segment tree visits each array position once and does constant work per node. Total build time is linear in n.
Path-Copy Update Cost
Explanation: Each point update in a persistent array or treap touches only a root-to-leaf (or recursion) path. Time and number of newly allocated nodes are both logarithmic.
Total Node Count Over q Updates
Explanation: Starting from n-leaf structure, each of the q updates adds O(log n) fresh nodes while reusing the rest. Total memory grows linearly with q log n.
Expected Treap Height
Explanation: Random priorities imply the expected height of a treap is logarithmic in n. More precisely, the expected height is at most about 2 ln n plus a constant.
Order Statistics
Explanation: The rank of x counts how many elements are strictly smaller. The k-th smallest is the minimal x whose rank reaches k. Treaps maintain subtree sizes to support these in O(log n).
Range Sum via Prefix
Explanation: Range sums can be computed by subtracting two prefix sums. With persistent arrays, you can compute across versions by combining prefixes from different roots.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Persistent array implemented as a functional segment tree. 5 // Supports: build, point update (set), point get, and range sum query. 6 // All operations are O(log n). Each update allocates O(log n) nodes. 7 8 struct Node { 9 int sum; // aggregate (here: sum); for pure array access, leaf holds the value 10 Node *l, *r; // children 11 Node(int s=0, Node* L=nullptr, Node* R=nullptr) : sum(s), l(L), r(R) {} 12 }; 13 14 // Build from initial array a[tl..tr] 15 Node* build(const vector<int>& a, int tl, int tr) { 16 if (tl == tr) return new Node(a[tl], nullptr, nullptr); 17 int tm = (tl + tr) >> 1; 18 Node* L = build(a, tl, tm); 19 Node* R = build(a, tm+1, tr); 20 return new Node(L->sum + R->sum, L, R); 21 } 22 23 // Create a new version where position pos is set to val 24 Node* set_point(Node* v, int tl, int tr, int pos, int val) { 25 if (tl == tr) { 26 return new Node(val, nullptr, nullptr); 27 } 28 int tm = (tl + tr) >> 1; 29 if (pos <= tm) { 30 Node* L = set_point(v->l, tl, tm, pos, val); // clone path on left 31 Node* R = v->r; // reuse right child 32 return new Node(L->sum + R->sum, L, R); 33 } else { 34 Node* L = v->l; // reuse left child 35 Node* R = set_point(v->r, tm+1, tr, pos, val); // clone path on right 36 return new Node(L->sum + R->sum, L, R); 37 } 38 } 39 40 // Range sum query on version rooted at v 41 int sum_query(Node* v, int tl, int tr, int l, int r) { 42 if (!v || l > r) return 0; 43 if (l == tl && r == tr) return v->sum; 44 int tm = (tl + tr) >> 1; 45 return sum_query(v->l, tl, tm, l, min(r, tm)) + 46 sum_query(v->r, tm+1, tr, max(l, tm+1), r); 47 } 48 49 // Point get by querying [pos, pos] 50 int get_point(Node* v, int tl, int tr, int pos) { 51 if (tl == tr) return v->sum; 52 int tm = (tl + tr) >> 1; 53 if (pos <= tm) return get_point(v->l, tl, tm, pos); 54 return get_point(v->r, tm+1, tr, pos); 55 } 56 57 int main() { 58 ios::sync_with_stdio(false); 59 cin.tie(nullptr); 60 61 int n = 8; 62 vector<int> a = {5, 1, 4, 2, 7, 3, 6, 0}; 63 64 vector<Node*> roots; // roots[i] is the root of version i 65 66 Node* root0 = build(a, 0, n-1); // version 0 67 roots.push_back(root0); 68 69 // Make some updates, each creating a new version 70 Node* root1 = set_point(roots.back(), 0, n-1, 2, 10); // a[2] = 10 71 roots.push_back(root1); 72 73 Node* root2 = set_point(roots.back(), 0, n-1, 5, -3); // a[5] = -3 74 roots.push_back(root2); 75 76 // Queries on different versions 77 cout << "Version 0, sum[0..7] = " << sum_query(roots[0], 0, n-1, 0, 7) << "\n"; // original 78 cout << "Version 1, a[2] = " << get_point(roots[1], 0, n-1, 2) << "\n"; 79 cout << "Version 2, sum[2..6] = " << sum_query(roots[2], 0, n-1, 2, 6) << "\n"; 80 81 // Time-travel check: version 0 remains unchanged 82 cout << "Version 0, a[2] (should be 4) = " << get_point(roots[0], 0, n-1, 2) << "\n"; 83 84 return 0; 85 } 86
This program implements a persistent array using a segment tree with structural sharing. Each set_point clones only the nodes along the path to the target index, recomputing sums on the way up while reusing the untouched subtrees. range sum and point access both follow O(log n) paths from the root of the chosen version.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Persistent treap implementing an ordered multiset. 5 // Supports: insert(x), erase(x) (one occurrence), find(x), kth(k), order_of_key(x). 6 // Duplicates are handled via a count stored in each node. 7 // Persistence: split and merge clone only nodes on the recursion path. 8 9 struct Node { 10 int key; // BST key 11 uint32_t pri; // heap priority (random) 12 int cnt; // multiplicity of key at this node 13 int sz; // size of subtree counting multiplicities 14 Node *l, *r; 15 }; 16 17 static const uint32_t FIXED_SEED = 712367u; 18 static mt19937 rng(FIXED_SEED); 19 20 inline int getsz(Node* t) { return t ? t->sz : 0; } 21 22 inline Node* clone(Node* t) { 23 if (!t) return nullptr; 24 Node* u = new Node(*t); // shallow copy of fields; children pointers kept 25 return u; 26 } 27 28 inline Node* make_node(int key) { 29 Node* t = new Node(); 30 t->key = key; 31 t->pri = rng(); 32 t->cnt = 1; 33 t->sz = 1; 34 t->l = t->r = nullptr; 35 return t; 36 } 37 38 inline void pull(Node* t) { 39 if (!t) return; 40 t->sz = t->cnt + getsz(t->l) + getsz(t->r); 41 } 42 43 // Split by key: L has keys <= key, R has keys > key 44 pair<Node*, Node*> split(Node* t, int key) { 45 if (!t) return {nullptr, nullptr}; 46 if (t->key <= key) { 47 Node* u = clone(t); // clone path node 48 auto [l2, r2] = split(t->r, key); 49 u->r = l2; 50 pull(u); 51 return {u, r2}; 52 } else { 53 Node* u = clone(t); 54 auto [l2, r2] = split(t->l, key); 55 u->l = r2; 56 pull(u); 57 return {l2, u}; 58 } 59 } 60 61 // Merge: all keys in a <= keys in b 62 Node* merge(Node* a, Node* b) { 63 if (!a) return b; 64 if (!b) return a; 65 if (a->pri > b->pri) { 66 Node* u = clone(a); 67 u->r = merge(u->r, b); 68 pull(u); 69 return u; 70 } else { 71 Node* u = clone(b); 72 u->l = merge(a, u->l); 73 pull(u); 74 return u; 75 } 76 } 77 78 // Insert one occurrence of x 79 Node* insert(Node* root, int x) { 80 // Isolate equal keys via two splits: (<= x-1), (= x), (> x) 81 auto [A, B] = split(root, x - 1); 82 auto [E, C] = split(B, x); 83 if (E) { 84 Node* u = clone(E); 85 u->cnt += 1; 86 pull(u); 87 E = u; 88 } else { 89 E = make_node(x); 90 } 91 return merge(merge(A, E), C); 92 } 93 94 // Erase one occurrence of x (if present) 95 Node* erase(Node* root, int x) { 96 auto [A, B] = split(root, x - 1); 97 auto [E, C] = split(B, x); 98 if (E) { 99 if (E->cnt > 1) { 100 Node* u = clone(E); 101 u->cnt -= 1; 102 pull(u); 103 E = u; 104 } else { 105 E = nullptr; // drop this key entirely 106 } 107 } 108 return merge(merge(A, E), C); 109 } 110 111 bool find_key(Node* t, int x) { 112 while (t) { 113 if (x == t->key) return true; 114 if (x < t->key) t = t->l; else t = t->r; 115 } 116 return false; 117 } 118 119 // Count of elements strictly less than x 120 int order_of_key(Node* t, int x) { 121 int ans = 0; 122 while (t) { 123 if (x <= t->key) { 124 t = t->l; 125 } else { 126 ans += getsz(t->l) + t->cnt; 127 t = t->r; 128 } 129 } 130 return ans; 131 } 132 133 // k-th (1-indexed). Returns pair(found, value) 134 pair<bool,int> kth(Node* t, int k) { 135 if (k <= 0 || k > getsz(t)) return {false, 0}; 136 while (t) { 137 int left = getsz(t->l); 138 if (k <= left) { 139 t = t->l; 140 } else if (k <= left + t->cnt) { 141 return {true, t->key}; 142 } else { 143 k -= left + t->cnt; 144 t = t->r; 145 } 146 } 147 return {false, 0}; 148 } 149 150 int main() { 151 ios::sync_with_stdio(false); 152 cin.tie(nullptr); 153 154 vector<Node*> ver; // version roots 155 ver.push_back(nullptr); // version 0: empty set 156 157 // Insert sequence: [5, 2, 8, 5] 158 ver.push_back(insert(ver.back(), 5)); // v1 159 ver.push_back(insert(ver.back(), 2)); // v2 160 ver.push_back(insert(ver.back(), 8)); // v3 161 ver.push_back(insert(ver.back(), 5)); // v4 (multiset has two 5's) 162 163 // Erase one 5 in a new version 164 ver.push_back(erase(ver.back(), 5)); // v5 165 166 // Queries across versions 167 cout << boolalpha; 168 cout << "v2 find 5: " << find_key(ver[2], 5) << "\n"; // true 169 cout << "v2 order_of_key(5): " << order_of_key(ver[2], 5) << "\n"; // count < 5 170 auto [ok1, x1] = kth(ver[4], 2); // in v4, 2nd smallest 171 cout << "v4 kth(2): found=" << ok1 << ", val=" << x1 << "\n"; 172 auto [ok2, x2] = kth(ver[5], 2); // after erasing one 5 173 cout << "v5 kth(2): found=" << ok2 << ", val=" << x2 << "\n"; 174 175 // Time travel: older versions unaffected 176 cout << "v1 size: " << getsz(ver[1]) << ", v4 size: " << getsz(ver[4]) << ", v5 size: " << getsz(ver[5]) << "\n"; 177 178 return 0; 179 } 180
This persistent treap stores multiplicities via a count per node and maintains subtree sizes for order statistics. split/merge are purely functional: they clone only touched nodes and return new roots, enabling persistence. insert and erase isolate the key via splits, adjust its count (creating or deleting the node), and merge back. Queries (find, order_of_key, kth) traverse as in a standard augmented BST.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Minimal persistent array (point set/get) with reference counting 5 // to reclaim nodes of discarded versions safely. 6 7 struct Node { 8 int val; // leaf value when tl==tr; internal nodes ignore val 9 Node *l, *r; 10 int ref; // reference count 11 Node(int v=0, Node* L=nullptr, Node* R=nullptr) : val(v), l(L), r(R), ref(1) {} 12 }; 13 14 void acquire(Node* x) { if (x) ++x->ref; } 15 void release(Node* x) { 16 if (!x) return; 17 if (--x->ref == 0) { 18 release(x->l); 19 release(x->r); 20 delete x; 21 } 22 } 23 24 Node* build(const vector<int>& a, int tl, int tr) { 25 if (tl == tr) return new Node(a[tl]); 26 int tm = (tl + tr) >> 1; 27 Node* L = build(a, tl, tm); 28 Node* R = build(a, tm+1, tr); 29 Node* v = new Node(0, L, R); 30 return v; 31 } 32 33 // Persistent point update with reference counting: 34 // Reuse untouched child by acquiring it; clone the path node. 35 Node* set_point(Node* v, int tl, int tr, int pos, int x) { 36 if (tl == tr) { 37 return new Node(x); // fresh leaf, ref=1 38 } 39 int tm = (tl + tr) >> 1; 40 Node* L; Node* R; 41 if (pos <= tm) { 42 L = set_point(v->l, tl, tm, pos, x); 43 R = v->r; acquire(R); // reuse right child 44 } else { 45 L = v->l; acquire(L); // reuse left child 46 R = set_point(v->r, tm+1, tr, pos, x); 47 } 48 Node* u = new Node(0, L, R); // internal node, ref=1 49 return u; 50 } 51 52 int get_point(Node* v, int tl, int tr, int pos) { 53 if (tl == tr) return v->val; 54 int tm = (tl + tr) >> 1; 55 if (pos <= tm) return get_point(v->l, tl, tm, pos); 56 return get_point(v->r, tm+1, tr, pos); 57 } 58 59 int main() { 60 ios::sync_with_stdio(false); 61 cin.tie(nullptr); 62 63 int n = 5; 64 vector<int> a = {7, 9, 3, 1, 4}; 65 66 // Build version 0 67 Node* v0 = build(a, 0, n-1); 68 69 // Create version 1: set a[2] = 100 70 Node* v1 = set_point(v0, 0, n-1, 2, 100); 71 72 // Create version 2 from v1: set a[4] = 0 73 Node* v2 = set_point(v1, 0, n-1, 4, 0); 74 75 cout << get_point(v0, 0, n-1, 2) << "\n"; // 3 76 cout << get_point(v1, 0, n-1, 2) << "\n"; // 100 77 cout << get_point(v2, 0, n-1, 4) << "\n"; // 0 78 79 // Suppose we no longer need v0 and v1; we can release them. 80 release(v0); 81 release(v1); 82 83 // v2 remains usable; shared subtrees whose ref hit zero were freed. 84 cout << get_point(v2, 0, n-1, 0) << "\n"; // 7 85 86 // Finally release v2 to avoid leaks. 87 release(v2); 88 return 0; 89 } 90
This example shows safe memory reclamation for a persistent array using reference counting. Each new node starts with ref=1. When a new version reuses an existing child subtree, acquire increments the reference. When a version is discarded, release recursively decrements refs and deletes nodes whose count reaches zero, without touching still-shared nodes.