πŸ—‚οΈData StructureAdvanced

Persistent Array and Treap

Key Points

  • β€’
    Persistence lets you keep every past version of a data structure while making O(log n) updates and queries on any version.
  • β€’
    A persistent array can be implemented with a path-copy segment tree, giving O(log n) point updates and O(log n) access while sharing most nodes across versions.
  • β€’
    A persistent treap supports full binary-search-tree operations (insert, erase, find, k-th, orde_key) with expected O(log n) time per operation.
  • β€’
    Copy-on-write clones only nodes on the root-to-leaf path you touch, so each update adds O(log n) new nodes and reuses the rest.
  • β€’
    With q operations, total memory is roughly n + Θ(q log n) nodes, where n is the initial size.
  • β€’
    Treaps maintain a BST by key and a heap by a random priority, yielding expected height O(log n).
  • β€’
    Version roots (one pointer per version) are the handles to time-travel: queries take a root and behave as if they run on that historical state.
  • β€’
    Memory management matters: use pools for speed, and consider reference counting or lifetime rules if you need to reclaim shared nodes safely.

Prerequisites

  • β†’Binary search trees and segment trees β€” Persistent arrays rely on segment tree structure; treaps are randomized BSTs, so you need the basic traversal and update logic.
  • β†’Big-O and logarithms β€” Understanding why updates and queries are O(log n) and how total memory grows with q requires comfort with asymptotic analysis and logs.
  • β†’Pointer semantics and recursion in C++ β€” Persistence is implemented by cloning nodes and passing pointers; operations are typically recursive.
  • β†’Randomization β€” Treaps depend on random priorities to guarantee expected logarithmic height.
  • β†’Augmented trees (subtree sizes, aggregates) β€” Order statistics in treaps and range sums in arrays come from maintaining augmented data in nodes.
  • β†’Memory management strategies β€” Using pools for allocation speed or reference counting for reclamation helps keep performance and memory under control.

Detailed Explanation


01 Overview

Persistent data structures preserve history: instead of mutating a structure in-place, every update returns a new version that shares most of its memory with the old one. This is crucial in problems that require time-travel queries (e.g., answer using the array as it was after the 7th update) or undo/redo functionality. Two workhorses for persistence in competitive programming are the persistent array and the persistent treap. A persistent array can be realized as a functional segment tree where each point update clones O(log n) nodes along a path, leaving the rest shared. A persistent treap augments a binary search tree with random priorities (heap property) and performs updates via split/merge; cloning touched nodes yields persistence with expected O(log n) time per operation. These techniques scale well: per operation cost is logarithmic and memory is linear in the number of versions times the height touched. They also compose: you can store aggregates (like sums) in nodes for range queries, and order-statistics (like k-th) in treaps via subtree sizes. For Codeforces-level problems, these structures enable elegant solutions to offline range queries, rollback, and multi-version indexing while keeping code and complexity manageable.
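
Before the segment-tree and treap implementations below, here is structural sharing in its simplest form: an immutable stack. This toy StackNode type is only illustrative (it is not used in the later examples); the point is that no existing node is ever modified, so every old version handle keeps working.

#include <bits/stdc++.h>
using namespace std;

// Simplest possible persistence: an immutable stack (cons list).
// push/pop never modify existing nodes, so every old version stays valid
// and all versions share their common suffix. Nodes are never freed here,
// which is fine for a short demo.
struct StackNode {
    int value;
    const StackNode* next;   // shared tail, never mutated
};

const StackNode* push(const StackNode* s, int v) {
    return new StackNode{v, s};          // O(1): one new node, tail shared
}

const StackNode* pop(const StackNode* s) {
    return s ? s->next : nullptr;        // O(1): just reuse the tail
}

int main() {
    vector<const StackNode*> ver;        // version handles ("roots")
    ver.push_back(nullptr);              // v0: empty
    ver.push_back(push(ver.back(), 1));  // v1: [1]
    ver.push_back(push(ver.back(), 2));  // v2: [2, 1]
    ver.push_back(pop(ver.back()));      // v3: [1], shares v1's node

    cout << ver[2]->value << "\n";       // 2: popping for v3 did not touch v2
    cout << ver[3]->value << "\n";       // 1
    return 0;
}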

02 Intuition & Analogies

Imagine a library where each update to a book must be archived. Instead of reprinting the entire book for every change, you slip a new page into a binder that points to the unchanged old pages. The table of contents points to these pages, so each edition shares most pages with prior editions. That is copy-on-write persistence: you copy only what you modify and reuse everything else. A persistent array works like a decision tree that guides you to the correct page (leaf). To change a single entry, you only replace the guides you traverse (the nodes on your path), leaving the rest as-is. Accessing an old edition is simply using the old table of contents. A treap is like a shelf sorted by title (BST order) but also stacked by a random stickiness (priority) so that items tend to form balanced stacks. To insert or remove an item, you split or merge shelves, rearranging only a skinny column of stacks. With persistence, you don’t throw old stacks away; you make fresh copies of the few stacks you touch and leave the rest pointing to the original stacks. Version roots are the labels for each shelf configuration through time. Queries just say, "Start from this label," and everything behaves as if the world were frozen at that moment.

03 Formal Definition

A partially persistent data structure supports read operations on any historical version and update operations that create a new version without modifying prior ones. Formally, let V0 be the initial version with root pointer r0. Each update op applied to Vj returns a new root r(j+1) that defines a new version V(j+1), and for all j ≀ k, version Vj remains accessible for queries (where Vk is the newest version). A persistent array of length n can be represented as a complete binary tree of height h = ⌈logβ‚‚ nβŒ‰ with leaves storing array elements and internal nodes storing aggregates. A point update at index p clones all nodes on the root-to-leaf path to p; the number of new nodes is O(h). Access (read) follows the same path in O(h). A treap is a binary search tree keyed by x with an additional heap property on an independent random priority y: for every node u with key x_u and priority y_u, all keys in the left subtree are less than x_u, all keys in the right subtree are greater than x_u, and y_u is greater than the priorities of its children. Random independent priorities make the expected height O(log n). Persistence for treaps is achieved by implementing split and merge as pure functions that return new roots and clone only the nodes on the recursion path, yielding O(log n) expected new nodes per operation.
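
The same invariants and costs, restated compactly (a LaTeX sketch using the symbols defined above):

\begin{align*}
  &\text{Persistent array: } h = \lceil \log_2 n \rceil, \qquad
    \text{new nodes per point update} = O(h) = O(\log n),\\
  &\text{Treap node } u \text{ with key } x_u \text{ and priority } y_u: \quad
    x_v < x_u \ \forall v \in \mathrm{left}(u), \quad
    x_u < x_w \ \forall w \in \mathrm{right}(u), \quad
    y_u \ge y_c \text{ for each child } c,\\
  &\text{Random priorities: } \mathbb{E}[\mathrm{height}] = O(\log n), \qquad
    \text{expected new nodes per persistent operation} = O(\log n).
\end{align*}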

04 When to Use

Use a persistent array when you need random access and point updates across time, particularly for: (1) offline queries over prefixes or versions (e.g., number of elements ≀ x in [l, r] via version differences), (2) rollback or undo functionality without auxiliary stacks, and (3) maintaining historical snapshots cheaply. Use a persistent treap when you need full BST operations with versioning: dynamic ordered sets supporting insert, erase, find, k-th smallest, and order_of_key across time. Typical applications include: (a) dynamic order statistics with historical queries, (b) keeping multiple branches of states in search/DP over subsets or timelines, (c) problems like 'k-th number in a subarray' using a value-compressed persistent array or treap, and (d) implementing ropes (sequence editors) with undo/redo by using an implicit persistent treap. Avoid persistence if constant factors and memory are too tight (e.g., extreme online constraints with very large q), or if simple rollback via a change stack is easier (and you do not need random access to arbitrary time points).
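
As a concrete instance of (1), the sketch below counts elements ≀ x in a[l..r] by subtracting two versions of a value-indexed persistent segment tree. The names (add, count_leq) and the fixed value bound V are illustrative assumptions rather than part of the code examples later on this page; in practice you would coordinate-compress the values first.

#include <bits/stdc++.h>
using namespace std;

// Count elements <= x in a[l..r] by subtracting two versions of a
// value-indexed persistent segment tree. Values assumed to lie in [0, V).

struct Node {
    int cnt = 0;
    Node *l = nullptr, *r = nullptr;
};

// Return a new version with value v counted once more. O(log V) new nodes.
Node* add(Node* t, int tl, int tr, int v) {
    Node* u = t ? new Node(*t) : new Node();   // clone the path node (or create it)
    u->cnt += 1;
    if (tl == tr) return u;
    int tm = (tl + tr) / 2;
    if (v <= tm) u->l = add(t ? t->l : nullptr, tl, tm, v);
    else         u->r = add(t ? t->r : nullptr, tm + 1, tr, v);
    return u;
}

// Number of stored values <= x in the version rooted at t.
int count_leq(Node* t, int tl, int tr, int x) {
    if (!t || x < tl) return 0;
    if (tr <= x) return t->cnt;
    int tm = (tl + tr) / 2;
    return count_leq(t->l, tl, tm, x) + count_leq(t->r, tm + 1, tr, x);
}

int main() {
    const int V = 100;                        // value range [0, V)
    vector<int> a = {5, 1, 4, 2, 7, 3};
    int n = a.size();

    vector<Node*> roots(n + 1, nullptr);      // roots[i] = multiset of a[0..i-1]
    roots[0] = new Node();                    // empty prefix
    for (int i = 0; i < n; i++)
        roots[i + 1] = add(roots[i], 0, V - 1, a[i]);

    // Count elements <= 4 in a[1..4] = {1, 4, 2, 7}  ->  3
    int l = 1, r = 4, x = 4;
    int ans = count_leq(roots[r + 1], 0, V - 1, x) - count_leq(roots[l], 0, V - 1, x);
    cout << ans << "\n";                      // 3
    return 0;
}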

⚠️ Common Mistakes

Common pitfalls include:

1. Forgetting to clone on the path when updating and accidentally mutating shared nodes, which corrupts old versions; always treat nodes as immutable and allocate fresh nodes for any change (see the sketch below).
2. Incorrect version bookkeeping: losing or overwriting a root pointer means that version is gone; always store roots in a vector.
3. Treap priority bugs: deterministic priorities or small priority ranges can cause unbalanced trees; use a high-quality RNG (e.g., std::mt19937) with 32-bit or 64-bit priorities.
4. Mishandling duplicates in treaps: either augment with a count per key or disambiguate with (key, id) pairs, and make sure order_of_key and k-th respect counts.
5. Stack overflows from deep recursion when n is extremely large; consider tail-recursive patterns, iterative split/merge, or increasing the stack size when required.
6. Memory blowups: each update costs O(log n) nodes; with millions of operations, pre-allocate a pool, and consider reference counting only if you truly need to reclaim memory for discarded versions.
7. Off-by-one errors from mixing 0-based and 1-based indexing, especially in implicit treaps (split by position) and in segment tree ranges; consistently document and assert your conventions.
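
Pitfall 1 in miniature: the sketch below (a hypothetical single-node update, not taken from the examples later on this page) contrasts an in-place write, which silently changes every version that shares the node, with the correct clone-before-modify pattern.

#include <bits/stdc++.h>
using namespace std;

// Nodes shared between versions must never be written in place;
// always clone the node you are about to change.
struct Node {
    int val;
    Node *l, *r;
};

// WRONG: mutates the shared node, so every version pointing at it changes too.
Node* set_in_place(Node* t, int val) {
    t->val = val;              // corrupts all older versions sharing this node
    return t;
}

// RIGHT: copy the node, modify the copy, leave the original untouched.
Node* set_persistent(Node* t, int val) {
    Node* u = new Node(*t);    // fresh node for the new version
    u->val = val;
    return u;
}

int main() {
    Node* v0 = new Node{42, nullptr, nullptr};

    Node* bad = set_in_place(v0, 7);
    cout << v0->val << "\n";                     // 7 -- version 0 silently destroyed
    (void)bad;

    v0->val = 42;                                // restore for the demo
    Node* v1 = set_persistent(v0, 7);
    cout << v0->val << " " << v1->val << "\n";   // 42 7 -- both versions intact
    return 0;
}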

Key Formulas

Tree Height

h = ⌈logβ‚‚ nβŒ‰

Explanation: The height of a complete binary tree over n elements is the ceiling of log base 2 of n. This bounds the path length for updates and queries in persistent arrays.

Segment Tree Build

T_build(n) = O(n)

Explanation: Building a (persistent) segment tree visits each array position once and does constant work per node. Total build time is linear in n.

Path-Copy Update Cost

T_update(n) = O(log n), new nodes per update = O(log n)

Explanation: Each point update in a persistent array or treap touches only a root-to-leaf (or recursion) path. Time and number of newly allocated nodes are both logarithmic.

Total Node Count Over q Updates

N(q) = n + Θ(q log n)

Explanation: Starting from an n-leaf structure, each of the q updates adds O(log n) fresh nodes while reusing the rest. Total memory grows linearly with q log n; for example, n = q = 10^5 gives roughly 10^5 + 10^5 Β· 17 β‰ˆ 1.8 Γ— 10^6 nodes.

Expected Treap Height

E[height] = O(log n)

Explanation: Random priorities imply the expected height of a treap is logarithmic in n. More precisely, the expected depth of any fixed key is about 2 ln n plus a constant.

Order Statistics

order_of_key(x) = |{ y ∈ S : y < x }|,  kth(k) = min{ x ∈ S : |{ y ∈ S : y ≀ x }| β‰₯ k }

Explanation: The rank of x counts how many elements are strictly smaller. The k-th smallest is the smallest x for which the number of elements ≀ x reaches k. Treaps maintain subtree sizes to support both in O(log n).

Range Sum via Prefix

sum(l, r) = prefix(r) βˆ’ prefix(l βˆ’ 1)

Explanation: Range sums can be computed by subtracting two prefix sums. With persistent arrays, the same subtraction works across versions by combining prefixes taken from different roots.

Complexity Analysis

For a persistent array implemented via a functional segment tree over n elements, the tree height is h = ⌈logβ‚‚ nβŒ‰. A build pass visits each node once, taking O(n) time and O(n) space. Each point update to position p clones exactly the h nodes on the root-to-leaf path and recomputes O(h) aggregates; thus the update time is O(log n) and the number of new nodes allocated per update is O(log n). Point access is also O(log n) by descending to the leaf. If internal nodes store aggregates (e.g., sums), range queries run in O(log n) as well. Over q updates, the total number of nodes across all versions is N(q) = n + Θ(q log n); this is the dominant memory term in practice.

For a persistent treap managing m keys, operations are randomized due to priorities. The expected height is O(log m), giving expected O(log m) time for split, merge, insert, erase, find, k-th, and order_of_key. Persistence is obtained by cloning the O(log m) nodes traversed in these operations; hence each operation allocates O(log m) new nodes on average. Space across q treap updates grows as m0 + Θ(q log m), where m0 is the initial size (possibly 0). Constants matter: recursion, random priority generation, and allocation cost can be bottlenecks, so a node pool (arena allocator) is recommended to reduce allocator overhead. If old versions are discarded, reference counting can reclaim nodes when their count drops to zero, but it complicates code and is unnecessary when memory suffices to keep all versions until the program ends.
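
One way to realize the recommended node pool is a bump (arena) allocator sized from N(q) = n + Θ(q log n). The Pool type below and its capacity estimate are illustrative assumptions, not part of the code examples that follow; nodes use integer indices instead of pointers so the whole pool lives in one contiguous vector.

#include <bits/stdc++.h>
using namespace std;

// A simple arena (pool) allocator for persistent-structure nodes: one big
// vector allocated up front, nodes handed out by bumping an index. This avoids
// per-node 'new' overhead and keeps all versions alive until program exit.
struct Node {
    int sum;
    int l, r;          // child indices into the pool (-1 = null)
};

struct Pool {
    vector<Node> buf;
    int used = 0;
    explicit Pool(size_t capacity) : buf(capacity) {}

    // Hand out a fresh node; the capacity is fixed up front, so indices
    // stay valid for the whole run.
    int alloc(int sum, int l, int r) {
        assert(used < (int)buf.size());
        buf[used] = {sum, l, r};
        return used++;
    }
};

int main() {
    int n = 200000, q = 200000;
    // Rough capacity: ~2n nodes for the initial tree plus about 20 nodes
    // per update (path length with some slack).
    size_t capacity = 2 * (size_t)n + (size_t)q * 20;
    Pool pool(capacity);

    int leaf = pool.alloc(5, -1, -1);
    int root = pool.alloc(5, leaf, -1);
    cout << "nodes used: " << pool.used << ", capacity: " << pool.buf.size() << "\n";
    (void)root;
    return 0;
}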

Code Examples

Persistent Array via Path-Copy Segment Tree (point update, point/range query)
#include <bits/stdc++.h>
using namespace std;

// Persistent array implemented as a functional segment tree.
// Supports: build, point update (set), point get, and range sum query.
// All operations are O(log n). Each update allocates O(log n) nodes.

struct Node {
    int sum;        // aggregate (here: sum); for pure array access, leaf holds the value
    Node *l, *r;    // children
    Node(int s = 0, Node* L = nullptr, Node* R = nullptr) : sum(s), l(L), r(R) {}
};

// Build from initial array a[tl..tr]
Node* build(const vector<int>& a, int tl, int tr) {
    if (tl == tr) return new Node(a[tl], nullptr, nullptr);
    int tm = (tl + tr) >> 1;
    Node* L = build(a, tl, tm);
    Node* R = build(a, tm + 1, tr);
    return new Node(L->sum + R->sum, L, R);
}

// Create a new version where position pos is set to val
Node* set_point(Node* v, int tl, int tr, int pos, int val) {
    if (tl == tr) {
        return new Node(val, nullptr, nullptr);
    }
    int tm = (tl + tr) >> 1;
    if (pos <= tm) {
        Node* L = set_point(v->l, tl, tm, pos, val);      // clone path on left
        Node* R = v->r;                                   // reuse right child
        return new Node(L->sum + R->sum, L, R);
    } else {
        Node* L = v->l;                                   // reuse left child
        Node* R = set_point(v->r, tm + 1, tr, pos, val);  // clone path on right
        return new Node(L->sum + R->sum, L, R);
    }
}

// Range sum query on version rooted at v
int sum_query(Node* v, int tl, int tr, int l, int r) {
    if (!v || l > r) return 0;
    if (l == tl && r == tr) return v->sum;
    int tm = (tl + tr) >> 1;
    return sum_query(v->l, tl, tm, l, min(r, tm)) +
           sum_query(v->r, tm + 1, tr, max(l, tm + 1), r);
}

// Point get by querying [pos, pos]
int get_point(Node* v, int tl, int tr, int pos) {
    if (tl == tr) return v->sum;
    int tm = (tl + tr) >> 1;
    if (pos <= tm) return get_point(v->l, tl, tm, pos);
    return get_point(v->r, tm + 1, tr, pos);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 8;
    vector<int> a = {5, 1, 4, 2, 7, 3, 6, 0};

    vector<Node*> roots;   // roots[i] is the root of version i

    Node* root0 = build(a, 0, n - 1);   // version 0
    roots.push_back(root0);

    // Make some updates, each creating a new version
    Node* root1 = set_point(roots.back(), 0, n - 1, 2, 10);   // a[2] = 10
    roots.push_back(root1);

    Node* root2 = set_point(roots.back(), 0, n - 1, 5, -3);   // a[5] = -3
    roots.push_back(root2);

    // Queries on different versions
    cout << "Version 0, sum[0..7] = " << sum_query(roots[0], 0, n - 1, 0, 7) << "\n";   // original
    cout << "Version 1, a[2] = " << get_point(roots[1], 0, n - 1, 2) << "\n";
    cout << "Version 2, sum[2..6] = " << sum_query(roots[2], 0, n - 1, 2, 6) << "\n";

    // Time-travel check: version 0 remains unchanged
    cout << "Version 0, a[2] (should be 4) = " << get_point(roots[0], 0, n - 1, 2) << "\n";

    return 0;
}

This program implements a persistent array using a segment tree with structural sharing. Each set_point call clones only the nodes along the path to the target index, recomputing sums on the way up while reusing the untouched subtrees. Range sum and point access both follow O(log n) paths from the root of the chosen version.

Time: O(log n) per update and per query. Space: Initial O(n) for build; O(log n) extra nodes per update; total O(n + q log n).
Persistent Treap (ordered multiset) with split/merge and full BST ops
#include <bits/stdc++.h>
using namespace std;

// Persistent treap implementing an ordered multiset.
// Supports: insert(x), erase(x) (one occurrence), find(x), kth(k), order_of_key(x).
// Duplicates are handled via a count stored in each node.
// Persistence: split and merge clone only nodes on the recursion path.

struct Node {
    int key;        // BST key
    uint32_t pri;   // heap priority (random)
    int cnt;        // multiplicity of key at this node
    int sz;         // size of subtree counting multiplicities
    Node *l, *r;
};

static const uint32_t FIXED_SEED = 712367u;
static mt19937 rng(FIXED_SEED);

inline int getsz(Node* t) { return t ? t->sz : 0; }

inline Node* clone(Node* t) {
    if (!t) return nullptr;
    Node* u = new Node(*t);   // shallow copy of fields; children pointers kept
    return u;
}

inline Node* make_node(int key) {
    Node* t = new Node();
    t->key = key;
    t->pri = rng();
    t->cnt = 1;
    t->sz = 1;
    t->l = t->r = nullptr;
    return t;
}

inline void pull(Node* t) {
    if (!t) return;
    t->sz = t->cnt + getsz(t->l) + getsz(t->r);
}

// Split by key: L has keys <= key, R has keys > key
pair<Node*, Node*> split(Node* t, int key) {
    if (!t) return {nullptr, nullptr};
    if (t->key <= key) {
        Node* u = clone(t);   // clone path node
        auto [l2, r2] = split(t->r, key);
        u->r = l2;
        pull(u);
        return {u, r2};
    } else {
        Node* u = clone(t);
        auto [l2, r2] = split(t->l, key);
        u->l = r2;
        pull(u);
        return {l2, u};
    }
}

// Merge: all keys in a <= keys in b
Node* merge(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->pri > b->pri) {
        Node* u = clone(a);
        u->r = merge(u->r, b);
        pull(u);
        return u;
    } else {
        Node* u = clone(b);
        u->l = merge(a, u->l);
        pull(u);
        return u;
    }
}

// Insert one occurrence of x
Node* insert(Node* root, int x) {
    // Isolate equal keys via two splits: (<= x-1), (= x), (> x)
    auto [A, B] = split(root, x - 1);
    auto [E, C] = split(B, x);
    if (E) {
        Node* u = clone(E);
        u->cnt += 1;
        pull(u);
        E = u;
    } else {
        E = make_node(x);
    }
    return merge(merge(A, E), C);
}

// Erase one occurrence of x (if present)
Node* erase(Node* root, int x) {
    auto [A, B] = split(root, x - 1);
    auto [E, C] = split(B, x);
    if (E) {
        if (E->cnt > 1) {
            Node* u = clone(E);
            u->cnt -= 1;
            pull(u);
            E = u;
        } else {
            E = nullptr;   // drop this key entirely
        }
    }
    return merge(merge(A, E), C);
}

bool find_key(Node* t, int x) {
    while (t) {
        if (x == t->key) return true;
        if (x < t->key) t = t->l; else t = t->r;
    }
    return false;
}

// Count of elements strictly less than x
int order_of_key(Node* t, int x) {
    int ans = 0;
    while (t) {
        if (x <= t->key) {
            t = t->l;
        } else {
            ans += getsz(t->l) + t->cnt;
            t = t->r;
        }
    }
    return ans;
}

// k-th (1-indexed). Returns pair(found, value)
pair<bool,int> kth(Node* t, int k) {
    if (k <= 0 || k > getsz(t)) return {false, 0};
    while (t) {
        int left = getsz(t->l);
        if (k <= left) {
            t = t->l;
        } else if (k <= left + t->cnt) {
            return {true, t->key};
        } else {
            k -= left + t->cnt;
            t = t->r;
        }
    }
    return {false, 0};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    vector<Node*> ver;        // version roots
    ver.push_back(nullptr);   // version 0: empty set

    // Insert sequence: [5, 2, 8, 5]
    ver.push_back(insert(ver.back(), 5));   // v1
    ver.push_back(insert(ver.back(), 2));   // v2
    ver.push_back(insert(ver.back(), 8));   // v3
    ver.push_back(insert(ver.back(), 5));   // v4 (multiset has two 5's)

    // Erase one 5 in a new version
    ver.push_back(erase(ver.back(), 5));    // v5

    // Queries across versions
    cout << boolalpha;
    cout << "v2 find 5: " << find_key(ver[2], 5) << "\n";               // true
    cout << "v2 order_of_key(5): " << order_of_key(ver[2], 5) << "\n";  // count < 5
    auto [ok1, x1] = kth(ver[4], 2);   // in v4, 2nd smallest
    cout << "v4 kth(2): found=" << ok1 << ", val=" << x1 << "\n";
    auto [ok2, x2] = kth(ver[5], 2);   // after erasing one 5
    cout << "v5 kth(2): found=" << ok2 << ", val=" << x2 << "\n";

    // Time travel: older versions unaffected
    cout << "v1 size: " << getsz(ver[1]) << ", v4 size: " << getsz(ver[4]) << ", v5 size: " << getsz(ver[5]) << "\n";

    return 0;
}

This persistent treap stores multiplicities via a count per node and maintains subtree sizes for order statistics. split/merge are purely functional: they clone only touched nodes and return new roots, enabling persistence. insert and erase isolate the key via splits, adjust its count (creating or deleting the node), and merge back. Queries (find, order_of_key, kth) traverse as in a standard augmented BST.

Time: Expected O(log n) per update and per query. Space: O(log n) new nodes per update on average; total O(n + q log n).
Persistent Array with Reference Counting (memory reclamation demo)
#include <bits/stdc++.h>
using namespace std;

// Minimal persistent array (point set/get) with reference counting
// to reclaim nodes of discarded versions safely.

struct Node {
    int val;       // leaf value when tl==tr; internal nodes ignore val
    Node *l, *r;
    int ref;       // reference count
    Node(int v = 0, Node* L = nullptr, Node* R = nullptr) : val(v), l(L), r(R), ref(1) {}
};

void acquire(Node* x) { if (x) ++x->ref; }
void release(Node* x) {
    if (!x) return;
    if (--x->ref == 0) {
        release(x->l);
        release(x->r);
        delete x;
    }
}

Node* build(const vector<int>& a, int tl, int tr) {
    if (tl == tr) return new Node(a[tl]);
    int tm = (tl + tr) >> 1;
    Node* L = build(a, tl, tm);
    Node* R = build(a, tm + 1, tr);
    Node* v = new Node(0, L, R);
    return v;
}

// Persistent point update with reference counting:
// Reuse untouched child by acquiring it; clone the path node.
Node* set_point(Node* v, int tl, int tr, int pos, int x) {
    if (tl == tr) {
        return new Node(x);   // fresh leaf, ref=1
    }
    int tm = (tl + tr) >> 1;
    Node* L; Node* R;
    if (pos <= tm) {
        L = set_point(v->l, tl, tm, pos, x);
        R = v->r; acquire(R);                    // reuse right child
    } else {
        L = v->l; acquire(L);                    // reuse left child
        R = set_point(v->r, tm + 1, tr, pos, x);
    }
    Node* u = new Node(0, L, R);                 // internal node, ref=1
    return u;
}

int get_point(Node* v, int tl, int tr, int pos) {
    if (tl == tr) return v->val;
    int tm = (tl + tr) >> 1;
    if (pos <= tm) return get_point(v->l, tl, tm, pos);
    return get_point(v->r, tm + 1, tr, pos);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 5;
    vector<int> a = {7, 9, 3, 1, 4};

    // Build version 0
    Node* v0 = build(a, 0, n - 1);

    // Create version 1: set a[2] = 100
    Node* v1 = set_point(v0, 0, n - 1, 2, 100);

    // Create version 2 from v1: set a[4] = 0
    Node* v2 = set_point(v1, 0, n - 1, 4, 0);

    cout << get_point(v0, 0, n - 1, 2) << "\n";   // 3
    cout << get_point(v1, 0, n - 1, 2) << "\n";   // 100
    cout << get_point(v2, 0, n - 1, 4) << "\n";   // 0

    // Suppose we no longer need v0 and v1; we can release them.
    release(v0);
    release(v1);

    // v2 remains usable; shared subtrees whose ref hit zero were freed.
    cout << get_point(v2, 0, n - 1, 0) << "\n";   // 7

    // Finally release v2 to avoid leaks.
    release(v2);
    return 0;
}

This example shows safe memory reclamation for a persistent array using reference counting. Each new node starts with ref=1. When a new version reuses an existing child subtree, acquire increments the reference. When a version is discarded, release recursively decrements refs and deletes nodes whose count reaches zero, without touching still-shared nodes.

Time: O(log n) per update and per access. Space: O(log n) new nodes per update; reclaimed space depends on which versions are released.