🗂️ Data Structure · Advanced

Treap

Key Points

  • A treap is a binary search tree on keys combined with a heap on random priorities, which keeps the tree balanced in expectation.
  • Because priorities are random, the expected height is O(log n), so search, insert, erase, split, and merge all run in expected O(log n).
  • Split divides a treap into two treaps by a key threshold, and merge combines two treaps assuming all keys in the left treap are smaller than those in the right treap.
  • Treaps are simple to implement, often shorter and less error-prone than AVL or Red-Black Trees, while providing similar performance on average.
  • An implicit treap treats in-order positions as indices, enabling array-like operations such as insert at position, erase range, reverse range, and range sums.
  • Order statistics like kth smallest and rank are easy by maintaining subtree sizes.
  • Randomization is crucial: use a strong RNG (e.g., mt19937) to avoid adversarial cases that break balance.
  • Common pitfalls include incorrect split/merge invariants, forgetting to update subtree sizes or lazy flags, and mishandling duplicates.

Prerequisites

  • Binary search trees: Treaps enforce the BST property on keys; understanding BST traversal and search is essential.
  • Heaps and priority queues: Treaps also enforce a heap property on random priorities; this guides rotations or merges.
  • Big-O notation and expectations: Treap performance is analyzed in expected time; you need asymptotic and probabilistic reasoning.
  • Recursion: Split, merge, insert, and erase are implemented recursively and require careful base/recursive cases.
  • Augmented trees: Maintaining subtree sizes/sums enables order statistics and range queries.
  • Lazy propagation basics: Implicit treaps frequently use lazy flags (e.g., reverse) to apply range updates efficiently.

Detailed Explanation

01 Overview

A treap is a randomized balanced binary search tree that blends two structures: a binary search tree (BST) on keys and a heap on priorities. Each node stores a key (used to maintain in-order BST property) and a random priority (used to maintain heap property). The key insight is that if priorities are independent and uniformly random, then the resulting BST shape is distributed like a random BST, which has logarithmic expected height. That yields expected O(log n) time per operation for search, insertion, deletion, split, and merge. Treaps are attractive in competitive programming and systems work because they are simple to code (small constant factors, minimal rotation logic) and flexible to augment: you can maintain subtree sizes for order statistics, sums for range queries, and lazy propagation flags for range updates. Two core primitives make treaps especially powerful: split, which partitions the tree by key (or by index for implicit treaps), and merge, which joins two compatible treaps. Using these, you can implement complex operations like range deletions and concatenations in a few lines. In practice, treaps trade worst-case guarantees (they are probabilistic) for simplicity and excellent average-case performance, making them a go-to choice for dynamic ordered sets and sequence data structures.

02 Intuition & Analogies

Imagine you’re building a seating chart (the BST) where everyone must sit in increasing order by their ticket number (the key). People also draw a random “priority” number from a hat when they arrive. Besides seating order, you impose a rule: no one can sit below someone with a lower random number (heap property). If two people compare seats, the person with better (smaller or larger depending on convention) priority sits higher in the hierarchy, but still in the right order by tickets. The random draw keeps the seating chart from becoming a long line; instead, it forms a nicely bushy hierarchy most of the time. Now think of split and merge as Lego moves. Split says: break the seating into those with ticket numbers less than K and those with ticket numbers at least K—like cutting a deck of cards at a given rank while preserving each half’s structure. Merge says: if everyone in group A has smaller tickets than everyone in group B, you can shuffle them together into one valid chart by letting the highest-priority person among the roots lead, and recursively merging the leftovers. For implicit treaps, forget explicit tickets: positions in the seating (1st, 2nd, 3rd) become the keys. This lets you do array tricks—insert at position 5, reverse positions 10 to 20, or sum a segment—by splitting around indices, tweaking the middle piece, and merging back. The magic is that randomness quietly maintains balance so all these splits and merges stay fast on average.

03 Formal Definition

A treap is a rooted binary tree where every node v stores a pair (key(v), pri(v)). It satisfies: (1) BST property on keys: for every node v, all keys in v.left are strictly less than key(v) and all keys in v.right are strictly greater than key(v) (or with appropriate handling for duplicates), and the in-order traversal yields the keys in sorted order; (2) Heap property on priorities: for every node v, pri(v) is no greater (min-heap) or no less (max-heap) than the priorities of its children. If priorities are independent, identically distributed continuous random variables, then with probability 1 all priorities are distinct, and the treap is uniquely determined by the set of pairs. Equivalently, the treap is the BST obtained by inserting the keys in order of decreasing heap priority. Under random independent priorities, the distribution of tree shapes matches that of a random BST built from a random permutation of the keys. Therefore, the expected height is \(O(\log n)\), which implies expected \(O(\log n)\) time per basic operation. The fundamental operation split(T, x) returns two treaps (L, R) with all keys in L \(< x\) and all keys in R \(\ge x\), and merge(L, R) joins two treaps, assuming the maximum key in L is less than the minimum key in R, while preserving both invariants. For an implicit treap, the BST key is the in-order index; sizes of subtrees encode positions so that splitting by index k yields the first k elements and the remainder.
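
For quick reference, the two invariants and the split/merge contracts can be written compactly (using the max-heap convention and the "keys less than x go left" split convention adopted in the code examples below):

\[\max_{u \in v.\text{left}} \text{key}(u) \;<\; \text{key}(v) \;<\; \min_{w \in v.\text{right}} \text{key}(w), \qquad \text{pri}(v) \;\ge\; \max\{\text{pri}(v.\text{left}),\ \text{pri}(v.\text{right})\}\]

\[\text{split}(T, x) = (L, R) \text{ with } \text{keys}(L) < x \le \text{keys}(R), \qquad \text{merge}(L, R) \text{ requires } \max \text{keys}(L) < \min \text{keys}(R).\]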

04 When to Use

Use treaps when you need a dynamic ordered set or sequence with efficient average-case performance and simple code. Typical use cases include: (1) Ordered set/map with insert, erase, find, lower_bound, predecessor/successor, and order statistics (kth, rank), all in expected O(log n). (2) Batch operations via split/merge, such as deleting all keys in a key range [L, R] with two splits and one merge, concatenating sequences, or extracting subranges. (3) Augmented trees: maintain aggregates like subtree sum, min, max, or custom monoids to answer range queries; combine with lazy propagation flags (e.g., reverse segment, add-to-segment) for array-like updates. (4) Implicit treap as a flexible array: insert at position, erase a range, reverse a subarray, rotate segments, and compute range sums or minima—often replacing more complex structures like splay trees or ropes. Prefer treaps when coding time is tight and worst-case adversarial inputs are unlikely or can be mitigated with strong randomization. If you need strict worst-case guarantees, or the environment is adversarial (e.g., predictable RNG), balanced deterministic trees (AVL/Red-Black) may be more appropriate. For purely static arrays with only range queries, a segment tree may be simpler; for hash-based unordered operations without order statistics, an unordered_map/set might suffice.
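
To make the split/merge style concrete, the sketch below counts the keys in a range [lo, hi] using two splits and subtree sizes, then reassembles the treap unchanged. It is a minimal illustration that assumes the Node, split, merge, and getsz helpers defined in the FHQ treap example further down this page.

// Count keys in [lo, hi] with two splits and subtree sizes (sketch).
// Assumes Node, split (keys < x go left), merge, getsz from the FHQ treap example below.
int countInRange(Node* &root, int lo, int hi){
    Node *A, *B, *M, *C;
    split(root, lo, A, B);        // A: keys < lo, B: keys >= lo
    split(B, hi + 1, M, C);       // M: keys in [lo, hi], C: keys > hi
    int ans = getsz(M);           // size of the isolated middle part
    root = merge(A, merge(M, C)); // reassemble the treap unchanged
    return ans;
}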

⚠️Common Mistakes

  • Inconsistent split definition: Decide whether split(T, x) puts keys < x on the left and \(\ge x\) on the right, or \(\le x\) on the left and > x on the right, and use that convention consistently everywhere (insert, erase, range operations). Mixing conventions leads to lost or duplicated keys.
  • Forgetting to update metadata: Always recompute size, sum, min/max, etc., after structural changes (rotations, merges/splits). Missing updates cause incorrect ranks, kth results, or aggregates.
  • Neglecting lazy propagation: In implicit treaps with reverse or add tags, always push() before descending and pull() after changing children; otherwise, indices and aggregates drift from reality.
  • Mishandling duplicates: Decide on a policy: either store a count in each node (multiset) or inject a tiebreaker (e.g., a unique id) into the key, as sketched after this list; naive handling can violate the BST invariant.
  • Weak RNG or repeated seeds: Using rand() or reseeding per operation can produce patterns that degrade performance; prefer a single long-lived mt19937 seeded from a good source.
  • Memory pitfalls: Not deleting detached subtrees in split-based erases causes leaks; dangling pointers can occur if you reuse node pointers after merges/splits incorrectly.
  • Off-by-one in implicit indices: Be explicit about 0-based vs 1-based splits; segment [l, r] typically uses splits at l and r+1.
  • Deep recursion: Although expected depth is O(log n), pathological inputs can cause deep recursion; consider tail recursion elimination or iterative wrappers if stack limits are tight.
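
One common remedy for the duplicate-handling pitfall is a composite key with a unique tiebreaker. The sketch below is only illustrative (the examples on this page instead store a per-node count); the Key type and the nextId counter mentioned in the comment are hypothetical names.

#include <cstdint>

// Sketch: make duplicates distinct by pairing each value with a unique id.
// Equal values then never compare equal, so the strict BST invariant holds.
struct Key {
    int value;
    uint64_t id; // assigned once at insertion time, e.g. from a global counter
};

inline bool operator<(const Key& a, const Key& b){
    if(a.value != b.value) return a.value < b.value;
    return a.id < b.id; // tiebreaker on the unique id
}

// Usage idea (hypothetical): Key k{x, nextId++}; then insert k into a treap keyed on Key.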

Key Formulas

BST Invariant

\(\max_{u \in v.\text{left}} \text{key}(u) \;<\; \text{key}(v) \;<\; \min_{w \in v.\text{right}} \text{key}(w)\)

Explanation: Every node’s left subtree contains only smaller keys and the right subtree only larger keys. This ensures in-order traversal yields sorted keys.

Max-Heap Invariant

\(\text{pri}(v) \ge \text{pri}(v.\text{left})\) and \(\text{pri}(v) \ge \text{pri}(v.\text{right})\)

Explanation: Parent priorities dominate children’s priorities (or the reverse for a min-heap). This, together with the BST property, uniquely shapes the treap.

Harmonic Numbers

\(H_n = \sum_{k=1}^{n} \frac{1}{k} = \ln n + \gamma + o(1)\)

Explanation: Harmonic numbers approximate the natural log up to the Euler–Mascheroni constant \(\gamma\). They appear in expected depth and path length of random BSTs/treaps.

Expected Depth of Key x

\(\mathbb{E}[\text{depth}(x_{(k)})] = H_k + H_{n+1-k} - 2\)

Explanation: If x is the k-th smallest key among n, its expected depth is the sum of two harmonic numbers minus 2. In plain terms, nodes are expected to be O(log n) levels below the root.

Expected Height

\(\mathbb{E}[\text{height}(T_n)] = \Theta(\log n)\)

Explanation: The expected height of a treap with n nodes is proportional to log n. This underpins the expected O(log n) time for operations.

Subtree Size

\(\text{size}(v) = 1 + \text{size}(v.\text{left}) + \text{size}(v.\text{right})\), with \(\text{size}(\text{null}) = 0\)

Explanation: Maintaining subtree sizes lets you find the k-th smallest element and ranks in O(log n) time by comparing k with left sizes.

Merge Complexity

\(\mathbb{E}[T_{\text{merge}}(n)] = O(\log n)\)

Explanation: Merge descends along a path defined by priorities, whose expected length equals the tree height. Thus expected time is logarithmic.

Subtree Sum

\(\text{sum}(v) = \text{val}(v) + \text{sum}(v.\text{left}) + \text{sum}(v.\text{right})\)

Explanation: For implicit treaps with range sums, keep an aggregate sum per node to answer queries and maintain it after updates.

Distinct Priorities

\(\Pr[\text{pri}(u) = \text{pri}(v)] = 0\) for \(u \ne v\) (continuous priorities)

Explanation: Assuming real-valued random priorities prevents ties with probability 1, which simplifies reasoning about structure and uniqueness.

Implicit Index

\(\text{index}(v) = \text{size}(v.\text{left}) + 1\) (position of v within its subtree)

Explanation: In implicit treaps, the in-order position of a node equals one plus the size of its left subtree, enabling splits by index.

Complexity Analysis

Let n be the current number of nodes. Under independent random priorities, the treap’s shape matches that of a random BST. The expected height is Θ(log n); more precisely, typical node depths have expectation about 2 ln n, while the tree height is also Θ(log n) both in expectation and with high probability. All fundamental operations traverse paths whose length is O(height) = O(log n) in expectation. Therefore: search, predecessor/successor, insertion, and deletion each take expected O(log n) time. Split and merge are implemented by descending from the roots and recursing into one child per step; each step removes one level of height, so each is expected O(log n). Order-statistic operations (k-th, rank) depend on subtree sizes and therefore also take expected O(log n). Augmentations like sums/min/max are maintained in O(1) per structural change, preserving the asymptotic costs. For implicit treaps, splitting by index and merging remain expected O(log n). Range operations like reverse or add on [l, r] are composed of O(1) splits/merges plus O(1) lazy tag toggles, and thus are also expected O(log n). Space is O(n) for nodes plus O(1) per augmentation field (e.g., size, sum, tags). Recursive implementations use O(height) stack space, which is expected O(log n). Building by k successive insertions takes expected O(k log k); inserting each key via a single split followed by merges (FHQ style) likewise costs expected O(log k) per operation. Although worst-case time can be O(n) (e.g., adversarial priorities), using a strong RNG makes such cases vanishingly unlikely in practice.
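
For intuition on where the 2 ln n figure comes from, here is a brief sketch of the standard ancestor-probability argument for random priorities. Label the keys \(x_1 < x_2 < \dots < x_n\). Node \(x_i\) is an ancestor of \(x_k\) exactly when \(x_i\) carries the largest priority among the keys between them (inclusive), which happens with probability \(\frac{1}{|i-k|+1}\). Summing these indicator probabilities gives

\[\mathbb{E}[\text{depth}(x_k)] = \sum_{i \ne k} \frac{1}{|i-k|+1} = H_k + H_{n+1-k} - 2 \;\le\; 2 \ln n,\]

which matches the Expected Depth formula listed above.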

Code Examples

Rotation-based Treap (Multiset) with Order Statistics
#include <bits/stdc++.h>
using namespace std;

// Rotation-based treap that supports duplicates (multiset),
// rank (count of keys < x), kth smallest, lower_bound, insert, erase.
// Expected O(log n) per operation.

struct Node {
    int key;        // BST key
    uint32_t pri;   // heap priority (max-heap)
    int cnt;        // multiplicity of this key
    int sz;         // size = cnt + size(left) + size(right)
    Node *l, *r;
    Node(int k, uint32_t p): key(k), pri(p), cnt(1), sz(1), l(nullptr), r(nullptr) {}
};

static mt19937 rng((uint32_t)chrono::steady_clock::now().time_since_epoch().count());

int getsz(Node* t){ return t ? t->sz : 0; }
void pull(Node* t){ if(t) t->sz = t->cnt + getsz(t->l) + getsz(t->r); }

void rotateRight(Node* &t){
    Node* x = t->l;    // x becomes new root of this subtree
    t->l = x->r;       // move x's right subtree to t's left
    x->r = t;          // t becomes right child of x
    pull(t); pull(x);  // update sizes bottom-up
    t = x;             // set new root
}

void rotateLeft(Node* &t){
    Node* x = t->r;    // x becomes new root of this subtree
    t->r = x->l;       // move x's left subtree to t's right
    x->l = t;          // t becomes left child of x
    pull(t); pull(x);  // update sizes bottom-up
    t = x;             // set new root
}

void insert(Node* &t, int key){
    if(!t){ t = new Node(key, rng()); return; }
    if(key == t->key){
        t->cnt++; pull(t); return;
    } else if(key < t->key){
        insert(t->l, key);
        if(t->l && t->l->pri > t->pri) rotateRight(t);
    } else {
        insert(t->r, key);
        if(t->r && t->r->pri > t->pri) rotateLeft(t);
    }
    pull(t);
}

void erase(Node* &t, int key){
    if(!t) return;
    if(key == t->key){
        if(t->cnt > 1){ t->cnt--; pull(t); return; }
        // remove node: rotate the higher-priority child up until the node has <= 1 child
        if(!t->l){ Node* tmp = t; t = t->r; delete tmp; return; }
        if(!t->r){ Node* tmp = t; t = t->l; delete tmp; return; }
        if(t->l->pri > t->r->pri){ rotateRight(t); erase(t->r, key); }
        else { rotateLeft(t); erase(t->l, key); }
    } else if(key < t->key){
        erase(t->l, key);
    } else {
        erase(t->r, key);
    }
    pull(t);
}

// count of keys strictly less than x
int rankOf(Node* t, int x){
    if(!t) return 0;
    if(x <= t->key) return rankOf(t->l, x);
    return getsz(t->l) + t->cnt + rankOf(t->r, x);
}

// kth smallest (1-indexed); assumes 1 <= k <= size
int kth(Node* t, int k){
    if(!t) throw runtime_error("k out of bounds");
    int left = getsz(t->l);
    if(k <= left) return kth(t->l, k);
    if(k <= left + t->cnt) return t->key;
    return kth(t->r, k - left - t->cnt);
}

// lower_bound: smallest key >= x; returns pair(found?, value)
pair<bool,int> lower_bound(Node* t, int x){
    int ans = 0; bool ok = false;
    while(t){
        if(t->key >= x){ ans = t->key; ok = true; t = t->l; }
        else t = t->r;
    }
    return {ok, ans};
}

// Simple demo
int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    Node* root = nullptr;
    vector<int> a = {5, 1, 7, 3, 5, 9, 2, 5};
    for(int x : a) insert(root, x);

    cout << "size = " << getsz(root) << "\n";          // 8
    cout << "rankOf(5) = " << rankOf(root, 5) << "\n"; // count of keys < 5
    cout << "kth(4) = " << kth(root, 4) << "\n";       // 4th smallest key

    auto lb = lower_bound(root, 6);
    cout << "lower_bound(6): ";
    if(lb.first) cout << lb.second << "\n"; else cout << "none\n";

    erase(root, 5); // removes one occurrence of 5
    cout << "size after erasing one 5 = " << getsz(root) << "\n";

    // Clean-up omitted for brevity (would require a post-order delete)
    return 0;
}

This rotation-based treap stores multiplicities (cnt) to act like a multiset. It maintains subtree sizes to support rank and kth queries. Rotations ensure the heap property by priority while preserving BST order. Expected time for each operation is logarithmic due to the randomized shape.

Time: Expected O(log n) per operation. Space: O(n)
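
The demo above leaves clean-up out; a minimal post-order deletion, assuming the Node layout from this example, could look like the following sketch.

// Post-order clean-up sketch for the rotation-based treap above.
void freeTree(Node* t){
    if(!t) return;
    freeTree(t->l);  // free left subtree
    freeTree(t->r);  // free right subtree
    delete t;        // then the node itself
}
// Usage: freeTree(root); root = nullptr;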
FHQ Treap with split/merge and range deletion [L, R]
#include <bits/stdc++.h>
using namespace std;

// FHQ treap: operations implemented via split/merge only (no explicit rotations).
// Demonstrates insert, erase by key, and delete-by-range [L, R] using two splits.

struct Node {
    int key;        // BST key (assumed unique in this demo)
    uint32_t pri;   // heap priority (max-heap)
    int sz;         // subtree size
    Node *l, *r;
    Node(int k, uint32_t p): key(k), pri(p), sz(1), l(nullptr), r(nullptr) {}
};

static mt19937 rng((uint32_t)chrono::steady_clock::now().time_since_epoch().count());

int getsz(Node* t){ return t ? t->sz : 0; }
void pull(Node* t){ if(t) t->sz = 1 + getsz(t->l) + getsz(t->r); }

// Split by key: L has keys < x, R has keys >= x
void split(Node* t, int x, Node* &L, Node* &R){
    if(!t){ L = R = nullptr; return; }
    if(t->key < x){
        split(t->r, x, t->r, R);
        L = t; pull(L);
    } else {
        split(t->l, x, L, t->l);
        R = t; pull(R);
    }
}

// Merge assumes all keys in L < all keys in R
Node* merge(Node* L, Node* R){
    if(!L || !R) return L ? L : R;
    if(L->pri > R->pri){
        L->r = merge(L->r, R);
        pull(L); return L;
    } else {
        R->l = merge(L, R->l);
        pull(R); return R;
    }
}

Node* insert(Node* root, int key){
    Node *A, *B; split(root, key, A, B);
    Node* m = new Node(key, rng());
    return merge(merge(A, m), B);
}

Node* eraseKey(Node* root, int key){
    Node *A, *B, *C;         // A: keys < key, B: keys == key, C: keys > key
    split(root, key, A, B);  // B temporarily holds [key, +inf)
    split(B, key + 1, B, C);
    // Keys are unique in this demo, so B is at most one node;
    // in general, free the whole detached B subtree.
    function<void(Node*)> del = [&](Node* t){ if(!t) return; del(t->l); del(t->r); delete t; };
    del(B);
    return merge(A, C);
}

// Delete all keys in [Lb, Rb]
Node* eraseRange(Node* root, int Lb, int Rb){
    Node *A, *B;             // A: keys < Lb, B: keys >= Lb
    split(root, Lb, A, B);
    Node *M, *C;             // M: keys in [Lb, Rb], C: keys > Rb
    split(B, Rb + 1, M, C);
    function<void(Node*)> del = [&](Node* t){ if(!t) return; del(t->l); del(t->r); delete t; };
    del(M);
    return merge(A, C);
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    Node* root = nullptr;
    vector<int> keys = {10, 5, 20, 7, 15, 30, 3, 8};
    for(int k : keys) root = insert(root, k);

    cout << "size = " << getsz(root) << "\n";                    // 8
    root = eraseKey(root, 7);
    cout << "size after erase 7 = " << getsz(root) << "\n";      // 7

    root = eraseRange(root, 8, 18); // remove keys in [8,18]: 8, 10, 15
    cout << "size after erase [8,18] = " << getsz(root) << "\n"; // 4

    // Clean-up omitted for brevity
    return 0;
}

This FHQ treap uses split and merge exclusively. Insertion splits by key and merges the new node in between. Erasing a single key isolates it via two splits, deletes it, and merges the remaining parts. Range deletion [L, R] takes two splits to isolate the middle and deletes it. All operations take expected logarithmic time.

Time: Expected O(log n) per insert/erase/split/merge; O(k) to free k deleted nodes. Space: O(n)
Implicit Treap (sequence) with range reverse and range sum
#include <bits/stdc++.h>
using namespace std;

// Implicit treap: indices are implicit via in-order positions.
// Supports: insert at position, erase range, reverse range, and range sum.

struct Node {
    long long val;   // value stored at this position
    uint32_t pri;    // random priority (max-heap)
    int sz;          // subtree size (number of elements)
    long long sum;   // subtree sum aggregate
    bool rev;        // lazy reverse flag
    Node *l, *r;
    Node(long long v, uint32_t p): val(v), pri(p), sz(1), sum(v), rev(false), l(nullptr), r(nullptr) {}
};

static mt19937 rng((uint32_t)chrono::steady_clock::now().time_since_epoch().count());

int getsz(Node* t){ return t ? t->sz : 0; }
long long getsum(Node* t){ return t ? t->sum : 0LL; }

void push(Node* t){
    if(t && t->rev){
        t->rev = false;
        swap(t->l, t->r);
        if(t->l) t->l->rev ^= 1;
        if(t->r) t->r->rev ^= 1;
    }
}

void pull(Node* t){
    if(!t) return;
    t->sz = 1 + getsz(t->l) + getsz(t->r);
    t->sum = t->val + getsum(t->l) + getsum(t->r);
}

// Split by index k: first k elements go to A, rest to B (0-based k)
void split(Node* t, int k, Node* &A, Node* &B){
    if(!t){ A = B = nullptr; return; }
    push(t);
    int left = getsz(t->l);
    if(k <= left){
        split(t->l, k, A, t->l);
        B = t; pull(B);
    } else {
        split(t->r, k - left - 1, t->r, B);
        A = t; pull(A);
    }
}

Node* merge(Node* A, Node* B){
    if(!A || !B) return A ? A : B;
    if(A->pri > B->pri){
        push(A);
        A->r = merge(A->r, B);
        pull(A); return A;
    } else {
        push(B);
        B->l = merge(A, B->l);
        pull(B); return B;
    }
}

// Insert value v at position pos (0-based)
Node* insertAt(Node* root, int pos, long long v){
    Node *A, *B; split(root, pos, A, B);
    Node* m = new Node(v, rng());
    return merge(merge(A, m), B);
}

// Erase range [l, r] inclusive (0-based)
Node* eraseRange(Node* root, int l, int r){
    Node *A, *B;   // A: positions [0, l-1], B: positions [l, n-1]
    split(root, l, A, B);
    Node *M, *C;   // M: positions [l, r], C: positions [r+1, n-1]
    split(B, r - l + 1, M, C);
    function<void(Node*)> del = [&](Node* t){ if(!t) return; del(t->l); del(t->r); delete t; };
    del(M);
    return merge(A, C);
}

// Reverse range [l, r]
Node* reverseRange(Node* root, int l, int r){
    Node *A, *B; split(root, l, A, B);
    Node *M, *C; split(B, r - l + 1, M, C);
    if(M) M->rev ^= 1; // toggle reverse flag
    return merge(A, merge(M, C));
}

// Range sum query on [l, r]
long long rangeSum(Node* &root, int l, int r){
    Node *A, *B; split(root, l, A, B);
    Node *M, *C; split(B, r - l + 1, M, C);
    long long ans = getsum(M);
    root = merge(A, merge(M, C)); // restore the sequence unchanged
    return ans;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    Node* root = nullptr;
    // Build array [1, 2, 3, 4, 5]
    for(int i = 0; i < 5; ++i) root = insertAt(root, i, i + 1);

    cout << "sum[0,4] = " << rangeSum(root, 0, 4) << "\n"; // 15
    root = reverseRange(root, 1, 3);                       // array becomes [1,4,3,2,5]
    cout << "sum[1,3] = " << rangeSum(root, 1, 3) << "\n"; // 4+3+2 = 9

    root = eraseRange(root, 2, 3); // erase positions 2..3 -> [1,4,5]
    cout << "sum[0,2] = " << rangeSum(root, 0, 2) << "\n"; // 10

    // Clean-up omitted for brevity
    return 0;
}

This implicit treap treats in-order positions as indices. Splitting by index partitions the sequence. A reverse is implemented by toggling a lazy rev flag and pushing it when needed. Range sums are aggregates maintained via pull(). Range updates/queries decompose into O(1) splits/merges, giving expected logarithmic performance.

Time: Expected O(log n) per insert/erase/reverse/sum. Space: O(n)