Chtholly Tree (ODT - Old Driver Tree)
Key Points
- •Chtholly Tree (ODT) stores an array as a set of non-overlapping value-constant intervals and updates by cutting and replacing whole ranges.
- •Its key operation split(x) cuts the interval containing x into [l, x-1] and [x, r], enabling precise range editing.
- •Range assign(l, r, v) is done by splitting at l and r+1, erasing all fully covered intervals, and inserting one interval [l, r] with value v.
- •With random data, the number of intervals stays small on average, making total time roughly O((n + q) log(n + q)).
- •In worst-case adversarial inputs, ODT can degrade to O(n) per operation, so it is a heuristic, not a guaranteed log-time DS.
- •ODT shines for problems like Codeforces 896C that need range assign/add, k-th statistic, and sum of powers on random tests.
- •Implementation uses set ordered by interval start; values are marked mutable to allow in-place changes without breaking ordering.
- •Always be consistent about inclusive [l, r] or half-open [l, r) intervals to avoid off-by-one bugs and iterator invalidation.
Prerequisites
- →Balanced binary search trees (std::set) — ODT relies on ordered sets for interval storage and logarithmic-time insert/erase/search.
- →Interval arithmetic and boundary conventions — Correct splitting and assignment require careful handling of inclusive vs. half-open intervals.
- →Amortized and expected complexity — Understanding why ODT is fast on random data but not guaranteed in worst-case needs these analyses.
- →Sorting and order statistics — k-th queries sort intervals by value and consume cumulative lengths.
- →Modular arithmetic and fast exponentiation — Sum of powers queries require computing v^x mod p efficiently.
- →Run-length encoding (RLE) — ODT generalizes dynamic RLE; understanding runs helps grasp interval merging and updates.
Detailed Explanation
Tap terms for definitions01Overview
The Chtholly Tree, better known as ODT (Old Driver Tree), is a practical data structure for range updates on arrays when the data behaves randomly. Instead of storing each element, ODT compresses the array into a set of disjoint intervals [l, r] where every position i in that range shares the same value. These intervals are kept in an ordered set keyed by their left endpoints. The core primitive is split(x), which finds the interval containing position x and splits it into two intervals at x, enabling precise range operations. A range assignment assign(l, r, v) is then implemented by splitting at l and r+1, erasing all the fully covered intervals, and inserting a single interval [l, r] with value v. Many additional queries—like range add, k-th smallest by value, and sum of powers modulo—can be implemented by iterating over the O(k) intervals overlapping [l, r].
Why is ODT used? On random inputs, the number of intervals remains moderate and assignments often coalesce large blocks, keeping the total number of nodes small. Therefore, the expected running time over many operations is competitive, even though worst-case bounds are weak. ODT is particularly popular in competitive programming (e.g., Codeforces 896C) where test data is randomized, allowing its expected performance to shine. Despite the name, ODT is not a tree data structure internally; it relies on C++ std::set (a balanced BST) and careful interval management.
02Intuition & Analogies
Imagine a long fence painted in different colors. Instead of remembering the color of every board, you write down stretches like “boards 1–10 are blue,” “11–20 are red,” and so on. If someone asks you to repaint boards 5–15 green, you cut the note at the endpoints 5 and 16: you now have [1–4 blue], [5–15 green], [16–20 red]. You replace the middle stretch entirely instead of repainting board by board. This is exactly how ODT works—store long same-value stretches and update by cutting and replacing.
The split(x) operation is the “scissors.” It finds which stretch (interval) contains x and cuts it so that x becomes the start of a new interval. After splitting at l and r+1, the repainting (assignment) is just removing all pieces in between and dropping in one new piece with the new color.
On random repaint requests, you usually end up with relatively few pieces because new stretches often absorb or replace old ones, and adjacent stretches with the same color can be merged. Think of it like painting big patches rather than specks—your notebook describing the fence stays short. However, if someone adversarially repaints tiny alternating segments, your notebook explodes in length—ODT will slow down. That’s why ODT is excellent under randomness, but risky against worst-case designed inputs.
03Formal Definition
04When to Use
Use ODT when your problem includes many range assignments (or adds) and the data is random or effectively random due to problem generation. Typical tasks include: replacing all values in [l, r] by v; increasing all values in [l, r] by delta; computing statistics over [l, r] like the k-th smallest by value; or computing functions like sum of v^x over [l, r] modulo p. Codeforces 896C is the canonical showcase: a mix of range add/assign, k-th queries, and sum-of-powers under randomized tests.
ODT is great when: (1) the number of distinct value blocks remains small; (2) you need flexible per-value processing (e.g., sorting by value of intervals); (3) you can tolerate heuristic performance without hard worst-case guarantees.
Avoid ODT when: (1) you need strict worst-case O(log n) per operation; (2) assignments are rare but queries are numerous and adversarial; (3) your updates are not well modeled by rewriting values (e.g., range min set with order-dependent semantics). In those cases, consider segment trees or Fenwick trees, possibly with lazy propagation, which offer guaranteed logarithmic performance for supported operations.
⚠️Common Mistakes
- Interval boundary confusion: Mixing [l, r] and [l, r) leads to off-by-one errors. Decide on inclusive or half-open intervals and stick to it everywhere, especially in split and assign.
- Forgetting to split at r+1: For inclusive [l, r], you must split at r+1 to isolate the right boundary. Splitting at r instead will produce incorrect intervals and overlaps.
- Modifying keys inside std::set: Only mark the value field as mutable and never change l or r after insertion. Changing ordered keys breaks the set’s invariants and yields undefined behavior.
- Iterator invalidation: Erasing while iterating must be done carefully (use post-increment idioms or collect iterators first). After split, the original iterator is invalid; use the returned iterators.
- Not merging neighbors: After assign, failing to merge with neighbors of the same value can cause an unnecessary explosion in the interval count, hurting performance.
- Using ODT on adversarial data: Expecting guaranteed fast performance is a mistake. Worst-case sequences can force O(n) per query. Know the testing environment.
- Overflow and types: Range adds and sums can exceed 32-bit. Use long long (int64) and handle modular arithmetic correctly.
- Over-splitting on read-only queries: Each query introduces at most two new boundaries; over many queries this grows S. Keep queries minimal, and consider merging when possible to limit S.
Key Formulas
Split Cost
Explanation: Finding and cutting the interval containing x requires a tree search and up to two inserts, each logarithmic in the number of intervals S.
Assign Cost
Explanation: After two splits, we erase k intervals fully inside [l, r] and insert one; each erase/insert costs O(log S). k is the number of covered intervals.
Interval Growth Bound
Explanation: Each operation introduces at most two new boundaries from split(l) and split(r+1). Thus the total number of intervals S grows linearly with m in the worst case of read-only queries.
K-th Query Cost
Explanation: After isolating [l, r] with splits, we collect k intervals and sort them by value to find the k-th with multiplicities by lengths.
Sum of Powers Cost
Explanation: We iterate over k intervals and compute mod p with fast exponentiation per interval, which takes O(log x) time.
Expected Total Time
Explanation: Under random inputs, the number of intervals remains moderate (roughly linear in operations), and each operation costs logarithmic time in the number of intervals.
Length Conservation
Explanation: The lengths of disjoint intervals covering [l, r] sum to the total range length. This is used in statistics and sums.
K-th Definition
Explanation: We view the multiset of values in [l, r] with frequencies equal to interval lengths and find the value whose cumulative frequency position equals k.
Sum of Powers
Explanation: Compute the sum over k intervals that cover [l, r], weighting each interval’s contribution by its length and taking modulo p.
Triangular Number (for testing)
Explanation: Sometimes used to sanity-check range sums after assignments or adds.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct ODT { 5 struct Node { 6 int l, r; // inclusive interval [l, r] 7 mutable long long v; // value; mutable so we can modify in set 8 bool operator<(Node const& other) const { return l < other.l; } 9 }; 10 11 int n; // domain is [1..n] 12 set<Node> tr; // ordered by l 13 14 ODT(int n = 0) : n(n) {} 15 16 // Build from 1-indexed array a[1..n] 17 void build(const vector<long long>& a) { 18 n = (int)a.size() - 1; // expect a.size() == n+1 with dummy a[0] 19 tr.clear(); 20 int i = 1; 21 while (i <= n) { 22 int j = i; 23 while (j + 1 <= n && a[j + 1] == a[i]) j++; 24 tr.insert({i, j, a[i]}); 25 i = j + 1; 26 } 27 } 28 29 // Find and split at position x; return iterator to interval starting at x 30 // If x > n, returns tr.end(). If x equals an interval start, returns it directly. 31 auto split(int x) { 32 if (x > n) return tr.end(); 33 auto it = prev(tr.upper_bound({x, 0, 0})); 34 if (it->l == x) return it; 35 int L = it->l, R = it->r; long long V = it->v; 36 tr.erase(it); 37 tr.insert({L, x - 1, V}); 38 return tr.insert({x, R, V}).first; 39 } 40 41 // Assign A[l..r] = v 42 void assign(int l, int r, long long v) { 43 auto itr = split(r + 1), itl = split(l); 44 // erase all intervals in [l, r] 45 for (auto it = itl; it != itr; ) it = tr.erase(it); 46 // insert new interval 47 auto it = tr.insert({l, r, v}).first; 48 // optional: merge with left neighbor 49 if (it != tr.begin()) { 50 auto L = prev(it); 51 if (L->v == it->v) { 52 int nl = L->l; tr.erase(L); it = tr.erase(it); 53 it = tr.insert({nl, r, v}).first; 54 } 55 } 56 // optional: merge with right neighbor 57 auto R = next(it); 58 if (R != tr.end() && R->v == it->v) { 59 int nr = R->r; tr.erase(R); int nl = it->l; tr.erase(it); 60 tr.insert({nl, nr, v}); 61 } 62 } 63 64 // Range sum on [l, r] 65 long long range_sum(int l, int r) { 66 long long ans = 0; 67 auto itr = split(r + 1), itl = split(l); 68 for (auto it = itl; it != itr; ++it) { 69 long long len = it->r - it->l + 1; 70 ans += len * it->v; 71 } 72 return ans; 73 } 74 }; 75 76 int main() { 77 // Example usage 78 // Array: index 1..7: [1,1,2,2,2,3,3] 79 vector<long long> a = {0, 1,1,2,2,2,3,3}; // 1-indexed with dummy 0 80 ODT odt; odt.build(a); 81 82 cout << "Initial sum[1,7] = " << odt.range_sum(1,7) << "\n"; // 1+1+2+2+2+3+3=14 83 84 odt.assign(2, 5, 10); // set indices 2..5 to 10 85 cout << "After assign(2,5,10), sum[1,7] = " << odt.range_sum(1,7) << "\n"; // 1+10+10+10+10+3+3=47 86 87 odt.assign(6, 7, 10); // merge right 88 cout << "After assign(6,7,10), sum[1,7] = " << odt.range_sum(1,7) << "\n"; // now all 1..7 except 1 may be 10 89 90 return 0; 91 } 92
This minimal ODT stores the array as disjoint intervals in a std::set. The split operation isolates a position to make exact range operations possible. assign(l, r, v) replaces all intervals fully within [l, r] by one interval and optionally merges with neighbors that have the same value to control the number of intervals. range_sum splits to isolate [l, r], then accumulates length × value over the overlapping intervals.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct ODT { 5 struct Node { 6 int l, r; mutable long long v; 7 bool operator<(Node const& o) const { return l < o.l; } 8 }; 9 int n; set<Node> tr; 10 ODT(int n=0): n(n) {} 11 12 void build(const vector<long long>& a) { // a is 1-indexed 13 n = (int)a.size() - 1; tr.clear(); 14 for (int i=1, j; i<=n; i=j+1) { 15 j = i; while (j+1<=n && a[j+1]==a[i]) ++j; 16 tr.insert({i,j,a[i]}); 17 } 18 } 19 20 auto split(int x) { 21 if (x > n) return tr.end(); 22 auto it = prev(tr.upper_bound({x,0,0})); 23 if (it->l == x) return it; 24 int L=it->l, R=it->r; long long V=it->v; tr.erase(it); 25 tr.insert({L, x-1, V}); 26 return tr.insert({x, R, V}).first; 27 } 28 29 // add delta to A[l..r] 30 void add(int l, int r, long long delta) { 31 auto itr = split(r+1), itl = split(l); 32 for (auto it = itl; it != itr; ++it) it->v += delta; // v is mutable 33 } 34 35 // assign A[l..r] = v (with neighbor merges) 36 void assign(int l, int r, long long v) { 37 auto itr = split(r+1), itl = split(l); 38 for (auto it = itl; it != itr; ) it = tr.erase(it); 39 auto it = tr.insert({l,r,v}).first; 40 if (it != tr.begin()) { 41 auto L = prev(it); if (L->v == it->v) { int nl=L->l; tr.erase(L); int nr=it->r; tr.erase(it); it=tr.insert({nl,nr,v}).first; } 42 } 43 auto R = next(it); 44 if (R != tr.end() && R->v == it->v) { int nr=R->r; int nl=it->l; tr.erase(R); tr.erase(it); tr.insert({nl,nr,v}); } 45 } 46 47 // kth smallest value in [l..r] 48 long long kth(int l, int r, long long k) { 49 vector<pair<long long,long long>> bag; // (value, length) 50 auto itr = split(r+1), itl = split(l); 51 for (auto it = itl; it != itr; ++it) { 52 long long len = it->r - it->l + 1; 53 bag.push_back({it->v, len}); 54 } 55 sort(bag.begin(), bag.end(), [](auto &a, auto &b){ return a.first < b.first; }); 56 for (auto &p : bag) { 57 if (k <= p.second) return p.first; 58 k -= p.second; 59 } 60 return -1; // k out of range 61 } 62 63 // fast exponentiation: a^e mod m 64 static long long modpow(long long a, long long e, long long m) { 65 a %= m; long long r = 1 % m; 66 while (e > 0) { 67 if (e & 1) r = (__int128)r * a % m; 68 a = (__int128)a * a % m; 69 e >>= 1; 70 } 71 return r; 72 } 73 74 // sum over [l..r] of (value^x mod p) weighted by length, mod p 75 long long sumPow(int l, int r, long long x, long long p) { 76 auto itr = split(r+1), itl = split(l); 77 long long ans = 0; 78 for (auto it = itl; it != itr; ++it) { 79 long long len = it->r - it->l + 1; 80 long long term = modpow(it->v, x, p); 81 ans = (ans + (__int128)len * term) % p; 82 } 83 return ans; 84 } 85 }; 86 87 int main() { 88 // Toy demo for operations 89 vector<long long> a = {0, 5,5,1,1,1,4,4,4,4}; // n=9, 1-indexed 90 ODT odt; odt.build(a); 91 92 cout << odt.kth(1,9,4) << "\n"; // 4th smallest initially 93 odt.add(3,7,3); // add 3 to indices 3..7 94 cout << odt.sumPow(1,9,2,1000000007LL) << "\n"; // sum of squares mod 1e9+7 95 odt.assign(2,5,10); // set indices 2..5 to 10 96 cout << odt.kth(1,9,6) << "\n"; // 6th smallest after ops 97 98 return 0; 99 } 100
This ODT supports the classic CF-896C mix: range add, range assign, k-th by value, and sum of powers modulo. add modifies interval values in-place (v is mutable). kth collects the overlapping intervals and sorts by value, using interval lengths as multiplicities. sumPow uses fast modular exponentiation to compute each interval’s contribution. All operations begin by splitting at the query boundaries to isolate [l, r].