🗂️Data StructureAdvanced

Chtholly Tree (ODT - Old Driver Tree)

Key Points

  • Chtholly Tree (ODT) stores an array as a set of non-overlapping value-constant intervals and updates by cutting and replacing whole ranges.
  • Its key operation split(x) cuts the interval containing x into [l, x-1] and [x, r], enabling precise range editing.
  • Range assign(l, r, v) is done by splitting at l and r+1, erasing all fully covered intervals, and inserting one interval [l, r] with value v.
  • With random data, the number of intervals stays small on average, making total time roughly O((n + q) log(n + q)).
  • In worst-case adversarial inputs, ODT can degrade to O(n) per operation, so it is a heuristic, not a guaranteed log-time DS.
  • ODT shines for problems like Codeforces 896C that need range assign/add, k-th statistic, and sum of powers on random tests.
  • Implementation uses set ordered by interval start; values are marked mutable to allow in-place changes without breaking ordering.
  • Always be consistent about inclusive [l, r] or half-open [l, r) intervals to avoid off-by-one bugs and iterator invalidation.

Prerequisites

  • Balanced binary search trees (std::set)ODT relies on ordered sets for interval storage and logarithmic-time insert/erase/search.
  • Interval arithmetic and boundary conventionsCorrect splitting and assignment require careful handling of inclusive vs. half-open intervals.
  • Amortized and expected complexityUnderstanding why ODT is fast on random data but not guaranteed in worst-case needs these analyses.
  • Sorting and order statisticsk-th queries sort intervals by value and consume cumulative lengths.
  • Modular arithmetic and fast exponentiationSum of powers queries require computing v^x mod p efficiently.
  • Run-length encoding (RLE)ODT generalizes dynamic RLE; understanding runs helps grasp interval merging and updates.

Detailed Explanation

Tap terms for definitions

01Overview

The Chtholly Tree, better known as ODT (Old Driver Tree), is a practical data structure for range updates on arrays when the data behaves randomly. Instead of storing each element, ODT compresses the array into a set of disjoint intervals [l, r] where every position i in that range shares the same value. These intervals are kept in an ordered set keyed by their left endpoints. The core primitive is split(x), which finds the interval containing position x and splits it into two intervals at x, enabling precise range operations. A range assignment assign(l, r, v) is then implemented by splitting at l and r+1, erasing all the fully covered intervals, and inserting a single interval [l, r] with value v. Many additional queries—like range add, k-th smallest by value, and sum of powers modulo—can be implemented by iterating over the O(k) intervals overlapping [l, r].

Why is ODT used? On random inputs, the number of intervals remains moderate and assignments often coalesce large blocks, keeping the total number of nodes small. Therefore, the expected running time over many operations is competitive, even though worst-case bounds are weak. ODT is particularly popular in competitive programming (e.g., Codeforces 896C) where test data is randomized, allowing its expected performance to shine. Despite the name, ODT is not a tree data structure internally; it relies on C++ std::set (a balanced BST) and careful interval management.

02Intuition & Analogies

Imagine a long fence painted in different colors. Instead of remembering the color of every board, you write down stretches like “boards 1–10 are blue,” “11–20 are red,” and so on. If someone asks you to repaint boards 5–15 green, you cut the note at the endpoints 5 and 16: you now have [1–4 blue], [5–15 green], [16–20 red]. You replace the middle stretch entirely instead of repainting board by board. This is exactly how ODT works—store long same-value stretches and update by cutting and replacing.

The split(x) operation is the “scissors.” It finds which stretch (interval) contains x and cuts it so that x becomes the start of a new interval. After splitting at l and r+1, the repainting (assignment) is just removing all pieces in between and dropping in one new piece with the new color.

On random repaint requests, you usually end up with relatively few pieces because new stretches often absorb or replace old ones, and adjacent stretches with the same color can be merged. Think of it like painting big patches rather than specks—your notebook describing the fence stays short. However, if someone adversarially repaints tiny alternating segments, your notebook explodes in length—ODT will slow down. That’s why ODT is excellent under randomness, but risky against worst-case designed inputs.

03Formal Definition

We represent an array A[1..n] as a set S of disjoint intervals I = [l, r] with associated value v(I), such that for all i in [l, r], A[i] = v(I). Intervals in S are non-overlapping, cover [1, n], and are stored ordered by l. Define split(x): if x ∈ [l, r] for some I ∈ S and , replace I by two intervals I₁ = [l, x−1] with value v(I) and I₂ = [x, r] with value v(I). If , return the iterator to I; if or , return end(). Range assignment assign(l, r, v): perform ), ). Erase all intervals in the half-open iterator range [L, R) from S, and insert a new interval J = [l, r] with value v. Optionally, merge J with its immediate neighbors if they have the same value to reduce the number of intervals. Queries over a range [l, r] likewise begin with ), ), then iterate over intervals in [L, R). Complexity depends on k, the number of intervals fully contained in [l, r]. Operations are implemented atop a balanced BST (e.g., std::set), yielding O(log S) for searching/splitting, and O(k log S) for erasing/inserting across k intervals, where S is the current number of intervals.

04When to Use

Use ODT when your problem includes many range assignments (or adds) and the data is random or effectively random due to problem generation. Typical tasks include: replacing all values in [l, r] by v; increasing all values in [l, r] by delta; computing statistics over [l, r] like the k-th smallest by value; or computing functions like sum of v^x over [l, r] modulo p. Codeforces 896C is the canonical showcase: a mix of range add/assign, k-th queries, and sum-of-powers under randomized tests.

ODT is great when: (1) the number of distinct value blocks remains small; (2) you need flexible per-value processing (e.g., sorting by value of intervals); (3) you can tolerate heuristic performance without hard worst-case guarantees.

Avoid ODT when: (1) you need strict worst-case O(log n) per operation; (2) assignments are rare but queries are numerous and adversarial; (3) your updates are not well modeled by rewriting values (e.g., range min set with order-dependent semantics). In those cases, consider segment trees or Fenwick trees, possibly with lazy propagation, which offer guaranteed logarithmic performance for supported operations.

⚠️Common Mistakes

  • Interval boundary confusion: Mixing [l, r] and [l, r) leads to off-by-one errors. Decide on inclusive or half-open intervals and stick to it everywhere, especially in split and assign.
  • Forgetting to split at r+1: For inclusive [l, r], you must split at r+1 to isolate the right boundary. Splitting at r instead will produce incorrect intervals and overlaps.
  • Modifying keys inside std::set: Only mark the value field as mutable and never change l or r after insertion. Changing ordered keys breaks the set’s invariants and yields undefined behavior.
  • Iterator invalidation: Erasing while iterating must be done carefully (use post-increment idioms or collect iterators first). After split, the original iterator is invalid; use the returned iterators.
  • Not merging neighbors: After assign, failing to merge with neighbors of the same value can cause an unnecessary explosion in the interval count, hurting performance.
  • Using ODT on adversarial data: Expecting guaranteed fast performance is a mistake. Worst-case sequences can force O(n) per query. Know the testing environment.
  • Overflow and types: Range adds and sums can exceed 32-bit. Use long long (int64) and handle modular arithmetic correctly.
  • Over-splitting on read-only queries: Each query introduces at most two new boundaries; over many queries this grows S. Keep queries minimal, and consider merging when possible to limit S.

Key Formulas

Split Cost

Explanation: Finding and cutting the interval containing x requires a tree search and up to two inserts, each logarithmic in the number of intervals S.

Assign Cost

Explanation: After two splits, we erase k intervals fully inside [l, r] and insert one; each erase/insert costs O(log S). k is the number of covered intervals.

Interval Growth Bound

Explanation: Each operation introduces at most two new boundaries from split(l) and split(r+1). Thus the total number of intervals S grows linearly with m in the worst case of read-only queries.

K-th Query Cost

Explanation: After isolating [l, r] with splits, we collect k intervals and sort them by value to find the k-th with multiplicities by lengths.

Sum of Powers Cost

Explanation: We iterate over k intervals and compute mod p with fast exponentiation per interval, which takes O(log x) time.

Expected Total Time

Explanation: Under random inputs, the number of intervals remains moderate (roughly linear in operations), and each operation costs logarithmic time in the number of intervals.

Length Conservation

Explanation: The lengths of disjoint intervals covering [l, r] sum to the total range length. This is used in statistics and sums.

K-th Definition

Explanation: We view the multiset of values in [l, r] with frequencies equal to interval lengths and find the value whose cumulative frequency position equals k.

Sum of Powers

Explanation: Compute the sum over k intervals that cover [l, r], weighting each interval’s contribution by its length and taking modulo p.

Triangular Number (for testing)

Explanation: Sometimes used to sanity-check range sums after assignments or adds.

Complexity Analysis

Let S denote the current number of intervals. The set operations (search, insert, erase) cost O(log S) due to the balanced BST backing std::set. The split(x) operation performs a search to find the interval containing x, and, if x is strictly inside, erases one interval and inserts two. This is O(log S) for the search and O(log S) for each erase/insert, leading to O(log S) overall. A range assignment assign(l, r, v) performs two splits at l and r+1, then erases k intervals fully covered by [l, r], and finally inserts one interval. Hence the time is O((k+1) log S). The parameter k is data-dependent; in dense fragmentation k can be large, but in random settings it tends to be small because intervals are long. Queries like range add also cost O(k log S) if they alter k intervals’ values in-place (no erase/insert needed if the number of intervals does not change), while read-only queries (k-th, sum) cost O(k log S) if you need to sort by value (k-th) or O(k log S) dominated by splits and iteration (sum). For k-th specifically, collecting k intervals and sorting yields O(k log k) additional time. Space usage is O(S), storing one node per interval. Over m operations, each operation can introduce at most two new splits, so if you never erase intervals. In practice, assignments erase many intervals, often keeping S much smaller. Expected total time under randomized inputs is O((n + m) log(n + m)), but worst-case adversarial sequences can force Θ(n) intervals in a small region and give Θ(n) per-op costs, making ODT a heuristic rather than a guaranteed-logarithmic structure. Merging adjacent equal-value neighbors after assignments helps keep S small and improves constants.

Code Examples

Minimal Chtholly Tree (ODT) with range assign and range sum
1#include <bits/stdc++.h>
2using namespace std;
3
4struct ODT {
5 struct Node {
6 int l, r; // inclusive interval [l, r]
7 mutable long long v; // value; mutable so we can modify in set
8 bool operator<(Node const& other) const { return l < other.l; }
9 };
10
11 int n; // domain is [1..n]
12 set<Node> tr; // ordered by l
13
14 ODT(int n = 0) : n(n) {}
15
16 // Build from 1-indexed array a[1..n]
17 void build(const vector<long long>& a) {
18 n = (int)a.size() - 1; // expect a.size() == n+1 with dummy a[0]
19 tr.clear();
20 int i = 1;
21 while (i <= n) {
22 int j = i;
23 while (j + 1 <= n && a[j + 1] == a[i]) j++;
24 tr.insert({i, j, a[i]});
25 i = j + 1;
26 }
27 }
28
29 // Find and split at position x; return iterator to interval starting at x
30 // If x > n, returns tr.end(). If x equals an interval start, returns it directly.
31 auto split(int x) {
32 if (x > n) return tr.end();
33 auto it = prev(tr.upper_bound({x, 0, 0}));
34 if (it->l == x) return it;
35 int L = it->l, R = it->r; long long V = it->v;
36 tr.erase(it);
37 tr.insert({L, x - 1, V});
38 return tr.insert({x, R, V}).first;
39 }
40
41 // Assign A[l..r] = v
42 void assign(int l, int r, long long v) {
43 auto itr = split(r + 1), itl = split(l);
44 // erase all intervals in [l, r]
45 for (auto it = itl; it != itr; ) it = tr.erase(it);
46 // insert new interval
47 auto it = tr.insert({l, r, v}).first;
48 // optional: merge with left neighbor
49 if (it != tr.begin()) {
50 auto L = prev(it);
51 if (L->v == it->v) {
52 int nl = L->l; tr.erase(L); it = tr.erase(it);
53 it = tr.insert({nl, r, v}).first;
54 }
55 }
56 // optional: merge with right neighbor
57 auto R = next(it);
58 if (R != tr.end() && R->v == it->v) {
59 int nr = R->r; tr.erase(R); int nl = it->l; tr.erase(it);
60 tr.insert({nl, nr, v});
61 }
62 }
63
64 // Range sum on [l, r]
65 long long range_sum(int l, int r) {
66 long long ans = 0;
67 auto itr = split(r + 1), itl = split(l);
68 for (auto it = itl; it != itr; ++it) {
69 long long len = it->r - it->l + 1;
70 ans += len * it->v;
71 }
72 return ans;
73 }
74};
75
76int main() {
77 // Example usage
78 // Array: index 1..7: [1,1,2,2,2,3,3]
79 vector<long long> a = {0, 1,1,2,2,2,3,3}; // 1-indexed with dummy 0
80 ODT odt; odt.build(a);
81
82 cout << "Initial sum[1,7] = " << odt.range_sum(1,7) << "\n"; // 1+1+2+2+2+3+3=14
83
84 odt.assign(2, 5, 10); // set indices 2..5 to 10
85 cout << "After assign(2,5,10), sum[1,7] = " << odt.range_sum(1,7) << "\n"; // 1+10+10+10+10+3+3=47
86
87 odt.assign(6, 7, 10); // merge right
88 cout << "After assign(6,7,10), sum[1,7] = " << odt.range_sum(1,7) << "\n"; // now all 1..7 except 1 may be 10
89
90 return 0;
91}
92

This minimal ODT stores the array as disjoint intervals in a std::set. The split operation isolates a position to make exact range operations possible. assign(l, r, v) replaces all intervals fully within [l, r] by one interval and optionally merges with neighbors that have the same value to control the number of intervals. range_sum splits to isolate [l, r], then accumulates length × value over the overlapping intervals.

Time: assign: O((k+1) log S); range_sum: O(k log S), where k is the number of intervals overlapping [l, r] and S is the current number of intervals.Space: O(S), where S is the number of intervals.
ODT for CF-896C-style operations: add, assign, k-th smallest, sum of powers mod
1#include <bits/stdc++.h>
2using namespace std;
3
4struct ODT {
5 struct Node {
6 int l, r; mutable long long v;
7 bool operator<(Node const& o) const { return l < o.l; }
8 };
9 int n; set<Node> tr;
10 ODT(int n=0): n(n) {}
11
12 void build(const vector<long long>& a) { // a is 1-indexed
13 n = (int)a.size() - 1; tr.clear();
14 for (int i=1, j; i<=n; i=j+1) {
15 j = i; while (j+1<=n && a[j+1]==a[i]) ++j;
16 tr.insert({i,j,a[i]});
17 }
18 }
19
20 auto split(int x) {
21 if (x > n) return tr.end();
22 auto it = prev(tr.upper_bound({x,0,0}));
23 if (it->l == x) return it;
24 int L=it->l, R=it->r; long long V=it->v; tr.erase(it);
25 tr.insert({L, x-1, V});
26 return tr.insert({x, R, V}).first;
27 }
28
29 // add delta to A[l..r]
30 void add(int l, int r, long long delta) {
31 auto itr = split(r+1), itl = split(l);
32 for (auto it = itl; it != itr; ++it) it->v += delta; // v is mutable
33 }
34
35 // assign A[l..r] = v (with neighbor merges)
36 void assign(int l, int r, long long v) {
37 auto itr = split(r+1), itl = split(l);
38 for (auto it = itl; it != itr; ) it = tr.erase(it);
39 auto it = tr.insert({l,r,v}).first;
40 if (it != tr.begin()) {
41 auto L = prev(it); if (L->v == it->v) { int nl=L->l; tr.erase(L); int nr=it->r; tr.erase(it); it=tr.insert({nl,nr,v}).first; }
42 }
43 auto R = next(it);
44 if (R != tr.end() && R->v == it->v) { int nr=R->r; int nl=it->l; tr.erase(R); tr.erase(it); tr.insert({nl,nr,v}); }
45 }
46
47 // kth smallest value in [l..r]
48 long long kth(int l, int r, long long k) {
49 vector<pair<long long,long long>> bag; // (value, length)
50 auto itr = split(r+1), itl = split(l);
51 for (auto it = itl; it != itr; ++it) {
52 long long len = it->r - it->l + 1;
53 bag.push_back({it->v, len});
54 }
55 sort(bag.begin(), bag.end(), [](auto &a, auto &b){ return a.first < b.first; });
56 for (auto &p : bag) {
57 if (k <= p.second) return p.first;
58 k -= p.second;
59 }
60 return -1; // k out of range
61 }
62
63 // fast exponentiation: a^e mod m
64 static long long modpow(long long a, long long e, long long m) {
65 a %= m; long long r = 1 % m;
66 while (e > 0) {
67 if (e & 1) r = (__int128)r * a % m;
68 a = (__int128)a * a % m;
69 e >>= 1;
70 }
71 return r;
72 }
73
74 // sum over [l..r] of (value^x mod p) weighted by length, mod p
75 long long sumPow(int l, int r, long long x, long long p) {
76 auto itr = split(r+1), itl = split(l);
77 long long ans = 0;
78 for (auto it = itl; it != itr; ++it) {
79 long long len = it->r - it->l + 1;
80 long long term = modpow(it->v, x, p);
81 ans = (ans + (__int128)len * term) % p;
82 }
83 return ans;
84 }
85};
86
87int main() {
88 // Toy demo for operations
89 vector<long long> a = {0, 5,5,1,1,1,4,4,4,4}; // n=9, 1-indexed
90 ODT odt; odt.build(a);
91
92 cout << odt.kth(1,9,4) << "\n"; // 4th smallest initially
93 odt.add(3,7,3); // add 3 to indices 3..7
94 cout << odt.sumPow(1,9,2,1000000007LL) << "\n"; // sum of squares mod 1e9+7
95 odt.assign(2,5,10); // set indices 2..5 to 10
96 cout << odt.kth(1,9,6) << "\n"; // 6th smallest after ops
97
98 return 0;
99}
100

This ODT supports the classic CF-896C mix: range add, range assign, k-th by value, and sum of powers modulo. add modifies interval values in-place (v is mutable). kth collects the overlapping intervals and sorts by value, using interval lengths as multiplicities. sumPow uses fast modular exponentiation to compute each interval’s contribution. All operations begin by splitting at the query boundaries to isolate [l, r].

Time: add: O(k log S) due to two splits plus iterating k intervals; assign: O((k+1) log S); kth: O(k log S + k log k); sumPow: O(k log S + k log x).Space: O(S) intervals plus O(k) temporary storage in kth.