Sqrt Decomposition on Queries
Key Points
- •Sqrt decomposition on queries (time blocking) processes Q operations in blocks of size about \(\sqrt{Q}\) to balance per-query overhead and rebuild cost.
- •Within each block, only a few updates occur, so we handle their effect on each query by scanning at most \(O(\sqrt{Q})\) recent updates.
- •Between blocks, we apply all updates from the previous block to a fast base data structure so its answers reflect all 'external' updates.
- •This technique is powerful when neither fully online (updating a heavy structure per query) nor fully offline (reordering across time) is feasible.
- •Typical total complexity is \(O\big(Q \cdot (t_q + \sqrt{Q} \cdot t_u)\big)\), where \(t_q\) is the base query cost and \(t_u\) the per-update adjustment cost, giving \(O(Q\sqrt{Q})\) when both are near-constant.
- •It excels for additive queries (like sums) where in-block updates contribute an easily computed delta to each answer.
- •Care must be taken to combine multiple updates to the same index within a block and to clear in-block buffers between blocks.
- •It differs from Mo’s algorithm: here we preserve chronological order and only group by time; we do not reorder queries by their ranges.
Prerequisites
- →Fenwick Tree (Binary Indexed Tree) — A standard base data structure for sums and range updates; knowing its O(log n) operations helps implement the external state efficiently.
- →Segment Tree basics — Understanding segment trees helps when base queries require mins/maxes or complex aggregates.
- →Amortized Analysis — Needed to reason why committing updates once per block yields O(Q) total commit cost.
- →Binary Search on sorted vectors — Useful for base queries like counting occurrences in a range using position lists.
- →Big-O notation — Essential for selecting and justifying the block size B and overall complexity.
- →Set/map and hashing basics — Helpful for deduplicating multiple in-block updates to the same index.
- →Range intersection arithmetic — Computing overlap lengths enables O(1) in-block adjustments for range-add updates.
Detailed Explanation
01 Overview
Sqrt decomposition on queries (also called time blocking or offline-by-blocks) is a batching technique for mixed sequences of updates and queries. Given Q operations in chronological order, we split them into contiguous blocks of size B ≈ √Q. We maintain a base data structure (Fenwick tree, segment tree, or other) that reflects all updates from completed blocks. While answering queries within the current block, we do not mutate the base structure for every small update; instead, we keep a small list of in-block updates and account for their effects on each query on the fly. After finishing the block, we commit those updates to the base structure, clear the temporary list, and move to the next block.
The method is effective when applying every update immediately makes queries too slow (heavy per-op cost), but deferring all updates until the very end is impossible because queries must be answered in chronological order. By keeping blocks small (≈√Q), each query only pays a small extra cost scanning at most B in-block updates, while the base structure remains fast and stable. This gives a predictable trade-off that often results in overall O(Q√Q) time with low constant factors and simple implementation.
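To make the loop structure concrete before the full examples below, here is a minimal sketch for the simplest setting: point adds and range-sum queries (the input format shown in the comments is an assumption for this sketch). To keep it short, the base structure is just a prefix-sum array rebuilt once per block rather than a Fenwick tree, which is exactly the per-block-rebuild scenario covered in the Key Formulas section.

#include <bits/stdc++.h>
using namespace std;

// Sketch of sqrt decomposition on queries: point add / range sum,
// with a prefix-sum array as the base structure, rebuilt per block.
// Assumed input: n q, then a1..an, then q lines of "1 i delta" or "2 l r".
int main() {
    int n, q;
    if (!(cin >> n >> q)) return 0;
    vector<long long> a(n + 1), pref(n + 1, 0);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    auto rebuild = [&]() {                         // O(n) rebuild between blocks
        for (int i = 1; i <= n; ++i) pref[i] = pref[i - 1] + a[i];
    };
    rebuild();

    int B = max(1, (int)llround(sqrt((double)q))); // block size ~ sqrt(q)
    vector<pair<int, long long>> pending;          // buffered (index, delta) updates

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int i; long long d; cin >> i >> d;
            pending.push_back({i, d});             // buffer only; base untouched
        } else {
            int l, r; cin >> l >> r;
            long long ans = pref[r] - pref[l - 1]; // base answer (committed updates only)
            for (auto &[idx, d] : pending)         // at most ~B in-block corrections
                if (l <= idx && idx <= r) ans += d;
            cout << ans << '\n';
        }
        if ((t + 1) % B == 0) {                    // block boundary: commit and rebuild
            for (auto &[idx, d] : pending) a[idx] += d;
            pending.clear();
            rebuild();
        }
    }
    return 0;
}

The Fenwick-based examples later replace the O(n) per-block rebuild with O(log n) point commits, which is preferable when n is much larger than Q.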
02 Intuition & Analogies
Imagine running a help desk that receives a long stream of change requests (updates) and information requests (queries). Handling each change immediately for every query is like reprinting the whole manual each time—expensive. On the other hand, you can’t ignore changes until the very end because people need correct answers as they arrive.
So you adopt a routine: you work in day-sized batches. At the start of a day, your manual already includes all changes from past days. Throughout the day, you jot down today’s few changes on sticky notes. When someone asks a question, you first consult the manual (fast) and then glance through the day’s sticky notes (few) to adjust the answer. At day’s end, you permanently merge the sticky notes into the manual. The next day starts fresh with an updated manual and an empty sticky-note stack.
Here, the manual is your base data structure after applying all earlier blocks’ updates. The sticky notes are the in-block updates. Because there are only about √Q queries per block, there are at most √Q sticky notes to skim for any single answer—manageable. This way, your answers are chronologically correct (you don’t reorder requests), and you avoid the cost of constantly rewriting the manual (mutating the heavy structure per tiny update). Choosing the ‘day length’ as √Q balances the time spent skimming sticky notes against the frequency of merging them, providing an efficient middle ground between fully online and fully offline processing.
03 Formal Definition
Let \(o_1, o_2, \ldots, o_Q\) be the operations in chronological order, each an update or a query, and fix a block size \(B\) (typically \(B = \lceil\sqrt{Q}\rceil\)). Partition the sequence into consecutive blocks \(o_1 \ldots o_B,\; o_{B+1} \ldots o_{2B}, \ldots\). Maintain a base structure \(D\) and an in-block buffer \(U\) with the invariant that, while processing block \(k\), \(D\) reflects exactly the updates of blocks \(1 \ldots k-1\) and \(U\) holds the updates issued so far inside block \(k\). A query in block \(k\) is answered as the base answer from \(D\) combined with an adjustment computed from \(U\); when block \(k\) ends, every update in \(U\) is committed to \(D\) and \(U\) is emptied.
04 When to Use
Use sqrt decomposition on queries when:
- Updates and queries are interleaved and must be answered online in chronological order, but updating the full structure for every tiny change is too slow.
- In-block updates can be handled cheaply per query: each update’s impact on a query is computable in \(O(1)\) or \(O(\log n)\) (e.g., point updates for range sums; range add for range sums via overlap length; value changes for frequency counts via inclusion/exclusion).
- A robust base structure exists for the operation set (Fenwick tree, segment tree, order statistics by maps/vectors, etc.) that you can incrementally update between blocks.
- Purely offline tricks (like sorting by ranges as in Mo’s algorithm or CDQ divide-and-conquer) either don’t apply or would complicate the solution due to constraints (e.g., queries depend on all prior updates).
Typical competitive programming settings include:
- Point updates with range queries (sum/min/max), where base queries are \(O(\log n)\) and in-block corrections are \(O(B)\).
- Range add with range sum, where an update’s effect on a query is proportional to overlap length.
- Frequency or membership queries (e.g., count of a value in [L, R]) with point reassignment; maintain sorted position lists for base answers and adjust by the small set of in-block changes.
- Problems where you want deterministic performance with simple code, avoiding rollback or persistence.
⚠️Common Mistakes
- Double-counting in-block updates: If a position is updated multiple times within a block, only the last update before the current query should affect the query’s answer. Use a small hash map to track the latest in-block value per index when scanning (see the sketch after this list).
- Forgetting to clear per-block buffers: Always clear the list of in-block updates and any auxiliary maps when moving to the next block, or your adjustments will leak across blocks.
- Applying updates to the base structure too early: In-block updates must not be committed to the base structure until the block ends; otherwise, you will apply their effect twice (once via base, once via scan).
- Incorrect boundaries or inclusivity: Off-by-one errors on intervals [L, R] are common. Standardize on 1-indexing or 0-indexing and stick to it everywhere, including BIT/segment tree code.
- Poor block size choice: While \(B = \lceil\sqrt{Q}\rceil\) is a safe default, very skewed workloads (almost all queries or almost all updates) might benefit from tuning (e.g., \(c \cdot \sqrt{Q}\)). Measure constants.
- Heavy per-update adjustment: If computing an in-block update’s contribution to a query takes more than \(O(1)\)–\(O(\log n)\), the method can degrade. Seek additive formulas (overlap length, linearity) or precompute small helpers per block.
- Not maintaining the external array state: For some problems, you need the current base value at a position when scanning updates. Keep a base array synchronized with committed updates to avoid reading stale values.
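To illustrate the first two pitfalls above, here is a small self-contained sketch (container names and sample values are made up for the example), assuming point assignments buffered as (index, newValue) pairs: later buffered entries overwrite earlier ones for the same index, and everything is cleared at the block boundary.

#include <bits/stdc++.h>
using namespace std;

// Collapse buffered point assignments so only the latest in-block value per
// index is used when adjusting a query, then clear all buffers at block end.
int main() {
    vector<pair<int,int>> pending = {{3, 7}, {5, 2}, {3, 9}};  // chronological in-block buffer

    // Last write wins: a later update to index 3 overrides the earlier one.
    unordered_map<int,int> lastNew;
    for (const auto &[idx, val] : pending) lastNew[idx] = val;

    for (const auto &[idx, val] : lastNew)
        cout << "index " << idx << " currently holds " << val << '\n';  // 3 -> 9, 5 -> 2

    // At the block boundary: commit pending to the base structure (not shown),
    // then clear both containers so nothing leaks into the next block.
    pending.clear();
    lastNew.clear();
    return 0;
}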
Key Formulas
Default Block Size
\(B = \lceil \sqrt{Q} \rceil\)
Explanation: Choosing B around the square root of the number of operations balances the per-query scan of in-block updates and the number of block transitions. It is a robust default in practice.
Baseline Time with No Rebuild
\(T(Q, B) = O\big(Q \cdot (t_q + B \cdot t_u)\big)\), where \(t_q\) is the per-query cost on the base structure and \(t_u\) is the cost of accounting for one buffered update
Explanation: Total time equals the cost of answering all queries on the base structure plus scanning up to B in-block updates per query. If both costs are small constants, this is O(QB).
Time with Per-Block Rebuild
\(T(Q, B) = O\big(\tfrac{Q}{B} \cdot R(n) + Q \cdot (t_q + B \cdot t_u)\big)\), where \(R(n)\) is the per-block rebuild cost
Explanation: If each block requires rebuilding a helper in R(n) time, total time has a rebuild term Q/B · R(n) plus the per-query in-block adjustment term. This captures many advanced uses.
Optimal B with Rebuild
\(B^{*} = \sqrt{R(n) / t_u}\)
Explanation: Minimizing T(Q,B) when rebuild cost R(n) and per-update adjustment cost dominate yields an optimal block size proportional to the square root of their ratio. Plugging B* back gives the best asymptotic balance.
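As a worked derivation (notation as in the formulas above, with \(t_u\) the per-query cost of accounting for one buffered update), balancing the two \(B\)-dependent terms gives the stated optimum:
\[
T(B) \approx \frac{Q}{B}\,R(n) + Q\,B\,t_u,
\qquad
\frac{dT}{dB} = -\frac{Q\,R(n)}{B^{2}} + Q\,t_u = 0
\;\Longrightarrow\;
B^{*} = \sqrt{\frac{R(n)}{t_u}},
\qquad
T(B^{*}) = 2\,Q\sqrt{R(n)\,t_u}.
\]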
Typical Overall Complexity
\(O(Q\sqrt{Q})\)
Explanation: With constant base and adjustment costs and the standard B = √Q choice, the total time is O(Q√Q). This is a common and acceptable bound in practice.
Overlap Length
\(\operatorname{overlap}\big([l_1, r_1], [l_2, r_2]\big) = \max\big(0,\; \min(r_1, r_2) - \max(l_1, l_2) + 1\big)\)
Explanation: The size of the intersection of two inclusive ranges. It enables O(1) adjustments for range-add updates to range-sum queries.
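For example, if a buffered update adds \(v\) to \([7, 15]\) and a sum query asks about \([3, 10]\), the update contributes only over the intersection:
\[
\min(10, 15) - \max(3, 7) + 1 = 10 - 7 + 1 = 4,
\]
so the in-block correction to that query is \(4v\).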
Summation Identity
\(\sum_{i=1}^{k} i = \frac{k(k+1)}{2}\)
Explanation: A reminder that many additive adjustments reduce to simple arithmetic series. It’s helpful when reasoning about cumulative effects of updates.
Complexity Analysis
Each query pays the base query cost \(t_q\) plus a scan of at most \(B\) buffered updates at \(t_u\) each, i.e. \(O(t_q + B \cdot t_u)\) per query. Every update is committed to the base structure exactly once, at the end of its block, so all commits together cost \(O(Q)\) base-structure operations; a per-block rebuild, if needed, adds \(O(\tfrac{Q}{B} \cdot R(n))\) in total. With \(B = \lceil\sqrt{Q}\rceil\) and near-constant \(t_q\) and \(t_u\), the running time is \(O(Q\sqrt{Q})\); with \(O(\log n)\) base operations, as in the Fenwick-based examples below, it is \(O(Q(\sqrt{Q} + \log n))\).
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Fenwick Tree (Binary Indexed Tree) for prefix sums (1-indexed)
struct Fenwick {
    int n; vector<long long> bit;
    Fenwick(int n=0): n(n), bit(n+1, 0) {}
    void init(int n_) { n = n_; bit.assign(n+1, 0); }
    // add value v at position idx
    void add(int idx, long long v) {
        for (; idx <= n; idx += idx & -idx) bit[idx] += v;
    }
    // prefix sum [1..idx]
    long long sumPrefix(int idx) const {
        long long res = 0;
        for (; idx > 0; idx -= idx & -idx) res += bit[idx];
        return res;
    }
    // range sum [l..r]
    long long sumRange(int l, int r) const {
        if (l > r) return 0;
        return sumPrefix(r) - sumPrefix(l-1);
    }
};

struct Update { int idx; long long delta; };

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input format:
    //   n q
    //   a1 a2 ... an          (initial array)
    //   Then q lines, each is either:
    //     1 i delta   --> point update: a[i] += delta
    //     2 l r       --> query: sum of a[l..r]

    int n, q; if (!(cin >> n >> q)) return 0;
    vector<long long> a(n+1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    Fenwick fw(n);
    for (int i = 1; i <= n; ++i) fw.add(i, a[i]); // build base with initial array

    int B = max(1, (int)floor(sqrt((long double)q))); // block size ~ sqrt(q)

    vector<Update> inBlock; inBlock.reserve(B+5);

    auto commit_block = [&]() {
        // Apply in-block updates to Fenwick and to base array
        for (auto &u : inBlock) {
            fw.add(u.idx, u.delta);
            a[u.idx] += u.delta; // keep base array consistent (useful if needed elsewhere)
        }
        inBlock.clear();
    };

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int i; long long delta; cin >> i >> delta;
            // Record update in current block only; don't touch Fenwick yet
            inBlock.push_back({i, delta});
        } else if (type == 2) {
            int l, r; cin >> l >> r;
            // Base answer from Fenwick (external updates only)
            long long ans = fw.sumRange(l, r);
            // Adjust by scanning in-block point updates
            for (const auto &u : inBlock) {
                if (l <= u.idx && u.idx <= r) ans += u.delta;
            }
            cout << ans << '\n';
        } else {
            cerr << "Unknown type" << endl; return 0;
        }

        // End of block: commit accumulated updates
        if ((t+1) % B == 0) commit_block();
    }
    // Commit remaining updates after the last partial block
    commit_block();

    return 0;
}
We maintain a Fenwick tree as the base structure that already includes all updates from past blocks. Inside the current block, point updates are buffered in inBlock. A range-sum query is answered by combining the Fenwick base sum with the contributions of at most B in-block updates that fall within [l, r]. At the end of each block, we apply the buffered updates to the Fenwick tree in one batch and clear the buffer.
#include <bits/stdc++.h>
using namespace std;

// Two-Fenwick technique for range add and range sum queries (1-indexed)
struct Fenwick {
    int n; vector<long long> bit;
    Fenwick(int n=0): n(n), bit(n+1, 0) {}
    void init(int n_) { n = n_; bit.assign(n+1, 0); }
    void add(int idx, long long v) { for (; idx <= n; idx += idx & -idx) bit[idx] += v; }
    long long sumPrefix(int idx) const { long long r=0; for (; idx>0; idx-=idx&-idx) r+=bit[idx]; return r; }
};

struct RangeFenwick {
    int n; Fenwick B1, B2;
    RangeFenwick(int n=0): n(n), B1(n), B2(n) {}
    void init(int n_) { n = n_; B1.init(n); B2.init(n); }
    // Add v to [l, r]
    void range_add(int l, int r, long long v) {
        if (l>r) return;
        B1.add(l, v); B1.add(r+1, -v);
        B2.add(l, v*(l-1)); B2.add(r+1, -v*r);
    }
    long long prefix_sum(int x) const { return B1.sumPrefix(x)*x - B2.sumPrefix(x); }
    long long range_sum(int l, int r) const { if (l>r) return 0; return prefix_sum(r) - prefix_sum(l-1); }
};

struct Update { int l, r; long long add; };

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input format:
    //   n q
    //   a1 a2 ... an          (initial array)
    //   q lines:
    //     1 l r v   --> add v to [l, r]
    //     2 l r     --> query sum on [l, r]

    int n, q; if (!(cin >> n >> q)) return 0;
    vector<long long> a(n+1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    RangeFenwick rf(n);
    // Build: add initial a[i] as point additions
    for (int i = 1; i <= n; ++i) rf.range_add(i, i, a[i]);

    int B = max(1, (int)floor(sqrt((long double)q)));
    vector<Update> inBlock; inBlock.reserve(B+5);

    auto commit_block = [&]() {
        for (auto &u : inBlock) rf.range_add(u.l, u.r, u.add);
        inBlock.clear();
    };

    auto overlap_len = [](int l1, int r1, int l2, int r2) -> long long {
        int L = max(l1, l2), R = min(r1, r2);
        return (L > R) ? 0LL : (long long)(R - L + 1);
    };

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int l, r; long long v; cin >> l >> r >> v;
            // Buffer in-block range add
            inBlock.push_back({l, r, v});
        } else if (type == 2) {
            int l, r; cin >> l >> r;
            long long ans = rf.range_sum(l, r); // base answer
            // Add contributions from in-block range adds via overlap length
            for (const auto &u : inBlock) {
                long long len = overlap_len(l, r, u.l, u.r);
                if (len) ans += len * u.add;
            }
            cout << ans << '\n';
        } else {
            cerr << "Unknown type" << '\n'; return 0;
        }
        if ((t+1) % B == 0) commit_block();
    }
    commit_block();
    return 0;
}
We support range add and range sum using two Fenwick trees as the base structure, which reflects all updates from completed blocks. Within a block, range updates are buffered; a query’s correction is the update value multiplied by the overlap length between the update range and the query range. This uses linearity to make in-block adjustments O(1) each.
#include <bits/stdc++.h>
using namespace std;

// We maintain for each value a sorted vector of positions where it occurs (external state only).
// Queries: count of value x in [l, r].  Updates: a[i] := newVal.
// In-block, we buffer updates and adjust per query by scanning at most B updates.

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input format:
    //   n q
    //   a1 a2 ... an          (initial values)
    //   q lines:
    //     1 i v     --> assign a[i] = v
    //     2 l r x   --> count of value x in [l, r]

    int n, q; if (!(cin >> n >> q)) return 0;
    vector<int> a(n+1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    // Coordinate compression is optional if values are large; here we store raw values in a hash map.
    unordered_map<int, vector<int>> pos; pos.reserve(n*2);
    pos.max_load_factor(0.7);
    // Build position lists for external state
    {
        // Collect, then sort once per value
        unordered_map<int, vector<int>> buckets; buckets.reserve(n*2); buckets.max_load_factor(0.7);
        for (int i = 1; i <= n; ++i) buckets[a[i]].push_back(i);
        for (auto &kv : buckets) {
            auto &vec = kv.second; sort(vec.begin(), vec.end());
            pos.emplace(kv.first, move(vec));
        }
    }

    struct Upd { int idx; int newv; };
    int B = max(1, (int)floor(sqrt((long double)q)));
    vector<Upd> inBlock; inBlock.reserve(B+5);

    auto commit_block = [&]() {
        // Apply buffered updates to external state: update a[] and move the index between position lists.
        // Reading the old value from a[] at commit time (rather than storing it when buffering) keeps
        // repeated updates to the same index within one block consistent.
        for (auto &u : inBlock) {
            int oldv = a[u.idx];
            if (oldv == u.newv) continue;
            // Remove idx from pos[oldv]
            auto &vecOld = pos[oldv];
            auto itOld = lower_bound(vecOld.begin(), vecOld.end(), u.idx);
            if (itOld != vecOld.end() && *itOld == u.idx) vecOld.erase(itOld);
            // Insert idx into pos[newv]
            auto &vecNew = pos[u.newv];
            auto itNew = lower_bound(vecNew.begin(), vecNew.end(), u.idx);
            vecNew.insert(itNew, u.idx);
            a[u.idx] = u.newv; // base array reflects committed update
        }
        inBlock.clear();
    };

    auto base_count_in_range = [&](int x, int l, int r) -> int {
        auto it = pos.find(x);
        if (it == pos.end()) return 0;
        const auto &v = it->second;
        auto L = lower_bound(v.begin(), v.end(), l);
        auto R = upper_bound(v.begin(), v.end(), r);
        return (int)(R - L);
    };

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int i, v; cin >> i >> v;
            // Buffer update; the base array a[] is only touched at commit time
            inBlock.push_back({i, v});
        } else if (type == 2) {
            int l, r, x; cin >> l >> r >> x;
            int ans = base_count_in_range(x, l, r);

            // Adjust by scanning in-block updates that occurred so far.
            // Use a small map so only the last update per index within the block counts.
            unordered_map<int,int> lastNew; lastNew.reserve(inBlock.size()*2); lastNew.max_load_factor(0.7);
            for (const auto &u : inBlock) lastNew[u.idx] = u.newv; // record latest new value per updated index

            // Now adjust the count relative to the base array a[] (external state only)
            for (const auto &kv : lastNew) {
                int idx = kv.first; int curv = kv.second;  // current value after in-block updates
                if (idx < l || idx > r) continue;          // outside query range
                int basev = a[idx];                        // value at block start (after external updates)
                // If base has x but current does not, we overcounted by 1;
                // if base does not have x but current does, we undercounted by 1.
                if (basev == x && curv != x) ans -= 1;
                else if (basev != x && curv == x) ans += 1;
            }
            cout << ans << '\n';
        } else {
            cerr << "Unknown type" << '\n'; return 0;
        }

        if ((t+1) % B == 0) commit_block();
    }
    commit_block();
    return 0;
}
The base state stores, for each value x, a sorted vector of positions where x occurs; thus base queries are answered with two binary searches. Within a block, we buffer point assignments and, for each query, adjust the base count by scanning the distinct updated indices that lie in [L, R]. We compare their base value (from the external a[]) to their current in-block value (the latest buffered one) to add or subtract one accordingly. At block end, we commit updates by removing and inserting positions in the appropriate value lists.