
Sqrt Decomposition on Queries

Key Points

  • Sqrt decomposition on queries (time blocking) processes Q operations in blocks of size about \(\sqrt{Q}\) to balance per-query overhead and rebuild cost.
  • Within each block, only a few updates occur, so we handle their effect on each query by scanning at most \(O(\sqrt{Q})\) recent updates.
  • Between blocks, we apply all updates from the previous block to a fast base data structure so its answers reflect all 'external' updates.
  • This technique is powerful when neither fully online (updating a heavy structure per query) nor fully offline (reordering across time) is feasible.
  • Typical total complexity is \(O(Q \cdot T_{\text{base}} + Q\sqrt{Q})\), giving \(O(Q\sqrt{Q})\) when base costs are near-constant.
  • It excels for additive queries (like sums) where in-block updates contribute an easily computed delta to each answer.
  • Care must be taken to combine multiple updates to the same index within a block and to clear in-block buffers between blocks.
  • It differs from Mo’s algorithm: here we preserve chronological order and only group by time; we do not reorder queries by their ranges.

Prerequisites

  • Fenwick Tree (Binary Indexed Tree): A standard base data structure for sums and range updates; knowing its O(log n) operations helps implement the external state efficiently.
  • Segment Tree basics: Understanding segment trees helps when base queries require mins/maxes or complex aggregates.
  • Amortized Analysis: Needed to reason why committing updates once per block yields O(Q) total commit cost.
  • Binary Search on sorted vectors: Useful for base queries like counting occurrences in a range using position lists.
  • Big-O notation: Essential for selecting and justifying the block size B and overall complexity.
  • Set/map and hashing basics: Helpful for deduplicating multiple in-block updates to the same index.
  • Range intersection arithmetic: Computing overlap lengths enables O(1) in-block adjustments for range-add updates.

Detailed Explanation


01 Overview

Sqrt decomposition on queries (also called time blocking or offline-by-blocks) is a batching technique for mixed sequences of updates and queries. Given Q operations in chronological order, we split them into contiguous blocks of size B ≈ √Q. We maintain a base data structure (Fenwick tree, segment tree, or other) that reflects all updates from completed blocks. While answering queries within the current block, we do not mutate the base structure for every small update; instead, we keep a small list of in-block updates and account for their effects on each query on the fly. After finishing the block, we commit those updates to the base structure, clear the temporary list, and move to the next block.

The method is effective when applying every update immediately makes queries too slow (heavy per-op cost), but deferring all updates until the very end is impossible because queries must be answered in chronological order. By keeping blocks small (≈√Q), each query only pays a small extra cost scanning at most B in-block updates, while the base structure remains fast and stable. This gives a predictable trade-off that often results in overall O(Q√Q) time with low constant factors and simple implementation.
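The batching loop described above can be sketched on a toy point-add/range-sum problem. This is a minimal illustration, not a canonical implementation: the names (`TimeBlocked`, `pending`, `rebuild`) are ours, and a plain prefix-sum array plays the role of the "heavy" base structure that is only rebuilt once per block.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Time-blocking sketch: point add + range sum, 1-indexed.
// The base structure (prefix sums) reflects only committed updates;
// in-block updates live in `pending` until rebuild() is called.
struct TimeBlocked {
    int n;
    vector<long long> a, prefix;          // base array and its prefix sums
    vector<pair<int,long long>> pending;  // in-block updates: (index, delta)

    TimeBlocked(int n): n(n), a(n+1, 0), prefix(n+1, 0) {}

    // Commit: fold pending updates into the base and rebuild prefix sums.
    void rebuild() {
        for (auto &[i, d] : pending) a[i] += d;
        pending.clear();
        for (int i = 1; i <= n; ++i) prefix[i] = prefix[i-1] + a[i];
    }
    // Buffer an update; the base structure is not touched yet.
    void update(int i, long long d) { pending.push_back({i, d}); }
    // Base answer plus a scan over at most B pending updates.
    long long query(int l, int r) {
        long long ans = prefix[r] - prefix[l-1];
        for (auto &[i, d] : pending) if (l <= i && i <= r) ans += d;
        return ans;
    }
};
```

In a driver loop one would call `rebuild()` after every B operations, exactly as the full examples later in this article do with a Fenwick tree as the base.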

02 Intuition & Analogies

Imagine running a help desk that receives a long stream of change requests (updates) and information requests (queries). Handling each change immediately for every query is like reprinting the whole manual each time—expensive. On the other hand, you can’t ignore changes until the very end because people need correct answers as they arrive.

So you adopt a routine: you work in day-sized batches. At the start of a day, your manual already includes all changes from past days. Throughout the day, you jot down today’s few changes on sticky notes. When someone asks a question, you first consult the manual (fast) and then glance through the day’s sticky notes (few) to adjust the answer. At day’s end, you permanently merge the sticky notes into the manual. The next day starts fresh with an updated manual and an empty sticky-note stack.

Here, the manual is your base data structure after applying all earlier blocks’ updates. The sticky notes are the in-block updates. Because there are only about √Q queries per block, there are at most √Q sticky notes to skim for any single answer—manageable. This way, your answers are chronologically correct (you don’t reorder requests), and you avoid the cost of constantly rewriting the manual (mutating the heavy structure per tiny update). Choosing the ‘day length’ as √Q balances the time spent skimming sticky notes against the frequency of merging them, providing an efficient middle ground between fully online and fully offline processing.

03 Formal Definition

Let there be a sequence of Q operations \(op_1, op_2, \ldots, op_Q\) on an underlying object A[1..n]. Each operation is either an update U that mutates A, or a query Qry that asks a function f over A (e.g., sums, counts, minima). Pick a block size \(B\) (commonly \(B = \lceil\sqrt{Q}\rceil\)). Partition the operation indices into blocks \(\mathcal{B}_1, \mathcal{B}_2, \ldots\) where \(\mathcal{B}_k = \{(k-1)B+1, \ldots, \min(kB, Q)\}\). Maintain a base data structure S that supports:

  • applying updates from completed blocks (external updates), and
  • answering base queries as if only external updates have occurred.

For the current block \(\mathcal{B}_k\):

  • Keep an ordered list \(U_k\) of in-block updates encountered so far while scanning \(\mathcal{B}_k\) in chronological order.
  • To answer a query Qry in \(\mathcal{B}_k\), compute \(\text{base}(Qry) + g(Qry, U_k)\), where \(\text{base}(Qry)\) is the base answer from S (ignoring in-block updates), and \(g\) is a correction computed by scanning only the in-block updates that affect the query. For additive queries, \(g\) sums simple per-update contributions; for non-additive queries, \(g\) may require deduplication or small helper structures.

At the end of \(\mathcal{B}_k\), apply all updates in \(U_k\) to S in order, clear \(U_k\), and proceed to \(\mathcal{B}_{k+1}\). Under mild conditions (e.g., O(1) or O(log n) base queries and O(1) per in-block adjustment), the total time is \(O(Q \cdot T_{\text{base}} + Q B)\), yielding \(O(Q\sqrt{Q})\) for \(B = \Theta(\sqrt{Q})\).
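The block partition above reduces to simple index arithmetic. The helper names below are illustrative, matching the 1-indexed partition into blocks of size B with the last block possibly shorter:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Block arithmetic for blocks {(k-1)B+1, ..., min(kB, Q)}, 1-indexed ops.
int defaultB(int Q)               { return max(1, (int)ceil(sqrt((double)Q))); }
int blockOf(int t, int B)         { return (t - 1) / B + 1; }       // block containing op t
int blockStart(int k, int B)      { return (k - 1) * B + 1; }       // first op of block k
int blockEnd(int k, int B, int Q) { return min(k * B, Q); }         // last op of block k
```

For example, with Q = 10 operations the default block size is 4, so block 3 covers operations 9 and 10 only.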

04 When to Use

Use sqrt decomposition on queries when:

  • Updates and queries are interleaved and must be answered online in chronological order, but updating the full structure for every tiny change is too slow.
  • In-block updates can be handled cheaply per query: each update’s impact on a query is computable in O(1) or O(log n) (e.g., point updates for range sums; range add for range sums via overlap length; value changes for frequency counts via inclusion/exclusion).
  • A robust base structure exists for the operation set (Fenwick tree, segment tree, order statistics by maps/vectors, etc.) that you can incrementally update between blocks.
  • Purely offline tricks (like sorting by ranges as in Mo’s algorithm or CDQ divide-and-conquer) either don’t apply or would complicate the solution due to constraints (e.g., queries depend on all prior updates).

Typical competitive programming settings include:

  • Point updates with range queries (sum/min/max), where base queries are O(log n) and in-block corrections are O(B).
  • Range add with range sum, where an update’s effect on a query is proportional to overlap length.
  • Frequency or membership queries (e.g., count of a value in [L, R]) with point reassignment; maintain sorted position lists for base answers and adjust by the small set of in-block changes.
  • Problems where you want deterministic performance with simple code, avoiding rollback or persistence.

⚠️Common Mistakes

  • Double-counting in-block updates: If a position is updated multiple times within a block, only the last update before the current query should affect the query’s answer. Use a small hash map to track the latest in-block value per index when scanning.
  • Forgetting to clear per-block buffers: Always clear the list of in-block updates and any auxiliary maps when moving to the next block, or your adjustments will leak across blocks.
  • Applying updates to the base structure too early: In-block updates must not be committed to the base structure until the block ends; otherwise, you will apply their effect twice (once via base, once via scan).
  • Incorrect boundaries or inclusivity: Off-by-one errors on intervals [L, R] are common. Standardize on 1-indexing or 0-indexing and stick to it everywhere, including BIT/segment tree code.
  • Poor block size choice: While \(B = \lceil\sqrt{Q}\rceil\) is a safe default, very skewed workloads (almost all queries or almost all updates) might benefit from tuning (e.g., c·√Q). Measure constants.
  • Heavy per-update adjustment: If computing an in-block update’s contribution to a query takes more than O(1)–O(\log n), the method can degrade. Seek additive formulas (overlap length, linearity) or precompute small helpers per block.
  • Not maintaining the external array state: For some problems, you need the current base value at a position when scanning updates. Keep a base array synchronized with committed updates to avoid reading stale values.
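The first mistake above (double counting when one index is assigned several times in a block) is avoided with a last-write-wins map. A hedged sketch, with an illustrative helper name:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Collapse a chronological list of in-block point assignments (index, value)
// so that only the most recent value per index survives. Only this latest
// value may influence a query answered later in the block.
unordered_map<int,int> lastWriteWins(const vector<pair<int,int>>& updates) {
    unordered_map<int,int> latest;
    for (const auto &u : updates)
        latest[u.first] = u.second; // later writes overwrite earlier ones
    return latest;
}
```

The third full code example below rebuilds exactly such a map before each query's adjustment pass.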

Key Formulas

Default Block Size

\(B = \lceil \sqrt{Q} \rceil\)

Explanation: Choosing B around the square root of the number of operations balances the per-query scan of in-block updates and the number of block transitions. It is a robust default in practice.

Baseline Time with No Rebuild

\(T(Q,B) = O(Q \cdot T_{\text{base}} + Q \cdot B)\)

Explanation: Total time equals the cost of answering all queries on the base structure plus scanning up to B in-block updates per query. If both costs are small constants, this is O(QB).

Time with Per-Block Rebuild

\(T(Q,B) = O\!\left(\frac{Q}{B} \cdot R(n) + Q \cdot B \cdot T_{\text{adj}}\right)\)

Explanation: If each block requires rebuilding a helper in R(n) time, total time has a rebuild term Q/B · R(n) plus the per-query in-block adjustment term. This captures many advanced uses.

Optimal B with Rebuild

\(B^{*} = \sqrt{\frac{R(n)}{T_{\text{adj}}}}\)

Explanation: Minimizing T(Q,B) when rebuild cost R(n) and per-update adjustment cost dominate yields an optimal block size proportional to the square root of their ratio. Plugging B* back gives the best asymptotic balance.
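A tiny numeric sketch of that balance, with hypothetical cost figures: for a rebuild cost R(n) = 10000 and a per-update adjustment cost of 1, the optimal block size is √(10000/1) = 100.

```cpp
#include <bits/stdc++.h>
using namespace std;

// B* = sqrt(R(n) / t_adj), rounded to a positive integer.
// The cost inputs here are hypothetical constants for illustration.
long long optimalBlockSize(double rebuildCost, double adjustCost) {
    return max(1LL, llround(sqrt(rebuildCost / adjustCost)));
}
```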

Typical Overall Complexity

\(T(Q) = O(Q\sqrt{Q})\)

Explanation: With constant base and adjustment costs and the standard B = √Q choice, the total time is O(Q√Q). This is a common and acceptable bound in practice.

Overlap Length

\(\text{overlap}(l_1, r_1, l_2, r_2) = \max\bigl(0,\ \min(r_1, r_2) - \max(l_1, l_2) + 1\bigr)\)

Explanation: The size of the intersection of two inclusive ranges. It enables O(1) adjustments for range-add updates to range-sum queries.
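That intersection formula translates directly into a one-line helper; a small sketch:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Length of the overlap of inclusive ranges [l1, r1] and [l2, r2]; 0 if disjoint.
long long overlapLen(long long l1, long long r1, long long l2, long long r2) {
    long long L = max(l1, l2), R = min(r1, r2);
    return (L > R) ? 0 : R - L + 1;
}
```

The second full code example below uses exactly this helper to credit a buffered range-add to a range-sum query in O(1).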

Summation Identity

\(\sum_{i=1}^{k} i = \frac{k(k+1)}{2}\)

Explanation: A reminder that many additive adjustments reduce to simple arithmetic series. It’s helpful when reasoning about cumulative effects of updates.

Complexity Analysis

Let Q be the number of operations and choose block size B. We process ⌈Q/B⌉ blocks sequentially. Each update appears in exactly one block as an in-block update and is committed exactly once to the base structure between blocks, so the total commit cost over the whole run is O(Q · T_upd), where T_upd is the base structure’s update cost (e.g., O(log n) for a Fenwick/segment tree). For a query, we first ask the base structure in O(T_query) (e.g., O(log n) for sums/mins) and then scan all in-block updates that could affect it. In the worst case there are at most B in-block updates, so the scan is O(B · T_adj), where T_adj is the per-update adjustment cost for that query (often O(1)). Therefore, the per-query cost is O(T_query + B · T_adj), and over all Q operations, O(Q · T_query + Q · B · T_adj).

If there is an additional per-block rebuild with cost R(n), the total rebuild cost is O(⌈Q/B⌉ · R(n)). The complete running time is then T(Q,B) = O(⌈Q/B⌉ · R(n) + Q · T_query + Q · B · T_adj). Choosing B to minimize this expression yields B* = √(R(n)/T_adj) and T(Q,B*) = O(Q · T_query + 2Q · √(R(n) · T_adj)). When no explicit rebuild is needed (R(n) = 0), a simple and effective choice is B = ⌈√Q⌉, giving T(Q) = O(Q · T_query + Q · √Q · T_adj). Space usage is dominated by the base structure (typically O(n)) plus the temporary in-block buffer of size O(B).
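The optimal block size quoted above follows from balancing the two B-dependent terms; a one-line calculus derivation:

```latex
T(Q,B) \approx \frac{Q}{B}\,R(n) + Q\,B\,T_{\mathrm{adj}}
\;\Rightarrow\;
\frac{\partial T}{\partial B} = -\frac{Q}{B^{2}}\,R(n) + Q\,T_{\mathrm{adj}} = 0
\;\Rightarrow\;
B^{*} = \sqrt{\frac{R(n)}{T_{\mathrm{adj}}}},
\qquad
T(Q,B^{*}) = 2Q\sqrt{R(n)\,T_{\mathrm{adj}}} \;+\; Q\,T_{\mathrm{query}}.
```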

Code Examples

Point update (+delta) and range sum query using sqrt decomposition on queries (Fenwick base)
#include <bits/stdc++.h>
using namespace std;

// Fenwick Tree (Binary Indexed Tree) for prefix sums (1-indexed)
struct Fenwick {
    int n; vector<long long> bit;
    Fenwick(int n=0): n(n), bit(n+1, 0) {}
    void init(int n_) { n = n_; bit.assign(n+1, 0); }
    // add value v at position idx
    void add(int idx, long long v) {
        for (; idx <= n; idx += idx & -idx) bit[idx] += v;
    }
    // prefix sum [1..idx]
    long long sumPrefix(int idx) const {
        long long res = 0;
        for (; idx > 0; idx -= idx & -idx) res += bit[idx];
        return res;
    }
    // range sum [l..r]
    long long sumRange(int l, int r) const {
        if (l > r) return 0;
        return sumPrefix(r) - sumPrefix(l-1);
    }
};

struct Update { int idx; long long delta; };

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input format:
    //   n q
    //   a1 a2 ... an   (initial array)
    //   Then q lines, each is either:
    //     1 i delta  --> point update: a[i] += delta
    //     2 l r      --> query: sum of a[l..r]

    int n, q; if (!(cin >> n >> q)) return 0;
    vector<long long> a(n+1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    Fenwick fw(n);
    for (int i = 1; i <= n; ++i) fw.add(i, a[i]); // build base with initial array

    int B = max(1, (int)floor(sqrt((long double)q))); // block size ~ sqrt(q)

    vector<Update> inBlock; inBlock.reserve(B+5);

    auto commit_block = [&]() {
        // Apply in-block updates to Fenwick and to base array
        for (auto &u : inBlock) {
            fw.add(u.idx, u.delta);
            a[u.idx] += u.delta; // keep base array consistent (useful if needed elsewhere)
        }
        inBlock.clear();
    };

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int i; long long delta; cin >> i >> delta;
            // Record update in current block only; don't touch Fenwick yet
            inBlock.push_back({i, delta});
        } else if (type == 2) {
            int l, r; cin >> l >> r;
            // Base answer from Fenwick (external updates only)
            long long ans = fw.sumRange(l, r);
            // Adjust by scanning in-block point updates
            for (const auto &u : inBlock) {
                if (l <= u.idx && u.idx <= r) ans += u.delta;
            }
            cout << ans << '\n';
        } else {
            cerr << "Unknown type" << endl; return 0;
        }

        // End of block: commit accumulated updates
        if ((t+1) % B == 0) commit_block();
    }
    // Commit remaining updates after the last partial block
    commit_block();

    return 0;
}

We maintain a Fenwick tree as the base structure that already includes all updates from past blocks. Inside the current block, point updates are buffered in inBlock. A range-sum query is answered by combining the Fenwick base sum with the contributions of at most B in-block updates that fall within [l, r]. At the end of each block, we apply the buffered updates to the Fenwick tree in one batch and clear the buffer.

Time: Each query: O(log n + B). Each update commit: O(log n). Total: O(Q log n + Q B). With B = Θ(√Q), O(Q log n + Q√Q).
Space: O(n) for the Fenwick tree and O(B) for the in-block buffer.
Range add and range sum with sqrt decomposition on queries (two Fenwicks base)
#include <bits/stdc++.h>
using namespace std;

// Two-Fenwick technique for range add and range sum queries (1-indexed)
struct Fenwick {
    int n; vector<long long> bit;
    Fenwick(int n=0): n(n), bit(n+1, 0) {}
    void init(int n_) { n = n_; bit.assign(n+1, 0); }
    void add(int idx, long long v) { for (; idx <= n; idx += idx & -idx) bit[idx] += v; }
    long long sumPrefix(int idx) const { long long r = 0; for (; idx > 0; idx -= idx & -idx) r += bit[idx]; return r; }
};

struct RangeFenwick {
    int n; Fenwick B1, B2;
    RangeFenwick(int n=0): n(n), B1(n), B2(n) {}
    void init(int n_) { n = n_; B1.init(n); B2.init(n); }
    // Add v to [l, r]
    void range_add(int l, int r, long long v) {
        if (l > r) return;
        B1.add(l, v); B1.add(r+1, -v);
        B2.add(l, v*(l-1)); B2.add(r+1, -v*r);
    }
    long long prefix_sum(int x) const { return B1.sumPrefix(x)*x - B2.sumPrefix(x); }
    long long range_sum(int l, int r) const { if (l > r) return 0; return prefix_sum(r) - prefix_sum(l-1); }
};

struct Update { int l, r; long long add; };

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input format:
    //   n q
    //   a1 a2 ... an   (initial array)
    //   q lines:
    //     1 l r v  --> add v to [l, r]
    //     2 l r    --> query sum on [l, r]

    int n, q; if (!(cin >> n >> q)) return 0;
    vector<long long> a(n+1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    RangeFenwick rf(n);
    // Build: add initial a[i] as point additions
    for (int i = 1; i <= n; ++i) rf.range_add(i, i, a[i]);

    int B = max(1, (int)floor(sqrt((long double)q)));
    vector<Update> inBlock; inBlock.reserve(B+5);

    auto commit_block = [&]() {
        for (auto &u : inBlock) rf.range_add(u.l, u.r, u.add);
        inBlock.clear();
    };

    auto overlap_len = [](int l1, int r1, int l2, int r2) -> long long {
        int L = max(l1, l2), R = min(r1, r2);
        return (L > R) ? 0LL : (long long)(R - L + 1);
    };

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int l, r; long long v; cin >> l >> r >> v;
            // Buffer in-block range add
            inBlock.push_back({l, r, v});
        } else if (type == 2) {
            int l, r; cin >> l >> r;
            long long ans = rf.range_sum(l, r); // base answer
            // Add contributions from in-block range adds via overlap length
            for (const auto &u : inBlock) {
                long long len = overlap_len(l, r, u.l, u.r);
                if (len) ans += len * u.add;
            }
            cout << ans << '\n';
        } else {
            cerr << "Unknown type" << '\n'; return 0;
        }
        if ((t+1) % B == 0) commit_block();
    }
    commit_block();
    return 0;
}

We support range add and range sum using two Fenwick trees as the base structure, which reflects all updates from completed blocks. Within a block, range updates are buffered; a query’s correction is the update value multiplied by the overlap length between the update range and the query range. This uses linearity to make in-block adjustments O(1) each.

Time: Each base query O(log n), each commit update O(log n). Each query scans up to B in-block updates for O(B) adjustment. Total O(Q log n + Q B). With B = Θ(√Q), O(Q log n + Q√Q).
Space: O(n) for the two Fenwicks and O(B) for in-block updates.
Count occurrences of a value in [L, R] with point reassignment using sqrt decomposition on queries
#include <bits/stdc++.h>
using namespace std;

// We maintain for each value a sorted vector of positions where it occurs (external state only).
// Queries: count of value x in [l, r]. Updates: a[i] := newVal.
// In-block, we buffer updates and adjust per query by scanning at most B updates.

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Input format:
    //   n q
    //   a1 a2 ... an   (initial values)
    //   q lines:
    //     1 i v    --> assign a[i] = v
    //     2 l r x  --> count of value x in [l, r]

    int n, q; if (!(cin >> n >> q)) return 0;
    vector<int> a(n+1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    // Coordinate compression is optional if values are large; here we key raw values in a hash map.
    unordered_map<int, vector<int>> pos; pos.reserve(n*2);
    pos.max_load_factor(0.7);
    // Build position lists for external state
    {
        // Collect first, then sort once per value
        unordered_map<int, vector<int>> buckets; buckets.reserve(n*2); buckets.max_load_factor(0.7);
        for (int i = 1; i <= n; ++i) buckets[a[i]].push_back(i);
        for (auto &kv : buckets) {
            auto &vec = kv.second; sort(vec.begin(), vec.end());
            pos.emplace(kv.first, move(vec));
        }
    }

    struct Upd { int idx; int newv; };
    int B = max(1, (int)floor(sqrt((long double)q)));
    vector<Upd> inBlock; inBlock.reserve(B+5);

    auto commit_block = [&]() {
        // Apply buffered updates to external state: update a[] and move indices in pos lists.
        // Read the old value from a[u.idx] rather than caching it at buffer time:
        // if the same index is assigned twice in one block, a cached block-start
        // value would leave a stale entry behind in the wrong position list.
        for (auto &u : inBlock) {
            int oldv = a[u.idx];
            if (oldv != u.newv) {
                // Remove idx from pos[oldv]
                auto &vecOld = pos[oldv];
                auto itOld = lower_bound(vecOld.begin(), vecOld.end(), u.idx);
                if (itOld != vecOld.end() && *itOld == u.idx) vecOld.erase(itOld);
                // Insert idx into pos[newv]
                auto &vecNew = pos[u.newv];
                auto itNew = lower_bound(vecNew.begin(), vecNew.end(), u.idx);
                vecNew.insert(itNew, u.idx);
            }
            a[u.idx] = u.newv; // base array reflects committed update
        }
        inBlock.clear();
    };

    auto base_count_in_range = [&](int x, int l, int r) -> int {
        auto it = pos.find(x);
        if (it == pos.end()) return 0;
        const auto &v = it->second;
        auto L = lower_bound(v.begin(), v.end(), l);
        auto R = upper_bound(v.begin(), v.end(), r);
        return (int)(R - L);
    };

    for (int t = 0; t < q; ++t) {
        int type; cin >> type;
        if (type == 1) {
            int i, v; cin >> i >> v;
            // Buffer update; the base array a[] keeps its block-start state until commit
            inBlock.push_back({i, v});
        } else if (type == 2) {
            int l, r, x; cin >> l >> r >> x;
            int ans = base_count_in_range(x, l, r);

            // Adjust by scanning in-block updates that occurred so far.
            // A small map keeps only the last update per index within the block up to now.
            unordered_map<int,int> lastNew; lastNew.reserve(inBlock.size()*2); lastNew.max_load_factor(0.7);
            for (const auto &u : inBlock) lastNew[u.idx] = u.newv; // latest new value per updated index

            // Adjust the count relative to the base array a[] (external state only)
            for (const auto &kv : lastNew) {
                int idx = kv.first; int curv = kv.second; // current value after in-block updates
                if (idx < l || idx > r) continue;          // outside query range
                int basev = a[idx];                        // value at block start (after external updates)
                // Base counted x but the current value is no longer x: overcounted by 1.
                // Base did not count x but the current value is x: undercounted by 1.
                if (basev == x && curv != x) ans -= 1;
                else if (basev != x && curv == x) ans += 1;
            }
            cout << ans << '\n';
        } else {
            cerr << "Unknown type" << '\n'; return 0;
        }

        if ((t+1) % B == 0) commit_block();
    }
    commit_block();
    return 0;
}

The base state stores, for each value x, a sorted vector of positions where x occurs; thus base queries are answered with two binary searches. Within a block, we buffer point assignments and, for each query, adjust the base count by scanning the distinct updated indices that lie in [L, R]. We compare their base value (from the external a[]) to their current in-block value (the latest buffered one) to add or subtract one accordingly. At block end, we commit updates by removing and inserting positions in the appropriate value lists.

Time: Base query O(log n) via two binary searches. Per query, the adjustment scans up to B distinct indices, so O(B). Commit cost per update is O(log n) to locate the position plus O(k) to shift elements in the affected vector, where k is that position list's length (worst case O(n), but typically small). Total O(Q log n + Q B) plus vector-shift costs. With B = Θ(√Q), O(Q log n + Q√Q).
Space: O(n) for storing position lists plus O(B) for in-block updates and temporary maps.