Sqrt Decomposition
Key Points
- Sqrt decomposition partitions an array into about \(\sqrt{n}\) blocks, each of size about \(\sqrt{n}\), to speed up range queries and updates.
- Each block stores a precomputed summary (like sum or minimum) so that queries touch only O(\(\sqrt{n}\)) full blocks plus O(\(\sqrt{n}\)) boundary elements.
- Point updates only rebuild the affected block in O(\(\sqrt{n}\)) time, keeping the structure simple and cache-friendly.
- Choosing block size \(B \approx \sqrt{n}\) balances the cost between full blocks and boundary elements, minimizing total time.
- Sqrt decomposition is often simpler to implement than segment trees and is flexible for custom per-block logic.
- With sorted data per block, you can answer order-statistic-like queries (e.g., counts of values \(\le x\)) in O(\(\sqrt{n}\log n\)).
- Lazy tags per block let you support range add + range sum with O(\(\sqrt{n}\)) per operation.
- It is a conceptual foundation for Mo’s algorithm, which reorders offline queries by blocks to achieve O(\((n+q)\sqrt{n}\))-ish performance.
Prerequisites
- Arrays and indexing — Sqrt decomposition partitions arrays into contiguous blocks and requires precise index arithmetic.
- Big-O notation — Understanding O(\(\sqrt{n}\)), O(\(\log n\)), and trade-offs like n/B + B is essential for design choices.
- Prefix sums (for static queries) — Provides a baseline; knowing when sqrt is preferable to static precomputation helps decision-making.
- Binary search — Per-block sorted vectors use upper_bound/lower_bound for efficient counting queries.
- Basic calculus (optional) — Deriving the optimal block size uses minimizing n/B + B via derivatives.
- Segment tree / Fenwick tree (contrast) — Helps you choose sqrt decomposition vs tree-based structures for given constraints and operations.
Detailed Explanation
Overview
Sqrt decomposition (also called block or bucket decomposition) is a technique for accelerating array operations by splitting an array of length n into roughly \(\sqrt{n}\) contiguous blocks. For each block, we precompute a summary that answers part of a query quickly—such as the sum, minimum, maximum, or a sorted copy for frequency counting. When answering a range query [l, r], we process at most two partial blocks at the ends element-by-element and handle all fully covered blocks using their precomputed summaries. Point updates only affect one block, so we rebuild or adjust that block’s summary in sublinear time. This approach yields time per operation around O(\(\sqrt{n}\)) instead of O(n), while keeping implementation straightforward and memory light. Unlike segment trees or Fenwick trees, sqrt decomposition trades some asymptotic optimality for ease and flexibility; it is especially attractive when operations are complex or custom per-block logic is easier than maintaining a full tree. The idea extends naturally: with lazy block tags you can implement range add + range sum; with block-sorted vectors you can count values \(\le x\) in a range; with per-block minima you can do range minimum queries. The method is also conceptually linked to Mo’s algorithm, which organizes offline queries by blocks to keep modifications local.
Intuition & Analogies
Imagine a long bookshelf with thousands of books, and you need to answer questions like “How many pages are in books 200 through 800?” or “What’s the thinnest book between positions 1,000 and 1,500?” If you computed from scratch each time, you’d scan hundreds of books repeatedly. Instead, you divide the shelf into moderate-sized sections—say 100 books each. For each section, you keep a sticky note with a summary: the total pages, the shortest book’s thickness, maybe even a sorted list of thicknesses. Now, when someone asks about a range, you only skim the few books on the edges that don’t make a complete section and use the sticky note summaries for any whole sections inside. That means you touch far fewer books per question. If a book changes (an update), you only need to update the sticky note for that one section, not the entire shelf. Choosing the section size is a balancing act: larger sections mean fewer summaries but more edge-skimming; smaller sections mean more summaries but smaller edges. Picking section size about \(\sqrt{n}\) hits a sweet spot where both the number of sections and section size are similar, keeping the total work low. This mindset—summarize chunks, treat boundaries directly, and keep updates local—is the heart of sqrt decomposition.
Formal Definition
Let \(a[0..n-1]\) be an array and fix a block size \(B \ge 1\). Partition the indices into \(m = \lceil n/B \rceil\) blocks, where block \(j\) (for \(0 \le j < m\)) covers indices \([jB, \min(n, (j+1)B) - 1]\). For a summary function f that can be combined across pieces (e.g., sum or min), precompute \(S_j = f(\text{block } j)\). A query on [l, r] is answered by combining f over the boundary elements of the (at most two) partial blocks with \(S_j\) for every block fully contained in [l, r]. A point update at position i recomputes or incrementally adjusts \(S_{\lfloor i/B \rfloor}\). With \(B = \Theta(\sqrt{n})\), queries and updates both run in O(\(\sqrt{n}\)) for summaries that can be rebuilt in O(B).
When to Use
Use sqrt decomposition when you need to support both queries and updates on arrays, want something simpler than a segment tree, and O(\(\sqrt{n}\)) per operation fits the constraints. It shines when: (1) operations are complex but aggregatable per block (e.g., a custom function that’s hard to maintain along a tree), (2) values fit well with per-block summaries like sums, minima, or sorted vectors for frequency counting, (3) constraints are around n, q up to 2e5–5e5 where \(\sqrt{n}\) is a few hundred, and constant factors matter but are manageable, and (4) you are prototyping or in a contest where implementation time is limited. Prefer a Fenwick or segment tree if you need O(\(\log n\)) worst-case and operations are standard (sum, min/max). Consider Mo’s algorithm if queries are offline and there are no (or few) updates—Mo often achieves better constants when reordering queries is allowed. Use sqrt with lazy tags for range add + range sum, but for heavier combinations (range min with range add), a full segment tree with lazy propagation may be more appropriate.
⚠️ Common Mistakes
Common pitfalls include: (1) Picking a poor block size. If B is too small, overhead from many blocks dominates; if too large, boundary scanning dominates. Use \(B \approx \lfloor \sqrt{n} \rfloor\) or tune with benchmarks. (2) Off-by-one errors at block boundaries—be precise with left and right end indices: left boundary end is \(\min(r, (\lfloor l/B \rfloor + 1)B - 1)\), and full blocks are those with \(jB \ge l\) and \((j+1)B - 1 \le r\). (3) Forgetting to update the block summary after a point update; e.g., update the element but not the block sum or block minimum. (4) Overflow with sums when using 32-bit ints; prefer 64-bit (long long) for sums. (5) Mixing lazy tags and raw values incorrectly—when querying partial elements, remember to include the block’s lazy tag in their contributions. (6) For per-block sorted vectors, failing to erase/insert correctly on updates (use lower_bound/upper_bound and maintain sorted order). (7) Assuming theoretical O(\(\sqrt{n}\)) automatically wins—constants can be high; test against prefix sums (for static queries) or BIT/segment trees (for simpler operations). (8) Not considering cache behavior—contiguous arrays and tight loops over blocks are typically faster than pointer-heavy structures.
Key Formulas
Number of Blocks
\(m = \left\lceil \dfrac{n}{B} \right\rceil\)
Explanation: If you split n elements into blocks of size B, you need m blocks. The ceiling accounts for the possibly smaller last block.
Query Cost Trade-off
\(T(B) = O\!\left(\dfrac{n}{B} + B\right)\)
Explanation: A range query touches O(n/B) full blocks and O(B) boundary elements. The total time balances these two terms.
Optimal Block Size
\(B = \sqrt{n} \implies T = O(\sqrt{n})\)
Explanation: Minimizing the trade-off gives B approximately the square root of n. This choice balances the number of blocks and the boundary size.
Calculus Derivation
\(\dfrac{d}{dB}\left(\dfrac{n}{B} + B\right) = -\dfrac{n}{B^2} + 1 = 0 \implies B = \sqrt{n}\)
Explanation: Taking the derivative and setting it to zero yields the minimizing B. This is a standard way to optimize continuous relaxations of discrete parameters.
Block Sum
\(S_j = \displaystyle\sum_{i=jB}^{\min(n,\,(j+1)B)-1} a_i\)
Explanation: The summary for block j can be its total sum. Full-block contributions to a range sum query are then added in O(1) each.
Count Values \(\le x\) via Blocks
\(\text{count}(l, r, x) = \displaystyle\sum_{\text{full blocks } j} \big|\{v \in \text{sorted}_j : v \le x\}\big| + \sum_{\text{boundary } i} [a_i \le x]\)
Explanation: For each fully covered block, use binary search in its sorted list to count elements \(\le x\). Handle the boundary elements individually.
Update Cost
\(O(B)\) for simple summaries; \(O(\log B + B)\) for sorted blocks
Explanation: If the summary is simple (sum, min), updates are O(B). If the summary is sorted, maintaining order adds a \(\log B\) factor for binary search, plus O(B) element shifts on insert/erase.
Total Work with q Queries and u Updates
\(O\!\left(q\left(\dfrac{n}{B} + B\right) + u \cdot U(B)\right)\)
Explanation: Overall cost depends on the number of queries q and updates u, where U(B) is the per-update cost function. You can tune B to your workload and the per-update cost function.
Complexity Analysis
- Build: O(n) time and O(n) extra space for block summaries; per-block sorted vectors add O(n log n) build time overall.
- Range query: O(\(\sqrt{n}\)) for sum/min; O(\(\sqrt{n}\log n\)) for counting \(\le x\) via per-block binary search.
- Point update: O(1) for block sums (adjust by the delta), O(B) = O(\(\sqrt{n}\)) for block minima or sorted vectors.
- Range update with lazy block tags: O(\(\sqrt{n}\)) per operation.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

struct SqrtDecompSum {
    int n;                  // array size
    int B;                  // block size
    int m;                  // number of blocks
    vector<long long> a;    // original array
    vector<long long> bsum; // sum per block

    SqrtDecompSum(int n = 0) : n(n), B(0), m(0) {}

    void build(const vector<long long>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n)); // choose B ≈ sqrt(n)
        m = (n + B - 1) / B;
        bsum.assign(m, 0);
        for (int i = 0; i < n; ++i) bsum[i / B] += a[i];
    }

    // Set a[pos] = val
    void point_update(int pos, long long val) {
        int bid = pos / B;
        bsum[bid] += (val - a[pos]);
        a[pos] = val;
    }

    // Query sum over [l, r] inclusive
    long long range_sum(int l, int r) const {
        if (l > r) return 0;
        long long ans = 0;
        int bl = l / B, br = r / B;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans += a[i];
            return ans;
        }
        int lend = (bl + 1) * B - 1; // end of left partial block
        for (int i = l; i <= lend; ++i) ans += a[i];
        for (int b = bl + 1; b <= br - 1; ++b) ans += bsum[b];
        for (int i = br * B; i <= r; ++i) ans += a[i];
        return ans;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; // number of elements and queries
    cin >> n >> q;
    vector<long long> arr(n);
    for (int i = 0; i < n; ++i) cin >> arr[i];

    SqrtDecompSum ds; ds.build(arr);

    // Supported ops:
    //   1 l r -> print sum on [l, r] (0-based inclusive)
    //   2 i x -> set a[i] = x
    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            int l, r; cin >> l >> r;
            cout << ds.range_sum(l, r) << '\n';
        } else if (type == 2) {
            int i; long long x; cin >> i >> x;
            ds.point_update(i, x);
        }
    }
    return 0;
}
```
We split the array into blocks of size B ≈ √n and store the sum per block. A range sum query adds boundary elements explicitly and uses precomputed sums for full blocks. A point update adjusts the affected block’s sum. This is the canonical pattern for sqrt decomposition.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Supports two operations:
//   - add x to all a[i] in [l, r]
//   - query sum of a[i] over [l, r]
// Implementation: sqrt decomposition with per-block lazy add tag.
struct SqrtRangeAddSum {
    int n, B, m;
    vector<long long> a;     // base values stored explicitly
    vector<long long> bsum;  // sum of block with current lazy implicitly applied
    vector<long long> blazy; // additive lazy tag per block

    void build(const vector<long long>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n));
        m = (n + B - 1) / B;
        bsum.assign(m, 0);
        blazy.assign(m, 0);
        for (int i = 0; i < n; ++i) bsum[i / B] += a[i];
    }

    // Apply add x to [l, r]
    void range_add(int l, int r, long long x) {
        int bl = l / B, br = r / B;
        if (bl == br) {
            // Same block: update elements and block sum directly
            for (int i = l; i <= r; ++i) {
                a[i] += x;
                bsum[bl] += x;
            }
            return;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) {
            a[i] += x;
            bsum[bl] += x;
        }
        for (int b = bl + 1; b <= br - 1; ++b) {
            blazy[b] += x;               // mark full block
            bsum[b] += x * (long long)B; // sum increases by x per element
        }
        for (int i = br * B; i <= r; ++i) {
            a[i] += x;
            bsum[br] += x;
        }
    }

    // Query sum on [l, r]
    long long range_sum(int l, int r) const {
        long long ans = 0;
        int bl = l / B, br = r / B;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans += a[i] + blazy[bl];
            return ans;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) ans += a[i] + blazy[bl];
        for (int b = bl + 1; b <= br - 1; ++b) ans += bsum[b]; // already includes lazy effect
        for (int i = br * B; i <= r; ++i) ans += a[i] + blazy[br];
        return ans;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; cin >> n >> q;
    vector<long long> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    SqrtRangeAddSum ds; ds.build(a);

    // Ops:
    //   1 l r x -> add x to [l, r]
    //   2 l r   -> sum on [l, r]
    while (q--) {
        int t; cin >> t;
        if (t == 1) {
            int l, r; long long x; cin >> l >> r >> x;
            ds.range_add(l, r, x);
        } else if (t == 2) {
            int l, r; cin >> l >> r;
            cout << ds.range_sum(l, r) << '\n';
        }
    }
    return 0;
}
```
We maintain a lazy add tag per block. Full blocks in a range add are updated by increasing the block’s lazy tag and its precomputed sum. Queries over partial blocks add the lazy tag to explicit elements, while full blocks use the stored sum. This demonstrates block-level lazy propagation.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Supports:
//   - count of elements <= x on [l, r]
//   - point update a[pos] = val
// Maintains a sorted vector per block for binary searching full blocks.
struct SqrtCountLEQ {
    int n, B, m;
    vector<int> a;
    vector<vector<int>> bsorted; // sorted copy per block

    void build(const vector<int>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n));
        m = (n + B - 1) / B;
        bsorted.assign(m, {});
        for (int i = 0; i < n; ++i) bsorted[i / B].push_back(a[i]);
        for (int b = 0; b < m; ++b) sort(bsorted[b].begin(), bsorted[b].end());
    }

    // Count of values <= x in [l, r]
    int count_leq(int l, int r, int x) const {
        int bl = l / B, br = r / B;
        int ans = 0;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans += (a[i] <= x);
            return ans;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) ans += (a[i] <= x);
        for (int b = bl + 1; b <= br - 1; ++b) {
            // count <= x via upper_bound in sorted block
            ans += upper_bound(bsorted[b].begin(), bsorted[b].end(), x) - bsorted[b].begin();
        }
        for (int i = br * B; i <= r; ++i) ans += (a[i] <= x);
        return ans;
    }

    // Set a[pos] = val
    void point_update(int pos, int val) {
        int b = pos / B;
        // remove old a[pos]
        auto& vec = bsorted[b];
        auto it = lower_bound(vec.begin(), vec.end(), a[pos]);
        // it should exist because the value is present
        if (it != vec.end() && *it == a[pos]) vec.erase(it);
        // insert new value keeping vec sorted
        auto it2 = upper_bound(vec.begin(), vec.end(), val);
        vec.insert(it2, val);
        a[pos] = val;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; cin >> n >> q;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];
    SqrtCountLEQ ds; ds.build(a);

    // Ops:
    //   1 l r x -> count of elements <= x on [l, r]
    //   2 i v   -> set a[i] = v
    while (q--) {
        int t; cin >> t;
        if (t == 1) {
            int l, r, x; cin >> l >> r >> x;
            cout << ds.count_leq(l, r, x) << '\n';
        } else {
            int i, v; cin >> i >> v;
            ds.point_update(i, v);
        }
    }
    return 0;
}
```
Each block stores a sorted vector of its elements. For fully covered blocks, we count \(\le x\) using upper_bound; for partial blocks, we check elements directly. Point updates remove the old value and insert the new value into the block’s sorted vector, keeping it sorted.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct SqrtRangeMin {
    int n, B, m;
    vector<long long> a;
    vector<long long> bmin; // minimum per block

    void build(const vector<long long>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n));
        m = (n + B - 1) / B;
        bmin.assign(m, LLONG_MAX);
        for (int i = 0; i < n; ++i) bmin[i / B] = min(bmin[i / B], a[i]);
    }

    long long range_min(int l, int r) const {
        long long ans = LLONG_MAX;
        int bl = l / B, br = r / B;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans = min(ans, a[i]);
            return ans;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) ans = min(ans, a[i]);
        for (int b = bl + 1; b <= br - 1; ++b) ans = min(ans, bmin[b]);
        for (int i = br * B; i <= r; ++i) ans = min(ans, a[i]);
        return ans;
    }

    void point_update(int pos, long long val) {
        int b = pos / B;
        a[pos] = val;
        // Recompute block minimum (O(B))
        long long mn = LLONG_MAX;
        int l = b * B;
        int r = min(n - 1, (b + 1) * B - 1);
        for (int i = l; i <= r; ++i) mn = min(mn, a[i]);
        bmin[b] = mn;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; cin >> n >> q;
    vector<long long> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    SqrtRangeMin ds; ds.build(a);

    // Ops:
    //   1 l r -> min on [l, r]
    //   2 i x -> set a[i] = x
    while (q--) {
        int t; cin >> t;
        if (t == 1) {
            int l, r; cin >> l >> r;
            cout << ds.range_min(l, r) << '\n';
        } else {
            int i; long long x; cin >> i >> x;
            ds.point_update(i, x);
        }
    }
    return 0;
}
```
We maintain the minimum per block. Queries scan two partial blocks and use block minima for full blocks. Point updates rebuild only the affected block’s minimum in O(\(\sqrt{n}\)).