🗂️ Data Structure · Intermediate

Sqrt Decomposition

Key Points

  • Sqrt decomposition partitions an array into about \(\sqrt{n}\) blocks, each of size about \(\sqrt{n}\), to speed up range queries and updates.
  • Each block stores a precomputed summary (like sum or minimum) so that queries touch only O(\(\sqrt{n}\)) full blocks plus O(\(\sqrt{n}\)) boundary elements.
  • Point updates only rebuild the affected block in O(\(\sqrt{n}\)) time, keeping the structure simple and cache-friendly.
  • Choosing block size \(B \approx \sqrt{n}\) balances the cost between full blocks and boundary elements, minimizing total time.
  • Sqrt decomposition is often simpler to implement than segment trees and is flexible for custom per-block logic.
  • With sorted data per block, you can answer order-statistic-like queries (e.g., counts \(\le x\)) in O(\(\sqrt{n}\log n\)).
  • Lazy tags per block let you support range add + range sum with O(\(\sqrt{n}\)) per operation.
  • It is a conceptual foundation for Mo’s algorithm, which reorders offline queries by blocks to achieve O(\((n+q)\sqrt{n}\)) performance.

Prerequisites

  • Arrays and indexing: Sqrt decomposition partitions arrays into contiguous blocks and requires precise index arithmetic.
  • Big-O notation: Understanding O(\(\sqrt{n}\)), O(\(\log n\)), and trade-offs like n/B + B is essential for design choices.
  • Prefix sums (for static queries): Provides a baseline; knowing when sqrt is preferable to static precomputation helps decision-making.
  • Binary search: Per-block sorted vectors use upper_bound/lower_bound for efficient counting queries.
  • Basic calculus (optional): Deriving the optimal block size involves minimizing n/B + B via derivatives.
  • Segment tree / Fenwick tree (contrast): Helps you choose sqrt decomposition vs tree-based structures for given constraints and operations.

Detailed Explanation


01 Overview

Sqrt decomposition (also called block or bucket decomposition) is a technique for accelerating array operations by splitting an array of length n into roughly \(\sqrt{n}\) contiguous blocks. For each block, we precompute a summary that answers part of a query quickly, such as the sum, minimum, maximum, or a sorted copy for frequency counting. When answering a range query [l, r], we process at most two partial blocks at the ends element-by-element and handle all fully covered blocks using their precomputed summaries. Point updates only affect one block, so we rebuild or adjust that block’s summary in sublinear time. This approach yields time per operation around O(\(\sqrt{n}\)) instead of O(n), while keeping implementation straightforward and memory light. Unlike segment trees or Fenwick trees, sqrt decomposition trades some asymptotic optimality for ease and flexibility; it is especially attractive when operations are complex or custom per-block logic is easier than maintaining a full tree. The idea extends naturally: with lazy block tags you can implement range add + range sum; with block-sorted vectors you can count values \(\le x\) in a range; with per-block minima you can do range minimum queries. The method is also conceptually linked to Mo’s algorithm, which organizes offline queries by blocks to keep modifications local.

02 Intuition & Analogies

Imagine a long bookshelf with thousands of books, and you need to answer questions like “How many pages are in books 200 through 800?” or “What’s the thinnest book between positions 1,000 and 1,500?” If you computed from scratch each time, you’d scan hundreds of books repeatedly. Instead, you divide the shelf into moderate-sized sections, say 100 books each. For each section, you keep a sticky note with a summary: the total pages, the shortest book’s thickness, maybe even a sorted list of thicknesses. Now, when someone asks about a range, you only skim the few books on the edges that don’t make a complete section and use the sticky note summaries for any whole sections inside. That means you touch far fewer books per question. If a book changes (an update), you only need to update the sticky note for that one section, not the entire shelf. Choosing the section size is a balancing act: larger sections mean fewer summaries but more edge-skimming; smaller sections mean more summaries but smaller edges. Picking section size about \(\sqrt{n}\) hits a sweet spot where both the number of sections and section size are similar, keeping the total work low. This mindset (summarize chunks, treat boundaries directly, and keep updates local) is the heart of sqrt decomposition.

03 Formal Definition

Given an array \(a[0..n-1]\) and an integer block size \(B\), partition indices into blocks \(B_j = [jB,\ \min((j+1)B,\, n) - 1]\) for \(j = 0, 1, \ldots, m-1\), where \(m = \lceil n/B \rceil\). For each block, maintain a summary structure \(S_j\) that supports: (1) merge of two summaries (if needed), (2) apply of updates (point or range-limited), and (3) query over a full block in sublinear time relative to block length. A range query over [l, r] is answered by: processing partial blocks at the ends element-wise, and for each fully covered block \(B_j \subseteq [l, r]\), using \(S_j\) in O(1), O(\(\log B\)), or similar time depending on the summary design. A point update at index i modifies \(a[i]\) and updates only the summary \(S_{\lfloor i/B \rfloor}\). With appropriate \(S_j\), time per operation is typically \(O(n/B + B)\). Minimizing this with respect to \(B\) gives \(B = \Theta(\sqrt{n})\), yielding \(O(\sqrt{n})\) time per operation. Variants add a lazy tag per block for range updates (e.g., additive tags) or maintain per-block sorted arrays for order-statistic-like queries. Correctness follows from partitioning the range into disjoint covered full blocks and uncovered boundary elements, ensuring no element is double-counted or missed.
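To make the definition concrete, here is a minimal sketch of the three-part query decomposition, assuming the per-block summary \(S_j\) is a block sum stored in bsum (the full, buildable versions appear under Code Examples below):

#include <bits/stdc++.h>
using namespace std;

// Minimal sketch: sum over [l, r] given block size B and per-block sums
// bsum (the summary S_j). Partial blocks are scanned element-wise; fully
// covered blocks contribute their summary in O(1) each.
long long query(const vector<long long>& a, const vector<long long>& bsum,
                int B, int l, int r) {
    long long ans = 0;
    int bl = l / B, br = r / B;
    if (bl == br) {                                      // range inside one block
        for (int i = l; i <= r; ++i) ans += a[i];
        return ans;
    }
    for (int i = l; i < (bl + 1) * B; ++i) ans += a[i];  // left partial block
    for (int b = bl + 1; b < br; ++b) ans += bsum[b];    // full blocks via S_j
    for (int i = br * B; i <= r; ++i) ans += a[i];       // right partial block
    return ans;
}

int main() {
    vector<long long> a = {5, 1, 4, 2, 3, 6, 0, 7};
    int B = 3; // block size, chosen for illustration
    vector<long long> bsum((a.size() + B - 1) / B, 0);
    for (size_t i = 0; i < a.size(); ++i) bsum[i / B] += a[i];
    cout << query(a, bsum, B, 1, 6) << '\n'; // prints 16 (1+4+2+3+6+0)
    return 0;
}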

04 When to Use

Use sqrt decomposition when you need to support both queries and updates on arrays, want something simpler than a segment tree, and O(\(\sqrt{n}\)) per operation fits the constraints. It shines when: (1) operations are complex but aggregatable per block (e.g., a custom function that’s hard to maintain along a tree), (2) values fit well with per-block summaries like sums, minima, or sorted vectors for frequency counting, (3) constraints are around n, q up to 2e5–5e5 where \(\sqrt{n}\) is a few hundred, and constant factors matter but are manageable, and (4) you are prototyping or in a contest where implementation time is limited. Prefer a Fenwick or segment tree if you need O(\(\log n\)) worst-case and operations are standard (sum, min/max). Consider Mo’s algorithm if queries are offline and there are no (or few) updates; Mo often achieves better constants when reordering queries is allowed. Use sqrt with lazy tags for range add + range sum, but for heavier combinations (range min with range add), a full segment tree with lazy propagation may be more appropriate.

⚠️ Common Mistakes

Common pitfalls include: (1) Picking a poor block size. If B is too small, overhead from many blocks dominates; if too large, boundary scanning dominates. Use \(B \approx \lfloor \sqrt{n} \rfloor\) or tune with benchmarks. (2) Off-by-one errors at block boundaries; be precise with left and right end indices: the left boundary ends at \(\min(r,\ (\lfloor l/B \rfloor + 1)B - 1)\), and full blocks are those with \(jB \ge l\) and \((j+1)B - 1 \le r\). (3) Forgetting to update the block summary after a point update, e.g., updating the element but not the block sum or block minimum. (4) Overflow with sums when using 32-bit ints; prefer 64-bit (long long) for sums. (5) Mixing lazy tags and raw values incorrectly; when querying partial elements, remember to include the block’s lazy tag in their contributions. (6) For per-block sorted vectors, failing to erase/insert correctly on updates (use lower_bound/upper_bound and maintain sorted order). (7) Assuming theoretical O(\(\sqrt{n}\)) automatically wins; constants can be high, so test against prefix sums (for static queries) or BIT/segment trees (for simpler operations). (8) Not considering cache behavior: contiguous arrays and tight loops over blocks are typically faster than pointer-heavy structures. A small helper for the boundary arithmetic in (2) is sketched below.
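The helper below is a minimal sketch of that boundary arithmetic; block_bounds and Bounds are illustrative names, not from any library:

#include <bits/stdc++.h>
using namespace std;

// Sketch: partial-block boundaries for a query [l, r] with block size B.
// Blocks bl+1 .. br-1 are fully covered; [l, lend] and [br*B, r] are the
// partial pieces. When bl == br, the whole range lies inside one block
// and must be scanned directly instead.
struct Bounds { int bl, br, lend; };

Bounds block_bounds(int l, int r, int B) {
    int bl = l / B, br = r / B;
    int lend = min(r, (bl + 1) * B - 1); // end of the left partial block
    return {bl, br, lend};
}

int main() {
    Bounds b = block_bounds(5, 17, 4); // blocks of size 4
    cout << b.bl << ' ' << b.br << ' ' << b.lend << '\n'; // 1 4 7
    return 0;
}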

Key Formulas

Number of Blocks

\( m = \left\lceil \frac{n}{B} \right\rceil \)

Explanation: If you split n elements into blocks of size B, you need m blocks. The ceiling accounts for the possibly smaller last block.

Query Cost Trade-off

\( T(B) = O\!\left(\frac{n}{B} + B\right) \)

Explanation: A range query touches O(n/B) full blocks and O(B) boundary elements. The total time balances these two terms.

Optimal Block Size

\( B = \sqrt{n} \;\Rightarrow\; \frac{n}{B} + B = 2\sqrt{n} = O(\sqrt{n}) \)

Explanation: Minimizing the trade-off gives B approximately the square root of n. This choice balances the number of blocks and the boundary size.

Calculus Derivation

\( \frac{d}{dB}\left(\frac{n}{B} + B\right) = -\frac{n}{B^2} + 1 = 0 \;\Rightarrow\; B = \sqrt{n} \)

Explanation: Taking the derivative and setting it to zero yields the minimizing B. This is a standard way to optimize continuous relaxations of discrete parameters.

Block Sum

\( S_j = \sum_{i=jB}^{\min((j+1)B,\, n)-1} a[i] \)

Explanation: The summary for block j can be its total sum. Full-block contributions to a range sum query are then added in O(1) each.

Count Values \(\le x\) via Blocks

\( \operatorname{count}(l, r, x) = \sum_{i \in \text{partial}} [\,a[i] \le x\,] \;+\; \sum_{\text{full } j} \bigl|\{\, v \in \text{sorted}_j : v \le x \,\}\bigr| \)

Explanation: For each fully covered block, use binary search in its sorted list to count elements \(\le x\). Handle the boundary elements individually.

Update Cost

\( O(B) \ \text{(simple summary)}, \qquad O(\log B + B) \ \text{(sorted block)} \)

Explanation: If the summary is simple (sum, min), updates cost at most O(B); a sum can even be adjusted in O(1) via the delta. If the summary is a sorted vector, locating the value adds an \(O(\log B)\) binary search, and the erase/insert itself is O(B).

Total Work with q Queries and u Updates

\( T_{\text{total}} = O\!\left( q \left( \frac{n}{B} + B \right) + u \cdot C_{\text{upd}}(B) \right) \)

Explanation: Overall cost depends on the number of queries q and updates u. You can tune B to your workload and the per-update cost function \(C_{\text{upd}}(B)\).

Complexity Analysis

Let n be the array length and B the block size. We use m = ⌈n/B⌉ blocks. For a typical range query [l, r], we visit at most two partial blocks with up to 2B elements in total, and we include each fully covered block in O(1) or O(log B) time, depending on the per-block summary. Thus, time per query is O(n/B + B). For point updates, only one block’s summary must be rebuilt or adjusted: O(B) for sums/minima, or O(B) plus an O(log B) binary search when maintaining a sorted block. Choosing B ≈ ⌊√n⌋ minimizes the trade-off term n/B + B, giving O(√n) time per operation. In practice, picking B as an integer near √n suffices; tuning by benchmarking can help if q and u are imbalanced. Space usage is O(n) for the base array plus O(m) or O(m·B) for summaries. For simple sums/minima, we store one value per block, so O(m) space beyond the array. For lazy range-add, we add an O(m) tag array. For per-block sorted vectors, we store a copy of each block’s elements, which totals O(n) additional space. Constants are small because memory is contiguous and cache-friendly. Compared to segment trees (O(log n) per op, O(n) space), sqrt decomposition has worse asymptotics but often better simplicity and acceptable constants for n up to around 2e5–5e5. When q is large and u small, variants like Mo’s algorithm (offline, no updates) can reduce the query cost further by reordering. When both queries and updates are heavy and time limits are tight, a segment tree may be preferable.
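As a quick worked instance of these bounds (numbers chosen purely for illustration): with \(n = 250{,}000\) and \(B = \lfloor\sqrt{n}\rfloor = 500\), there are \(m = 500\) blocks, and a query touches at most \(2B = 1000\) boundary elements plus at most \(m = 500\) full-block summaries, about \(1{,}500\) elementary operations versus \(250{,}000\) for a naive scan.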

Code Examples

Range Sum Query with Point Update (Classic Sqrt Decomposition)
#include <bits/stdc++.h>
using namespace std;

struct SqrtDecompSum {
    int n;                  // array size
    int B;                  // block size
    int m;                  // number of blocks
    vector<long long> a;    // original array
    vector<long long> bsum; // sum per block

    SqrtDecompSum(int n = 0) : n(n), B(0), m(0) {}

    void build(const vector<long long>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n)); // choose B ≈ sqrt(n)
        m = (n + B - 1) / B;
        bsum.assign(m, 0);
        for (int i = 0; i < n; ++i) bsum[i / B] += a[i];
    }

    // Set a[pos] = val
    void point_update(int pos, long long val) {
        int bid = pos / B;
        bsum[bid] += (val - a[pos]);
        a[pos] = val;
    }

    // Query sum over [l, r] inclusive
    long long range_sum(int l, int r) const {
        if (l > r) return 0;
        long long ans = 0;
        int bl = l / B, br = r / B;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans += a[i];
            return ans;
        }
        int lend = (bl + 1) * B - 1; // end of left partial block
        for (int i = l; i <= lend; ++i) ans += a[i];
        for (int b = bl + 1; b <= br - 1; ++b) ans += bsum[b];
        for (int i = br * B; i <= r; ++i) ans += a[i];
        return ans;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; // number of elements and queries
    cin >> n >> q;
    vector<long long> arr(n);
    for (int i = 0; i < n; ++i) cin >> arr[i];

    SqrtDecompSum ds; ds.build(arr);

    // Supported ops:
    //   1 l r -> print sum on [l, r] (0-based inclusive)
    //   2 i x -> set a[i] = x
    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            int l, r; cin >> l >> r;
            cout << ds.range_sum(l, r) << '\n';
        } else if (type == 2) {
            int i; long long x; cin >> i >> x;
            ds.point_update(i, x);
        }
    }
    return 0;
}

We split the array into blocks of size B ≈ √n and store the sum per block. A range sum query adds boundary elements explicitly and uses precomputed sums for full blocks. A point update adjusts the affected block’s sum. This is the canonical pattern for sqrt decomposition.

Time: O(\(\sqrt{n}\)) per query; a point update is O(1), since the block sum is adjusted by the delta between the new and old value rather than rebuilt.
Space: O(n) for the array plus O(n/\(\sqrt{n}\)) = O(\(\sqrt{n}\)) for block sums.
Range Add and Range Sum with Block Lazy Tags
#include <bits/stdc++.h>
using namespace std;

// Supports two operations:
// - add x to all a[i] in [l, r]
// - query sum of a[i] over [l, r]
// Implementation: sqrt decomposition with per-block lazy add tag.
struct SqrtRangeAddSum {
    int n, B, m;
    vector<long long> a;     // base values stored explicitly
    vector<long long> bsum;  // sum of block with current lazy implicitly applied
    vector<long long> blazy; // additive lazy tag per block

    void build(const vector<long long>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n));
        m = (n + B - 1) / B;
        bsum.assign(m, 0);
        blazy.assign(m, 0);
        for (int i = 0; i < n; ++i) bsum[i / B] += a[i];
    }

    // Apply add x to [l, r]
    void range_add(int l, int r, long long x) {
        int bl = l / B, br = r / B;
        if (bl == br) {
            // Same block: update elements and block sum directly
            for (int i = l; i <= r; ++i) {
                a[i] += x;
                bsum[bl] += x;
            }
            return;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) {
            a[i] += x;
            bsum[bl] += x;
        }
        for (int b = bl + 1; b <= br - 1; ++b) {
            blazy[b] += x;               // mark full block
            bsum[b] += x * (long long)B; // sum increases by x per element
        }
        for (int i = br * B; i <= r; ++i) {
            a[i] += x;
            bsum[br] += x;
        }
    }

    // Query sum on [l, r]
    long long range_sum(int l, int r) const {
        long long ans = 0;
        int bl = l / B, br = r / B;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans += a[i] + blazy[bl];
            return ans;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) ans += a[i] + blazy[bl];
        for (int b = bl + 1; b <= br - 1; ++b) ans += bsum[b]; // already includes lazy effect
        for (int i = br * B; i <= r; ++i) ans += a[i] + blazy[br];
        return ans;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; cin >> n >> q;
    vector<long long> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    SqrtRangeAddSum ds; ds.build(a);

    // Ops:
    //   1 l r x -> add x to [l, r]
    //   2 l r   -> sum on [l, r]
    while (q--) {
        int t; cin >> t;
        if (t == 1) {
            int l, r; long long x; cin >> l >> r >> x;
            ds.range_add(l, r, x);
        } else if (t == 2) {
            int l, r; cin >> l >> r;
            cout << ds.range_sum(l, r) << '\n';
        }
    }
    return 0;
}

We maintain a lazy add tag per block. Full blocks in a range add are updated by increasing the block’s lazy tag and its precomputed sum. Queries over partial blocks add the lazy tag to explicit elements, while full blocks use the stored sum. This demonstrates block-level lazy propagation.

Time: O(\(\sqrt{n}\)) per range add and per range sum.
Space: O(n) for the base array plus O(\(\sqrt{n}\)) for block sums and lazy tags.
Count of Elements <= x on [l, r] with Point Updates (Per-Block Sorted Vectors)
#include <bits/stdc++.h>
using namespace std;

// Supports:
// - count of elements <= x on [l, r]
// - point update a[pos] = val
// Maintains a sorted vector per block for binary searching full blocks.
struct SqrtCountLEQ {
    int n, B, m;
    vector<int> a;
    vector<vector<int>> bsorted; // sorted copy per block

    void build(const vector<int>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n));
        m = (n + B - 1) / B;
        bsorted.assign(m, {});
        for (int i = 0; i < n; ++i) bsorted[i / B].push_back(a[i]);
        for (int b = 0; b < m; ++b) sort(bsorted[b].begin(), bsorted[b].end());
    }

    // Count of values <= x in [l, r]
    int count_leq(int l, int r, int x) const {
        int bl = l / B, br = r / B;
        int ans = 0;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans += (a[i] <= x);
            return ans;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) ans += (a[i] <= x);
        for (int b = bl + 1; b <= br - 1; ++b) {
            // count <= x via upper_bound in sorted block
            ans += upper_bound(bsorted[b].begin(), bsorted[b].end(), x) - bsorted[b].begin();
        }
        for (int i = br * B; i <= r; ++i) ans += (a[i] <= x);
        return ans;
    }

    // Set a[pos] = val
    void point_update(int pos, int val) {
        int b = pos / B;
        // remove old a[pos]
        auto& vec = bsorted[b];
        auto it = lower_bound(vec.begin(), vec.end(), a[pos]);
        // it should exist because value is present
        if (it != vec.end() && *it == a[pos]) vec.erase(it);
        // insert new value keeping vec sorted
        auto it2 = upper_bound(vec.begin(), vec.end(), val);
        vec.insert(it2, val);
        a[pos] = val;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; cin >> n >> q;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];
    SqrtCountLEQ ds; ds.build(a);

    // Ops:
    //   1 l r x -> count of elements <= x on [l, r]
    //   2 i v   -> set a[i] = v
    while (q--) {
        int t; cin >> t;
        if (t == 1) {
            int l, r, x; cin >> l >> r >> x;
            cout << ds.count_leq(l, r, x) << '\n';
        } else {
            int i, v; cin >> i >> v;
            ds.point_update(i, v);
        }
    }
    return 0;
}

Each block stores a sorted vector of its elements. For fully covered blocks, we count \(\le x\) using upper_bound; for partial blocks, we check elements directly. Point updates remove the old value and insert the new value into the block’s sorted vector, keeping it sorted.

Time: O(\(\sqrt{n}\log n\)) per query (a binary search inside each of ~√n full blocks) and O(\(\sqrt{n}\)) per point update (binary searches plus vector erase/insert within a block of size ~√n).
Space: O(n) additional space for storing sorted copies per block.
Range Minimum Query with Point Update (Block Minima)
#include <bits/stdc++.h>
using namespace std;

struct SqrtRangeMin {
    int n, B, m;
    vector<long long> a;
    vector<long long> bmin; // minimum per block

    void build(const vector<long long>& arr) {
        n = (int)arr.size();
        a = arr;
        B = max(1, (int)sqrt((long double)n));
        m = (n + B - 1) / B;
        bmin.assign(m, LLONG_MAX);
        for (int i = 0; i < n; ++i) bmin[i / B] = min(bmin[i / B], a[i]);
    }

    long long range_min(int l, int r) const {
        long long ans = LLONG_MAX;
        int bl = l / B, br = r / B;
        if (bl == br) {
            for (int i = l; i <= r; ++i) ans = min(ans, a[i]);
            return ans;
        }
        int lend = (bl + 1) * B - 1;
        for (int i = l; i <= lend; ++i) ans = min(ans, a[i]);
        for (int b = bl + 1; b <= br - 1; ++b) ans = min(ans, bmin[b]);
        for (int i = br * B; i <= r; ++i) ans = min(ans, a[i]);
        return ans;
    }

    void point_update(int pos, long long val) {
        int b = pos / B;
        a[pos] = val;
        // Recompute block minimum (O(B))
        long long mn = LLONG_MAX;
        int l = b * B;
        int r = min(n - 1, (b + 1) * B - 1);
        for (int i = l; i <= r; ++i) mn = min(mn, a[i]);
        bmin[b] = mn;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q; cin >> n >> q;
    vector<long long> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    SqrtRangeMin ds; ds.build(a);

    // Ops:
    //   1 l r -> min on [l, r]
    //   2 i x -> set a[i] = x
    while (q--) {
        int t; cin >> t;
        if (t == 1) {
            int l, r; cin >> l >> r;
            cout << ds.range_min(l, r) << '\n';
        } else {
            int i; long long x; cin >> i >> x;
            ds.point_update(i, x);
        }
    }
    return 0;
}

We maintain the minimum per block. Queries scan two partial blocks and use block minima for full blocks. Point updates rebuild only the affected block’s minimum in O(\(\sqrt{n}\)).

Time: O(\(\sqrt{n}\)) per query and O(\(\sqrt{n}\)) per point update.
Space: O(n) for the array and O(\(\sqrt{n}\)) for block minima.