🗂️Data StructureAdvanced

Sqrt Tree

Key Points

•
A sqrt tree is a layered block-decomposition data structure that answers range queries in O(1) time after O(n $lo g$ $lo g$ n) preprocessing.
•
It works for any associative operation (sum, min, gcd, xor, set-union, etc.), not just idempotent ones.
•
The idea: repeatedly group the array into $n$ blocks (then $n$ groups of blocks, and so on) and precompute prefix/suffix values within each block plus answers for full blocks between endpoints.
•
A query picks the smallest layer where the endpoints lie in the same superblock but in different blocks, then combines three pieces: suffix of the left block, fully covered blocks in the middle, and prefix of the right block.
•
Space and preprocessing grow across only $lo g$ $lo g$ n layers, giving total O(n $lo g$ $lo g$ n) memory and time to build.
•
Point updates are possible by rebuilding the affected blocks up the layers in about O( $n$ $lo g$ $lo g$ n), but the structure is best for static arrays.
•
Compared to a sparse table, a sqrt tree supports non-idempotent operations with much less memory than a disjoint sparse table.
•
In competitive programming, sqrt trees are a practical O(1) alternative to segment trees (O( $lo g$ n)) when queries dominate.

Prerequisites

→Associativity and basic algebraic structures (monoids) — Sqrt trees rely on associativity to split ranges and recombine results correctly.
→Prefix/suffix aggregates — Understanding how within-block contributions are computed and reused is essential.
→Square root decomposition — Sqrt trees layer multiple sqrt decompositions; knowing the single-layer idea helps.
→Sparse/disjoint sparse tables (contrast) — Helpful to compare idempotent vs non-idempotent support and space trade-offs.
→Big-O notation and logarithms — Needed to reason about O(n log log n) preprocessing and O(1) query time.
→Segment tree / Fenwick tree — Provides a baseline for dynamic structures and trade-offs with sqrt trees.

Detailed Explanation

Tap terms for definitions

01Overview

A sqrt tree is a fast, layered data structure for static range queries on arrays. Unlike a single-level sqrt decomposition, it stacks several levels of blockings: first into about \sqrt{n} blocks, then groups those blocks into \sqrt{\sqrt{n}} superblocks, and so on, for only about \log \log n levels. At each level, it precomputes two kinds of information: (1) prefix and suffix results inside each block to answer partial-block contributions quickly, and (2) answers for ranges that span only full blocks, which are handled by the next level (recursively) on the array of block-aggregates. A range query chooses the smallest level where both endpoints fall into the same superblock but different blocks. The answer is then the associative combination of three parts: the suffix from the left endpoint to the end of its block, the result over the fully covered middle blocks (from the next-level structure), and the prefix from the start of the right block to the right endpoint. Because the number of levels is only \log \log n, the constant-time combination at one level and a single recursive call make the query effectively O(1). Preprocessing touches each element at each level, resulting in O(n \log \log n) build time and space. This design works for any associative operation, not only idempotent ones, making it more general than the classic sparse table while using less space than a full disjoint sparse table.

02Intuition & Analogies

Imagine organizing a library shelf. First, you split the books into medium boxes (about \sqrt{n} of them). Then you put those boxes into larger crates (about \sqrt{\sqrt{n}} of them). Then crates into pallets, and so on. If someone asks for all books from book L to book R, you can answer quickly: take the tail of the box containing L (suffix), take all fully covered boxes in the middle (whole boxes are easy), and take the head of the box containing R (prefix). The only catch is: what if L and R are so close that they sit in the same box? Then you go down one level where the “box” is smaller — because you built a hierarchy of smaller and smaller boxes, you will eventually hit a level where L and R are in different boxes but still inside the same larger container. Then the same three-piece trick works perfectly. In a sqrt tree, we precompute these heads and tails (prefixes and suffixes) for every box at every level. We also precompute results for runs of whole boxes by building the next level over an array of “box summaries” (what is the result of combining everything in a box). A query uses the smallest level where L and R are in the same superbox but different boxes. With just one recursive hop to the next level for the middle full boxes, we get the final answer by glueing three pieces together. Because the number of levels is tiny (roughly log log n), this feels like constant time in practice.

03Formal Definition

Let A be an array of length n and let

op

be a binary associative operation on the range values (e.g., sum, min, gcd). Construct levels

ℓ

= 0, 1,

\dots

, L-1 such that

m_{0} = n

b_{ℓ}

⌈

m_{ℓ}

⌉

, and

m_{ℓ + 1}

⌈

m_{ℓ}

b_{ℓ}

⌉

until

m_{L} = 1

. Define

S_{0} = 1

and

S_{ℓ + 1} = S_{ℓ}

\cdot

b_{ℓ}

. At level

ℓ

, partition the original array into blocks of size

S_{ℓ}

(the last block may be shorter). Precompute, for every index i:

p re f i x_{ℓ}

[i] = A[block\_start(i,

ℓ

)]

op

\dots

op

A[i] and

s u ff i x_{ℓ}

[i] = A[i]

op

\dots

op

A[block\_end(i,

ℓ

)]. Also form an array

B_{ℓ}

of length

m_{ℓ}

where

B_{ℓ}

[j] is the

op

-aggregate of the j-th block at level

ℓ

. Recursively build the same structure on

B_{ℓ}

. For a query [L, R], choose the smallest

ℓ

such that

⌊ \frac{L}{S _{ℓ + 1}} ⌋

⌊ \frac{R}{S _{ℓ + 1}} ⌋

(same superblock) but

⌊ \frac{L}{S _{ℓ}} ⌋

\neq =

⌊ \frac{R}{S _{ℓ}} ⌋

(different blocks). Let p =

⌊ \frac{L}{S _{ℓ}} ⌋

and q =

⌊ \frac{R}{S _{ℓ}} ⌋

. Then the answer is

s u ff i x_{ℓ}

[L]

op

( Query_{

B_{ℓ}

}(p+1, q-1) )

op

p re f i x_{ℓ}

[R], where Query_{

B_{ℓ}

} is the same procedure on

B_{ℓ}

, skipped if p+1>q-1. Associativity guarantees correctness.

04When to Use

Use a sqrt tree when you need extremely fast (effectively O(1)) range queries over a static array and the operation is associative. Examples include: sum queries for large-scale analytics where memory is at a premium compared to a disjoint sparse table; range minimum or maximum queries when you prefer less memory than a full disjoint sparse table but want broader support than classic sparse table; range gcd/xor/bitwise-and/or queries; combining small custom monoids in competitive programming (e.g., a struct with count and sum combined by addition). It shines when there are many more queries than updates, or when the data is immutable. If point updates are rare but needed, you can rebuild the affected block chains in O(\sqrt{n} \log \log n) each, which is acceptable in some scenarios. If your operation is idempotent (like min/max/gcd) and memory is not tight, a classic sparse table might be simpler. If you need frequent updates, prefer a Fenwick or segment tree for O(\log n) updates and queries.

⚠️Common Mistakes

Forgetting associativity: the sqrt tree relies on associativity to splice suffix, middle, and prefix pieces. If your operation is not associative, the structure can return incorrect answers. Ensure op(op(a,b),c) = op(a,op(b,c)).
Mishandling same-block queries with only one level: a single-level sqrt decomposition cannot answer same-block ranges in O(1). The multi-layer design is essential; always pick the smallest layer where endpoints are in the same superblock but different blocks.
Off-by-one in block boundaries: when computing prefixes/suffixes for blocks of size S_{\ell}, carefully cap the right end at n-1. Incorrect boundaries cause overlaps or gaps.
Assuming an identity element is required: the standard query composition never needs a neutral element; you simply skip the middle part if there are no full blocks between.
Overbuilding or underbuilding layers: stop when the next-domain size m_{\ell} becomes 1. Building beyond that wastes space; stopping early breaks correctness for long ranges.
Confusing sparse table with sqrt tree: a sparse table for min/gcd is fine but does not handle non-idempotent ops like sum without switching to a disjoint sparse table (O(n \log n) space). The sqrt tree targets O(n \log \log n) with general associative ops.

Key Formulas

Layer progression

b_{ℓ} = ⌈ m_{ℓ} ⌉, m_{ℓ + 1} = ⌈ \frac{m _{ℓ}}{b _{ℓ}} ⌉

Explanation: Each layer groups the current domain into blocks of size roughly the square root, reducing the domain size rapidly. After O(log log n) layers, the domain collapses to size 1.

Original-scale block size

S_{0} = 1, S_{ℓ + 1} = S_{ℓ} \cdot b_{ℓ}

Explanation: S_ $ℓ$ is the block size in original elements at layer $ℓ$ . It is the cumulative product of block factors down to that layer.

Three-piece composition

Query (L, R) = suf_{ℓ} [L] op (Between_{ℓ} (p + 1, q - 1)) op pre_{ℓ} [R]

Explanation: At the chosen layer $ℓ$ , p and q are the block indices of L and R. Combine the suffix from L to the end of its block, the result over full blocks in the middle (possibly empty), and the prefix from the start of the right block to R.

Number of layers

L = O (lo g lo g n)

Explanation: Because each step takes a square root, the number of iterations needed to reduce n to 1 is logarithmic in the logarithm of n.

Complexity summary

Build time = O (n lo g lo g n), Queries = O (1)

Explanation: Each layer processes O(n) work for prefix/suffix and block aggregates, and there are O(log log n) layers. A query touches a constant number of layers and does O(1) work per layer.

Space complexity

Space = O (n lo g lo g n)

Explanation: We store prefix and suffix per element per layer, and small recursive structures on block arrays across layers whose total size is proportional to n times the number of layers.

Complexity Analysis

Let n be the array length and L the number of layers, where

L = O (l o g

log n) because each layer reduces the domain size by a factor of about sqrt. At layer ℓ, we compute prefix/suffix for every element with respect to blocks of size S_ℓ. This is linear in n per layer, so across all layers the total is O(nL) = O(n log log n). For the arrays of block aggregates B_ℓ (one value per block at layer ℓ), the total size over all layers is also bounded by O(n log log n). Intuitively, at layer ℓ there are m_ℓ blocks, but m_ℓ decreases super-fast (n, ~√n, ~

n^{1/4}

, …). Each B_ℓ recursively builds a similar structure on m_ℓ elements, and the telescoping sum over these diminishing sizes remains within the same O(n log log n) envelope. A query picks the smallest layer where both endpoints are in the same superblock but in different blocks. The answer is formed as the associative combination of at most three components: a suffix, an optional middle segment returned by the child structure on B_ℓ, and a prefix. Because only one layer is used for the split and the recursion descends once into the next-level structure, the total work per query is constant with a very small constant factor; formally, it is O(1) with the caveat that the constant reflects at most O(log log n) very small fixed limits in practice. Space usage is dominated by the stored prefix/suffix arrays for all layers and the recursive structures on B_ℓ, again O(n log log n). Point updates, if implemented, require recomputing the affected block at each layer and the corresponding block aggregates upward, which touches about O(√n) positions at the lowest relevant level and propagates through L layers, yielding roughly O(√n log log n) per update. This is why the structure is best for static scenarios.

Code Examples

Generic Sqrt Tree (associative op) with O(1) range queries (static)

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Generic associative operation wrapper must provide: T operator()(const T&, const T&) const
5 // Example ops are provided below.
6 
7 template<typename T, class Op>
8 struct SqrtTree {
9     int n;                 // original length
10     Op op;                 // associative operator
11     vector<T> a;           // original array
12 
13     // Layer bookkeeping
14     int L;                 // number of layers (excluding the final collapsed m[L]==1)
15     vector<int> b;         // block factor at layer ell: b[ell] = ceil(sqrt(m_ell))
16     vector<int> S;         // original-scale block size S_ell, with S[0]=1
17     vector<int> m;         // domain sizes m_ell, with m[0]=n and m[ell+1]=ceil(m_ell/b[ell])
18 
19     // Precomputed per-layer data (on the original array scale)
20     vector<vector<T>> pref;  // pref[ell][i]: op over [block_start(i,ell) .. i]
21     vector<vector<T>> suff;  // suff[ell][i]: op over [i .. block_end(i,ell)]
22 
23     // Child structures over block-aggregate arrays B_ell
24     vector<unique_ptr<SqrtTree<T, Op>>> child; // child[ell] built on B_ell if m[ell] > 1
25 
26     SqrtTree() : n(0), L(0) {}
27 
28     SqrtTree(const vector<T>& arr, Op oper) { init(arr, oper); }
29 
30     void init(const vector<T>& arr, Op oper) {
31         a = arr; op = oper; n = (int)a.size();
32         if (n == 0) { L = 0; b.clear(); S.clear(); m.clear(); pref.clear(); suff.clear(); child.clear(); return; }
33 
34         // Build m and b: repeated square roots until domain size becomes 1
35         m.clear(); b.clear();
36         m.push_back(n);
37         while (m.back() > 1) {
38             int cur = m.back();
39             int bl = (int)floor(sqrt((long double)cur));
40             if (1LL * bl * bl < cur) ++bl; // ceil sqrt
41             bl = max(bl, 1);
42             b.push_back(bl);
43             int nxt = (cur + bl - 1) / bl;
44             m.push_back(nxt);
45         }
46         L = (int)b.size();
47 
48         // Build original-scale block sizes S[0]=1; S[ell+1] = S[ell] * b[ell]
49         S.assign(L + 1, 1);
50         for (int ell = 1; ell <= L; ++ell) S[ell] = S[ell - 1] * b[ell - 1];
51 
52         // Precompute prefix/suffix at each layer (on original array indices)
53         pref.assign(L, vector<T>(n));
54         suff.assign(L, vector<T>(n));
55         for (int ell = 0; ell < L; ++ell) {
56             int Bsz = S[ell];                           // block size on original scale
57             int blocks = (n + Bsz - 1) / Bsz;           // equals m[ell]
58             for (int blk = 0; blk < blocks; ++blk) {
59                 int Lx = blk * Bsz;
60                 int Rx = min(n, (blk + 1) * Bsz) - 1;
61                 // prefix within block
62                 T acc = a[Lx];
63                 pref[ell][Lx] = acc;
64                 for (int i = Lx + 1; i <= Rx; ++i) {
65                     acc = op(acc, a[i]);
66                     pref[ell][i] = acc;
67                 }
68                 // suffix within block
69                 acc = a[Rx];
70                 suff[ell][Rx] = acc;
71                 for (int i = Rx - 1; i >= Lx; --i) {
72                     acc = op(a[i], acc);
73                     suff[ell][i] = acc;
74                 }
75             }
76         }
77 
78         // Build child structures on block aggregates B_ell
79         child.resize(L);
80         for (int ell = 0; ell < L; ++ell) {
81             int Bsz = S[ell];
82             int blocks = (n + Bsz - 1) / Bsz; // m[ell]
83             if (blocks <= 1) { child[ell].reset(); continue; }
84             vector<T> B(blocks);
85             for (int j = 0; j < blocks; ++j) {
86                 int Lx = j * Bsz;
87                 // Aggregate of the whole block is precisely the suffix at its first position
88                 B[j] = suff[ell][Lx];
89             }
90             child[ell] = make_unique<SqrtTree<T, Op>>(B, op);
91         }
92     }
93 
94     // Query on original indices [l, r], inclusive. Requires 0 <= l <= r < n.
95     T query(int l, int r) const {
96         if (l == r) return a[l];
97         // Find the smallest layer ell such that l and r are in the same superblock at layer ell
98         // i.e., floor(l / S[ell+1]) == floor(r / S[ell+1]). By minimality, they are in different blocks at S[ell].
99         int ell = 0;
100         while (ell + 1 <= L && (l / S[ell + 1]) != (r / S[ell + 1])) ++ell;
101         // Now split into three parts: suffix of left block, middle full blocks via child, prefix of right block
102         int bl = l / S[ell];
103         int br = r / S[ell];
104         T res = suff[ell][l];
105         if (bl + 1 <= br - 1) {
106             // middle full blocks exist; child is defined because number of blocks m[ell] >= 3
107             T mid = child[ell]->query(bl + 1, br - 1);
108             res = op(res, mid);
109         }
110         res = op(res, pref[ell][r]);
111         return res;
112     }
113 };
114 
115 // Example operator: sum
116 struct SumOp {
117     long long operator()(long long a, long long b) const { return a + b; }
118 };
119 
120 // Example operator: minimum
121 struct MinOp {
122     long long operator()(long long a, long long b) const { return std::min(a, b); }
123 };
124 
125 int main() {
126     ios::sync_with_stdio(false);
127     cin.tie(nullptr);
128 
129     // Demo: range sum and range min on the same array
130     int n = 10;
131     vector<long long> a = {5, -2, 7, 3, 9, 1, 4, -5, 6, 2};
132 
133     SqrtTree<long long, SumOp> st_sum(a, SumOp{});
134     SqrtTree<long long, MinOp> st_min(a, MinOp{});
135 
136     // Sample queries
137     vector<pair<int,int>> qs = {{0,0},{0,3},{2,7},{4,9},{1,8}};
138     cout << "Range Sum:\n";
139     for (auto [l,r] : qs) cout << l << "," << r << ": " << st_sum.query(l,r) << "\n";
140 
141     cout << "\nRange Min:\n";
142     for (auto [l,r] : qs) cout << l << "," << r << ": " << st_min.query(l,r) << "\n";
143 
144     return 0;
145 }
146

This is a complete, generic sqrt tree for any associative operation. It builds O(log log n) layers. Each layer ℓ stores prefix/suffix aggregates for blocks of size S_ℓ on the original array and creates a child sqrt tree on the array of block aggregates B_ℓ. A query picks the smallest ℓ where endpoints share a superblock (block at layer ℓ+1) but lie in different blocks at layer ℓ, then combines suffix(left), middle full blocks via child, and prefix(right). The example demonstrates both sum and min queries.

Time: Preprocessing O(n log log n); each query O(1) (with a very small constant; more precisely O(log log n) but treated as constant in practice).Space: O(n log log n) total space for all layers and child structures.

Micro-usage: reading input, building sqrt tree for sum, answering Q queries

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 template<typename T, class Op>
5 struct SqrtTree {
6     int n, L; vector<int> b, S, m; vector<T> a; Op op; vector<vector<T>> pref, suff; vector<unique_ptr<SqrtTree<T, Op>>> child;
7     SqrtTree() : n(0), L(0) {}
8     SqrtTree(const vector<T>& arr, Op oper){ init(arr, oper); }
9     void init(const vector<T>& arr, Op oper){ a=arr; op=oper; n=(int)a.size(); if(!n){L=0;return;} m.clear(); b.clear(); m.push_back(n); while(m.back()>1){ int cur=m.back(); int bl=(int)floor(sqrt((long double)cur)); if(1LL*bl*bl<cur) ++bl; bl=max(bl,1); b.push_back(bl); m.push_back((cur+bl-1)/bl);} L=(int)b.size(); S.assign(L+1,1); for(int i=1;i<=L;i++) S[i]=S[i-1]*b[i-1]; pref.assign(L, vector<T>(n)); suff.assign(L, vector<T>(n)); for(int ell=0;ell<L;++ell){ int Bsz=S[ell]; int blocks=(n+Bsz-1)/Bsz; for(int blk=0; blk<blocks; ++blk){ int Lx=blk*Bsz; int Rx=min(n,(blk+1)*Bsz)-1; T acc=a[Lx]; pref[ell][Lx]=acc; for(int i=Lx+1;i<=Rx;++i){ acc=op(acc,a[i]); pref[ell][i]=acc;} acc=a[Rx]; suff[ell][Rx]=acc; for(int i=Rx-1;i>=Lx;--i){ acc=op(a[i],acc); suff[ell][i]=acc; } } } child.resize(L); for(int ell=0;ell<L;++ell){ int Bsz=S[ell]; int blocks=(n+Bsz-1)/Bsz; if(blocks<=1){ child[ell].reset(); continue; } vector<T> B(blocks); for(int j=0;j<blocks;++j){ int Lx=j*Bsz; B[j]=suff[ell][Lx]; } child[ell]=make_unique<SqrtTree<T,Op>>(B, op);} }
10     T query(int l, int r) const { if(l==r) return a[l]; int ell=0; while(ell+1<=L && (l/S[ell+1])!=(r/S[ell+1])) ++ell; int bl=l/S[ell]; int br=r/S[ell]; T res=suff[ell][l]; if(bl+1<=br-1){ T mid=child[ell]->query(bl+1, br-1); res=op(res, mid);} return op(res, pref[ell][r]); }
11 };
12 
13 struct SumOp{ long long operator()(long long a, long long b) const { return a+b; } };
14 
15 int main(){
16     ios::sync_with_stdio(false);
17     cin.tie(nullptr);
18 
19     int n,q; if(!(cin>>n>>q)) return 0; vector<long long> a(n); for(int i=0;i<n;++i) cin>>a[i];
20     SqrtTree<long long, SumOp> st(a, SumOp{});
21     while(q--){ int l,r; cin>>l>>r; // 0-indexed inclusive
22         cout<<st.query(l,r) << '\n';
23     }
24     return 0;
25 }
26

Minimal driver for competitive programming: read an array, build a sqrt tree for sum, and answer Q range-sum queries in O(1) each. Replace SumOp with MinOp (or any associative op) to change the operation.

Time: Build O(n log log n); each query O(1).Space: O(n log log n).

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Generic associative operation wrapper must provide: T operator()(const T&, const T&) const
5	// Example ops are provided below.
6
7	template<typename T, class Op>
8	struct SqrtTree {
9	int n; // original length
10	Op op; // associative operator
11	vector<T> a; // original array
12
13	// Layer bookkeeping
14	int L; // number of layers (excluding the final collapsed m[L]==1)
15	vector<int> b; // block factor at layer ell: b[ell] = ceil(sqrt(m_ell))
16	vector<int> S; // original-scale block size S_ell, with S[0]=1
17	vector<int> m; // domain sizes m_ell, with m[0]=n and m[ell+1]=ceil(m_ell/b[ell])
18
19	// Precomputed per-layer data (on the original array scale)
20	vector<vector<T>> pref; // pref[ell][i]: op over [block_start(i,ell) .. i]
21	vector<vector<T>> suff; // suff[ell][i]: op over [i .. block_end(i,ell)]
22
23	// Child structures over block-aggregate arrays B_ell
24	vector<unique_ptr<SqrtTree<T, Op>>> child; // child[ell] built on B_ell if m[ell] > 1
25
26	SqrtTree() : n(0), L(0) {}
27
28	SqrtTree(const vector<T>& arr, Op oper) { init(arr, oper); }
29
30	void init(const vector<T>& arr, Op oper) {
31	a = arr; op = oper; n = (int)a.size();
32	if (n == 0) { L = 0; b.clear(); S.clear(); m.clear(); pref.clear(); suff.clear(); child.clear(); return; }
33
34	// Build m and b: repeated square roots until domain size becomes 1
35	m.clear(); b.clear();
36	m.push_back(n);
37	while (m.back() > 1) {
38	int cur = m.back();
39	int bl = (int)floor(sqrt((long double)cur));
40	if (1LL * bl * bl < cur) ++bl; // ceil sqrt
41	bl = max(bl, 1);
42	b.push_back(bl);
43	int nxt = (cur + bl - 1) / bl;
44	m.push_back(nxt);
45	}
46	L = (int)b.size();
47
48	// Build original-scale block sizes S[0]=1; S[ell+1] = S[ell] * b[ell]
49	S.assign(L + 1, 1);
50	for (int ell = 1; ell <= L; ++ell) S[ell] = S[ell - 1] * b[ell - 1];
51
52	// Precompute prefix/suffix at each layer (on original array indices)
53	pref.assign(L, vector<T>(n));
54	suff.assign(L, vector<T>(n));
55	for (int ell = 0; ell < L; ++ell) {
56	int Bsz = S[ell]; // block size on original scale
57	int blocks = (n + Bsz - 1) / Bsz; // equals m[ell]
58	for (int blk = 0; blk < blocks; ++blk) {
59	int Lx = blk * Bsz;
60	int Rx = min(n, (blk + 1) * Bsz) - 1;
61	// prefix within block
62	T acc = a[Lx];
63	pref[ell][Lx] = acc;
64	for (int i = Lx + 1; i <= Rx; ++i) {
65	acc = op(acc, a[i]);
66	pref[ell][i] = acc;
67	}
68	// suffix within block
69	acc = a[Rx];
70	suff[ell][Rx] = acc;
71	for (int i = Rx - 1; i >= Lx; --i) {
72	acc = op(a[i], acc);
73	suff[ell][i] = acc;
74	}
75	}
76	}
77
78	// Build child structures on block aggregates B_ell
79	child.resize(L);
80	for (int ell = 0; ell < L; ++ell) {
81	int Bsz = S[ell];
82	int blocks = (n + Bsz - 1) / Bsz; // m[ell]
83	if (blocks <= 1) { child[ell].reset(); continue; }
84	vector<T> B(blocks);
85	for (int j = 0; j < blocks; ++j) {
86	int Lx = j * Bsz;
87	// Aggregate of the whole block is precisely the suffix at its first position
88	B[j] = suff[ell][Lx];
89	}
90	child[ell] = make_unique<SqrtTree<T, Op>>(B, op);
91	}
92	}
93
94	// Query on original indices [l, r], inclusive. Requires 0 <= l <= r < n.
95	T query(int l, int r) const {
96	if (l == r) return a[l];
97	// Find the smallest layer ell such that l and r are in the same superblock at layer ell
98	// i.e., floor(l / S[ell+1]) == floor(r / S[ell+1]). By minimality, they are in different blocks at S[ell].
99	int ell = 0;
100	while (ell + 1 <= L && (l / S[ell + 1]) != (r / S[ell + 1])) ++ell;
101	// Now split into three parts: suffix of left block, middle full blocks via child, prefix of right block
102	int bl = l / S[ell];
103	int br = r / S[ell];
104	T res = suff[ell][l];
105	if (bl + 1 <= br - 1) {
106	// middle full blocks exist; child is defined because number of blocks m[ell] >= 3
107	T mid = child[ell]->query(bl + 1, br - 1);
108	res = op(res, mid);
109	}
110	res = op(res, pref[ell][r]);
111	return res;
112	}
113	};
114
115	// Example operator: sum
116	struct SumOp {
117	long long operator()(long long a, long long b) const { return a + b; }
118	};
119
120	// Example operator: minimum
121	struct MinOp {
122	long long operator()(long long a, long long b) const { return std::min(a, b); }
123	};
124
125	int main() {
126	ios::sync_with_stdio(false);
127	cin.tie(nullptr);
128
129	// Demo: range sum and range min on the same array
130	int n = 10;
131	vector<long long> a = {5, -2, 7, 3, 9, 1, 4, -5, 6, 2};
132
133	SqrtTree<long long, SumOp> st_sum(a, SumOp{});
134	SqrtTree<long long, MinOp> st_min(a, MinOp{});
135
136	// Sample queries
137	vector<pair<int,int>> qs = {{0,0},{0,3},{2,7},{4,9},{1,8}};
138	cout << "Range Sum:\n";
139	for (auto [l,r] : qs) cout << l << "," << r << ": " << st_sum.query(l,r) << "\n";
140
141	cout << "\nRange Min:\n";
142	for (auto [l,r] : qs) cout << l << "," << r << ": " << st_min.query(l,r) << "\n";
143
144	return 0;
145	}
146