πŸ—‚οΈData StructureAdvanced

Sqrt Tree

Key Points

  • β€’
    A sqrt tree is a layered block-decomposition data structure that answers range queries in O(1) time after O(n n) preprocessing.
  • β€’
    It works for any associative operation (sum, min, gcd, xor, set-union, etc.), not just idempotent ones.
  • β€’
    The idea: repeatedly group the array into blocks (then groups of blocks, and so on) and precompute prefix/suffix values within each block plus answers for full blocks between endpoints.
  • β€’
    A query picks the smallest layer where the endpoints lie in the same superblock but in different blocks, then combines three pieces: suffix of the left block, fully covered blocks in the middle, and prefix of the right block.
  • β€’
    Space and preprocessing grow across only n layers, giving total O(n n) memory and time to build.
  • β€’
    Point updates are possible by rebuilding the affected blocks up the layers in about O( n), but the structure is best for static arrays.
  • β€’
    Compared to a sparse table, a sqrt tree supports non-idempotent operations with much less memory than a disjoint sparse table.
  • β€’
    In competitive programming, sqrt trees are a practical O(1) alternative to segment trees (O( n)) when queries dominate.

Prerequisites

  • β†’Associativity and basic algebraic structures (monoids) β€” Sqrt trees rely on associativity to split ranges and recombine results correctly.
  • β†’Prefix/suffix aggregates β€” Understanding how within-block contributions are computed and reused is essential.
  • β†’Square root decomposition β€” Sqrt trees layer multiple sqrt decompositions; knowing the single-layer idea helps.
  • β†’Sparse/disjoint sparse tables (contrast) β€” Helpful to compare idempotent vs non-idempotent support and space trade-offs.
  • β†’Big-O notation and logarithms β€” Needed to reason about O(n log log n) preprocessing and O(1) query time.
  • β†’Segment tree / Fenwick tree β€” Provides a baseline for dynamic structures and trade-offs with sqrt trees.

Detailed Explanation

Tap terms for definitions

01Overview

A sqrt tree is a fast, layered data structure for static range queries on arrays. Unlike a single-level sqrt decomposition, it stacks several levels of blockings: first into about \sqrt{n} blocks, then groups those blocks into \sqrt{\sqrt{n}} superblocks, and so on, for only about \log \log n levels. At each level, it precomputes two kinds of information: (1) prefix and suffix results inside each block to answer partial-block contributions quickly, and (2) answers for ranges that span only full blocks, which are handled by the next level (recursively) on the array of block-aggregates. A range query chooses the smallest level where both endpoints fall into the same superblock but different blocks. The answer is then the associative combination of three parts: the suffix from the left endpoint to the end of its block, the result over the fully covered middle blocks (from the next-level structure), and the prefix from the start of the right block to the right endpoint. Because the number of levels is only \log \log n, the constant-time combination at one level and a single recursive call make the query effectively O(1). Preprocessing touches each element at each level, resulting in O(n \log \log n) build time and space. This design works for any associative operation, not only idempotent ones, making it more general than the classic sparse table while using less space than a full disjoint sparse table.

02Intuition & Analogies

Imagine organizing a library shelf. First, you split the books into medium boxes (about \sqrt{n} of them). Then you put those boxes into larger crates (about \sqrt{\sqrt{n}} of them). Then crates into pallets, and so on. If someone asks for all books from book L to book R, you can answer quickly: take the tail of the box containing L (suffix), take all fully covered boxes in the middle (whole boxes are easy), and take the head of the box containing R (prefix). The only catch is: what if L and R are so close that they sit in the same box? Then you go down one level where the β€œbox” is smaller β€” because you built a hierarchy of smaller and smaller boxes, you will eventually hit a level where L and R are in different boxes but still inside the same larger container. Then the same three-piece trick works perfectly. In a sqrt tree, we precompute these heads and tails (prefixes and suffixes) for every box at every level. We also precompute results for runs of whole boxes by building the next level over an array of β€œbox summaries” (what is the result of combining everything in a box). A query uses the smallest level where L and R are in the same superbox but different boxes. With just one recursive hop to the next level for the middle full boxes, we get the final answer by glueing three pieces together. Because the number of levels is tiny (roughly log log n), this feels like constant time in practice.

03Formal Definition

Let A be an array of length n and let be a binary associative operation on the range values (e.g., sum, min, gcd). Construct levels = 0, 1, , L-1 such that , = , and = / until . Define and . At level , partition the original array into blocks of size (the last block may be shorter). Precompute, for every index i: [i] = A[block\_start(i, )] A[i] and [i] = A[i] A[block\_end(i, )]. Also form an array of length where [j] is the -aggregate of the j-th block at level . Recursively build the same structure on . For a query [L, R], choose the smallest such that = (same superblock) but (different blocks). Let p = and q = . Then the answer is [L] ( Query_{}(p+1, q-1) ) [R], where Query_{} is the same procedure on , skipped if p+1>q-1. Associativity guarantees correctness.

04When to Use

Use a sqrt tree when you need extremely fast (effectively O(1)) range queries over a static array and the operation is associative. Examples include: sum queries for large-scale analytics where memory is at a premium compared to a disjoint sparse table; range minimum or maximum queries when you prefer less memory than a full disjoint sparse table but want broader support than classic sparse table; range gcd/xor/bitwise-and/or queries; combining small custom monoids in competitive programming (e.g., a struct with count and sum combined by addition). It shines when there are many more queries than updates, or when the data is immutable. If point updates are rare but needed, you can rebuild the affected block chains in O(\sqrt{n} \log \log n) each, which is acceptable in some scenarios. If your operation is idempotent (like min/max/gcd) and memory is not tight, a classic sparse table might be simpler. If you need frequent updates, prefer a Fenwick or segment tree for O(\log n) updates and queries.

⚠️Common Mistakes

  • Forgetting associativity: the sqrt tree relies on associativity to splice suffix, middle, and prefix pieces. If your operation is not associative, the structure can return incorrect answers. Ensure op(op(a,b),c) = op(a,op(b,c)).
  • Mishandling same-block queries with only one level: a single-level sqrt decomposition cannot answer same-block ranges in O(1). The multi-layer design is essential; always pick the smallest layer where endpoints are in the same superblock but different blocks.
  • Off-by-one in block boundaries: when computing prefixes/suffixes for blocks of size S_{\ell}, carefully cap the right end at n-1. Incorrect boundaries cause overlaps or gaps.
  • Assuming an identity element is required: the standard query composition never needs a neutral element; you simply skip the middle part if there are no full blocks between.
  • Overbuilding or underbuilding layers: stop when the next-domain size m_{\ell} becomes 1. Building beyond that wastes space; stopping early breaks correctness for long ranges.
  • Confusing sparse table with sqrt tree: a sparse table for min/gcd is fine but does not handle non-idempotent ops like sum without switching to a disjoint sparse table (O(n \log n) space). The sqrt tree targets O(n \log \log n) with general associative ops.

Key Formulas

Layer progression

Explanation: Each layer groups the current domain into blocks of size roughly the square root, reducing the domain size rapidly. After O(log log n) layers, the domain collapses to size 1.

Original-scale block size

Explanation: S_ is the block size in original elements at layer . It is the cumulative product of block factors down to that layer.

Three-piece composition

Explanation: At the chosen layer , p and q are the block indices of L and R. Combine the suffix from L to the end of its block, the result over full blocks in the middle (possibly empty), and the prefix from the start of the right block to R.

Number of layers

Explanation: Because each step takes a square root, the number of iterations needed to reduce n to 1 is logarithmic in the logarithm of n.

Complexity summary

Explanation: Each layer processes O(n) work for prefix/suffix and block aggregates, and there are O(log log n) layers. A query touches a constant number of layers and does O(1) work per layer.

Space complexity

Explanation: We store prefix and suffix per element per layer, and small recursive structures on block arrays across layers whose total size is proportional to n times the number of layers.

Complexity Analysis

Let n be the array length and L the number of layers, where log n) because each layer reduces the domain size by a factor of about sqrt. At layer β„“, we compute prefix/suffix for every element with respect to blocks of size S_β„“. This is linear in n per layer, so across all layers the total is O(nL) = O(n log log n). For the arrays of block aggregates B_β„“ (one value per block at layer β„“), the total size over all layers is also bounded by O(n log log n). Intuitively, at layer β„“ there are m_β„“ blocks, but m_β„“ decreases super-fast (n, ~√n, ~, …). Each B_β„“ recursively builds a similar structure on m_β„“ elements, and the telescoping sum over these diminishing sizes remains within the same O(n log log n) envelope. A query picks the smallest layer where both endpoints are in the same superblock but in different blocks. The answer is formed as the associative combination of at most three components: a suffix, an optional middle segment returned by the child structure on B_β„“, and a prefix. Because only one layer is used for the split and the recursion descends once into the next-level structure, the total work per query is constant with a very small constant factor; formally, it is O(1) with the caveat that the constant reflects at most O(log log n) very small fixed limits in practice. Space usage is dominated by the stored prefix/suffix arrays for all layers and the recursive structures on B_β„“, again O(n log log n). Point updates, if implemented, require recomputing the affected block at each layer and the corresponding block aggregates upward, which touches about O(√n) positions at the lowest relevant level and propagates through L layers, yielding roughly O(√n log log n) per update. This is why the structure is best for static scenarios.

Code Examples

Generic Sqrt Tree (associative op) with O(1) range queries (static)
1#include <bits/stdc++.h>
2using namespace std;
3
4// Generic associative operation wrapper must provide: T operator()(const T&, const T&) const
5// Example ops are provided below.
6
7template<typename T, class Op>
8struct SqrtTree {
9 int n; // original length
10 Op op; // associative operator
11 vector<T> a; // original array
12
13 // Layer bookkeeping
14 int L; // number of layers (excluding the final collapsed m[L]==1)
15 vector<int> b; // block factor at layer ell: b[ell] = ceil(sqrt(m_ell))
16 vector<int> S; // original-scale block size S_ell, with S[0]=1
17 vector<int> m; // domain sizes m_ell, with m[0]=n and m[ell+1]=ceil(m_ell/b[ell])
18
19 // Precomputed per-layer data (on the original array scale)
20 vector<vector<T>> pref; // pref[ell][i]: op over [block_start(i,ell) .. i]
21 vector<vector<T>> suff; // suff[ell][i]: op over [i .. block_end(i,ell)]
22
23 // Child structures over block-aggregate arrays B_ell
24 vector<unique_ptr<SqrtTree<T, Op>>> child; // child[ell] built on B_ell if m[ell] > 1
25
26 SqrtTree() : n(0), L(0) {}
27
28 SqrtTree(const vector<T>& arr, Op oper) { init(arr, oper); }
29
30 void init(const vector<T>& arr, Op oper) {
31 a = arr; op = oper; n = (int)a.size();
32 if (n == 0) { L = 0; b.clear(); S.clear(); m.clear(); pref.clear(); suff.clear(); child.clear(); return; }
33
34 // Build m and b: repeated square roots until domain size becomes 1
35 m.clear(); b.clear();
36 m.push_back(n);
37 while (m.back() > 1) {
38 int cur = m.back();
39 int bl = (int)floor(sqrt((long double)cur));
40 if (1LL * bl * bl < cur) ++bl; // ceil sqrt
41 bl = max(bl, 1);
42 b.push_back(bl);
43 int nxt = (cur + bl - 1) / bl;
44 m.push_back(nxt);
45 }
46 L = (int)b.size();
47
48 // Build original-scale block sizes S[0]=1; S[ell+1] = S[ell] * b[ell]
49 S.assign(L + 1, 1);
50 for (int ell = 1; ell <= L; ++ell) S[ell] = S[ell - 1] * b[ell - 1];
51
52 // Precompute prefix/suffix at each layer (on original array indices)
53 pref.assign(L, vector<T>(n));
54 suff.assign(L, vector<T>(n));
55 for (int ell = 0; ell < L; ++ell) {
56 int Bsz = S[ell]; // block size on original scale
57 int blocks = (n + Bsz - 1) / Bsz; // equals m[ell]
58 for (int blk = 0; blk < blocks; ++blk) {
59 int Lx = blk * Bsz;
60 int Rx = min(n, (blk + 1) * Bsz) - 1;
61 // prefix within block
62 T acc = a[Lx];
63 pref[ell][Lx] = acc;
64 for (int i = Lx + 1; i <= Rx; ++i) {
65 acc = op(acc, a[i]);
66 pref[ell][i] = acc;
67 }
68 // suffix within block
69 acc = a[Rx];
70 suff[ell][Rx] = acc;
71 for (int i = Rx - 1; i >= Lx; --i) {
72 acc = op(a[i], acc);
73 suff[ell][i] = acc;
74 }
75 }
76 }
77
78 // Build child structures on block aggregates B_ell
79 child.resize(L);
80 for (int ell = 0; ell < L; ++ell) {
81 int Bsz = S[ell];
82 int blocks = (n + Bsz - 1) / Bsz; // m[ell]
83 if (blocks <= 1) { child[ell].reset(); continue; }
84 vector<T> B(blocks);
85 for (int j = 0; j < blocks; ++j) {
86 int Lx = j * Bsz;
87 // Aggregate of the whole block is precisely the suffix at its first position
88 B[j] = suff[ell][Lx];
89 }
90 child[ell] = make_unique<SqrtTree<T, Op>>(B, op);
91 }
92 }
93
94 // Query on original indices [l, r], inclusive. Requires 0 <= l <= r < n.
95 T query(int l, int r) const {
96 if (l == r) return a[l];
97 // Find the smallest layer ell such that l and r are in the same superblock at layer ell
98 // i.e., floor(l / S[ell+1]) == floor(r / S[ell+1]). By minimality, they are in different blocks at S[ell].
99 int ell = 0;
100 while (ell + 1 <= L && (l / S[ell + 1]) != (r / S[ell + 1])) ++ell;
101 // Now split into three parts: suffix of left block, middle full blocks via child, prefix of right block
102 int bl = l / S[ell];
103 int br = r / S[ell];
104 T res = suff[ell][l];
105 if (bl + 1 <= br - 1) {
106 // middle full blocks exist; child is defined because number of blocks m[ell] >= 3
107 T mid = child[ell]->query(bl + 1, br - 1);
108 res = op(res, mid);
109 }
110 res = op(res, pref[ell][r]);
111 return res;
112 }
113};
114
115// Example operator: sum
116struct SumOp {
117 long long operator()(long long a, long long b) const { return a + b; }
118};
119
120// Example operator: minimum
121struct MinOp {
122 long long operator()(long long a, long long b) const { return std::min(a, b); }
123};
124
125int main() {
126 ios::sync_with_stdio(false);
127 cin.tie(nullptr);
128
129 // Demo: range sum and range min on the same array
130 int n = 10;
131 vector<long long> a = {5, -2, 7, 3, 9, 1, 4, -5, 6, 2};
132
133 SqrtTree<long long, SumOp> st_sum(a, SumOp{});
134 SqrtTree<long long, MinOp> st_min(a, MinOp{});
135
136 // Sample queries
137 vector<pair<int,int>> qs = {{0,0},{0,3},{2,7},{4,9},{1,8}};
138 cout << "Range Sum:\n";
139 for (auto [l,r] : qs) cout << l << "," << r << ": " << st_sum.query(l,r) << "\n";
140
141 cout << "\nRange Min:\n";
142 for (auto [l,r] : qs) cout << l << "," << r << ": " << st_min.query(l,r) << "\n";
143
144 return 0;
145}
146

This is a complete, generic sqrt tree for any associative operation. It builds O(log log n) layers. Each layer β„“ stores prefix/suffix aggregates for blocks of size S_β„“ on the original array and creates a child sqrt tree on the array of block aggregates B_β„“. A query picks the smallest β„“ where endpoints share a superblock (block at layer β„“+1) but lie in different blocks at layer β„“, then combines suffix(left), middle full blocks via child, and prefix(right). The example demonstrates both sum and min queries.

Time: Preprocessing O(n log log n); each query O(1) (with a very small constant; more precisely O(log log n) but treated as constant in practice).Space: O(n log log n) total space for all layers and child structures.
Micro-usage: reading input, building sqrt tree for sum, answering Q queries
1#include <bits/stdc++.h>
2using namespace std;
3
4template<typename T, class Op>
5struct SqrtTree {
6 int n, L; vector<int> b, S, m; vector<T> a; Op op; vector<vector<T>> pref, suff; vector<unique_ptr<SqrtTree<T, Op>>> child;
7 SqrtTree() : n(0), L(0) {}
8 SqrtTree(const vector<T>& arr, Op oper){ init(arr, oper); }
9 void init(const vector<T>& arr, Op oper){ a=arr; op=oper; n=(int)a.size(); if(!n){L=0;return;} m.clear(); b.clear(); m.push_back(n); while(m.back()>1){ int cur=m.back(); int bl=(int)floor(sqrt((long double)cur)); if(1LL*bl*bl<cur) ++bl; bl=max(bl,1); b.push_back(bl); m.push_back((cur+bl-1)/bl);} L=(int)b.size(); S.assign(L+1,1); for(int i=1;i<=L;i++) S[i]=S[i-1]*b[i-1]; pref.assign(L, vector<T>(n)); suff.assign(L, vector<T>(n)); for(int ell=0;ell<L;++ell){ int Bsz=S[ell]; int blocks=(n+Bsz-1)/Bsz; for(int blk=0; blk<blocks; ++blk){ int Lx=blk*Bsz; int Rx=min(n,(blk+1)*Bsz)-1; T acc=a[Lx]; pref[ell][Lx]=acc; for(int i=Lx+1;i<=Rx;++i){ acc=op(acc,a[i]); pref[ell][i]=acc;} acc=a[Rx]; suff[ell][Rx]=acc; for(int i=Rx-1;i>=Lx;--i){ acc=op(a[i],acc); suff[ell][i]=acc; } } } child.resize(L); for(int ell=0;ell<L;++ell){ int Bsz=S[ell]; int blocks=(n+Bsz-1)/Bsz; if(blocks<=1){ child[ell].reset(); continue; } vector<T> B(blocks); for(int j=0;j<blocks;++j){ int Lx=j*Bsz; B[j]=suff[ell][Lx]; } child[ell]=make_unique<SqrtTree<T,Op>>(B, op);} }
10 T query(int l, int r) const { if(l==r) return a[l]; int ell=0; while(ell+1<=L && (l/S[ell+1])!=(r/S[ell+1])) ++ell; int bl=l/S[ell]; int br=r/S[ell]; T res=suff[ell][l]; if(bl+1<=br-1){ T mid=child[ell]->query(bl+1, br-1); res=op(res, mid);} return op(res, pref[ell][r]); }
11};
12
13struct SumOp{ long long operator()(long long a, long long b) const { return a+b; } };
14
15int main(){
16 ios::sync_with_stdio(false);
17 cin.tie(nullptr);
18
19 int n,q; if(!(cin>>n>>q)) return 0; vector<long long> a(n); for(int i=0;i<n;++i) cin>>a[i];
20 SqrtTree<long long, SumOp> st(a, SumOp{});
21 while(q--){ int l,r; cin>>l>>r; // 0-indexed inclusive
22 cout<<st.query(l,r) << '\n';
23 }
24 return 0;
25}
26

Minimal driver for competitive programming: read an array, build a sqrt tree for sum, and answer Q range-sum queries in O(1) each. Replace SumOp with MinOp (or any associative op) to change the operation.

Time: Build O(n log log n); each query O(1).Space: O(n log log n).