Sqrt Tree
Key Points
- β’A sqrt tree is a layered block-decomposition data structure that answers range queries in O(1) time after O(n n) preprocessing.
- β’It works for any associative operation (sum, min, gcd, xor, set-union, etc.), not just idempotent ones.
- β’The idea: repeatedly group the array into blocks (then groups of blocks, and so on) and precompute prefix/suffix values within each block plus answers for full blocks between endpoints.
- β’A query picks the smallest layer where the endpoints lie in the same superblock but in different blocks, then combines three pieces: suffix of the left block, fully covered blocks in the middle, and prefix of the right block.
- β’Space and preprocessing grow across only n layers, giving total O(n n) memory and time to build.
- β’Point updates are possible by rebuilding the affected blocks up the layers in about O( n), but the structure is best for static arrays.
- β’Compared to a sparse table, a sqrt tree supports non-idempotent operations with much less memory than a disjoint sparse table.
- β’In competitive programming, sqrt trees are a practical O(1) alternative to segment trees (O( n)) when queries dominate.
Prerequisites
- βAssociativity and basic algebraic structures (monoids) β Sqrt trees rely on associativity to split ranges and recombine results correctly.
- βPrefix/suffix aggregates β Understanding how within-block contributions are computed and reused is essential.
- βSquare root decomposition β Sqrt trees layer multiple sqrt decompositions; knowing the single-layer idea helps.
- βSparse/disjoint sparse tables (contrast) β Helpful to compare idempotent vs non-idempotent support and space trade-offs.
- βBig-O notation and logarithms β Needed to reason about O(n log log n) preprocessing and O(1) query time.
- βSegment tree / Fenwick tree β Provides a baseline for dynamic structures and trade-offs with sqrt trees.
Detailed Explanation
Tap terms for definitions01Overview
A sqrt tree is a fast, layered data structure for static range queries on arrays. Unlike a single-level sqrt decomposition, it stacks several levels of blockings: first into about \sqrt{n} blocks, then groups those blocks into \sqrt{\sqrt{n}} superblocks, and so on, for only about \log \log n levels. At each level, it precomputes two kinds of information: (1) prefix and suffix results inside each block to answer partial-block contributions quickly, and (2) answers for ranges that span only full blocks, which are handled by the next level (recursively) on the array of block-aggregates. A range query chooses the smallest level where both endpoints fall into the same superblock but different blocks. The answer is then the associative combination of three parts: the suffix from the left endpoint to the end of its block, the result over the fully covered middle blocks (from the next-level structure), and the prefix from the start of the right block to the right endpoint. Because the number of levels is only \log \log n, the constant-time combination at one level and a single recursive call make the query effectively O(1). Preprocessing touches each element at each level, resulting in O(n \log \log n) build time and space. This design works for any associative operation, not only idempotent ones, making it more general than the classic sparse table while using less space than a full disjoint sparse table.
02Intuition & Analogies
Imagine organizing a library shelf. First, you split the books into medium boxes (about \sqrt{n} of them). Then you put those boxes into larger crates (about \sqrt{\sqrt{n}} of them). Then crates into pallets, and so on. If someone asks for all books from book L to book R, you can answer quickly: take the tail of the box containing L (suffix), take all fully covered boxes in the middle (whole boxes are easy), and take the head of the box containing R (prefix). The only catch is: what if L and R are so close that they sit in the same box? Then you go down one level where the βboxβ is smaller β because you built a hierarchy of smaller and smaller boxes, you will eventually hit a level where L and R are in different boxes but still inside the same larger container. Then the same three-piece trick works perfectly. In a sqrt tree, we precompute these heads and tails (prefixes and suffixes) for every box at every level. We also precompute results for runs of whole boxes by building the next level over an array of βbox summariesβ (what is the result of combining everything in a box). A query uses the smallest level where L and R are in the same superbox but different boxes. With just one recursive hop to the next level for the middle full boxes, we get the final answer by glueing three pieces together. Because the number of levels is tiny (roughly log log n), this feels like constant time in practice.
03Formal Definition
04When to Use
Use a sqrt tree when you need extremely fast (effectively O(1)) range queries over a static array and the operation is associative. Examples include: sum queries for large-scale analytics where memory is at a premium compared to a disjoint sparse table; range minimum or maximum queries when you prefer less memory than a full disjoint sparse table but want broader support than classic sparse table; range gcd/xor/bitwise-and/or queries; combining small custom monoids in competitive programming (e.g., a struct with count and sum combined by addition). It shines when there are many more queries than updates, or when the data is immutable. If point updates are rare but needed, you can rebuild the affected block chains in O(\sqrt{n} \log \log n) each, which is acceptable in some scenarios. If your operation is idempotent (like min/max/gcd) and memory is not tight, a classic sparse table might be simpler. If you need frequent updates, prefer a Fenwick or segment tree for O(\log n) updates and queries.
β οΈCommon Mistakes
- Forgetting associativity: the sqrt tree relies on associativity to splice suffix, middle, and prefix pieces. If your operation is not associative, the structure can return incorrect answers. Ensure op(op(a,b),c) = op(a,op(b,c)).
- Mishandling same-block queries with only one level: a single-level sqrt decomposition cannot answer same-block ranges in O(1). The multi-layer design is essential; always pick the smallest layer where endpoints are in the same superblock but different blocks.
- Off-by-one in block boundaries: when computing prefixes/suffixes for blocks of size S_{\ell}, carefully cap the right end at n-1. Incorrect boundaries cause overlaps or gaps.
- Assuming an identity element is required: the standard query composition never needs a neutral element; you simply skip the middle part if there are no full blocks between.
- Overbuilding or underbuilding layers: stop when the next-domain size m_{\ell} becomes 1. Building beyond that wastes space; stopping early breaks correctness for long ranges.
- Confusing sparse table with sqrt tree: a sparse table for min/gcd is fine but does not handle non-idempotent ops like sum without switching to a disjoint sparse table (O(n \log n) space). The sqrt tree targets O(n \log \log n) with general associative ops.
Key Formulas
Layer progression
Explanation: Each layer groups the current domain into blocks of size roughly the square root, reducing the domain size rapidly. After O(log log n) layers, the domain collapses to size 1.
Original-scale block size
Explanation: S_ is the block size in original elements at layer . It is the cumulative product of block factors down to that layer.
Three-piece composition
Explanation: At the chosen layer , p and q are the block indices of L and R. Combine the suffix from L to the end of its block, the result over full blocks in the middle (possibly empty), and the prefix from the start of the right block to R.
Number of layers
Explanation: Because each step takes a square root, the number of iterations needed to reduce n to 1 is logarithmic in the logarithm of n.
Complexity summary
Explanation: Each layer processes O(n) work for prefix/suffix and block aggregates, and there are O(log log n) layers. A query touches a constant number of layers and does O(1) work per layer.
Space complexity
Explanation: We store prefix and suffix per element per layer, and small recursive structures on block arrays across layers whose total size is proportional to n times the number of layers.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 // Generic associative operation wrapper must provide: T operator()(const T&, const T&) const 5 // Example ops are provided below. 6 7 template<typename T, class Op> 8 struct SqrtTree { 9 int n; // original length 10 Op op; // associative operator 11 vector<T> a; // original array 12 13 // Layer bookkeeping 14 int L; // number of layers (excluding the final collapsed m[L]==1) 15 vector<int> b; // block factor at layer ell: b[ell] = ceil(sqrt(m_ell)) 16 vector<int> S; // original-scale block size S_ell, with S[0]=1 17 vector<int> m; // domain sizes m_ell, with m[0]=n and m[ell+1]=ceil(m_ell/b[ell]) 18 19 // Precomputed per-layer data (on the original array scale) 20 vector<vector<T>> pref; // pref[ell][i]: op over [block_start(i,ell) .. i] 21 vector<vector<T>> suff; // suff[ell][i]: op over [i .. block_end(i,ell)] 22 23 // Child structures over block-aggregate arrays B_ell 24 vector<unique_ptr<SqrtTree<T, Op>>> child; // child[ell] built on B_ell if m[ell] > 1 25 26 SqrtTree() : n(0), L(0) {} 27 28 SqrtTree(const vector<T>& arr, Op oper) { init(arr, oper); } 29 30 void init(const vector<T>& arr, Op oper) { 31 a = arr; op = oper; n = (int)a.size(); 32 if (n == 0) { L = 0; b.clear(); S.clear(); m.clear(); pref.clear(); suff.clear(); child.clear(); return; } 33 34 // Build m and b: repeated square roots until domain size becomes 1 35 m.clear(); b.clear(); 36 m.push_back(n); 37 while (m.back() > 1) { 38 int cur = m.back(); 39 int bl = (int)floor(sqrt((long double)cur)); 40 if (1LL * bl * bl < cur) ++bl; // ceil sqrt 41 bl = max(bl, 1); 42 b.push_back(bl); 43 int nxt = (cur + bl - 1) / bl; 44 m.push_back(nxt); 45 } 46 L = (int)b.size(); 47 48 // Build original-scale block sizes S[0]=1; S[ell+1] = S[ell] * b[ell] 49 S.assign(L + 1, 1); 50 for (int ell = 1; ell <= L; ++ell) S[ell] = S[ell - 1] * b[ell - 1]; 51 52 // Precompute prefix/suffix at each layer (on original array indices) 53 pref.assign(L, vector<T>(n)); 54 suff.assign(L, vector<T>(n)); 55 for (int ell = 0; ell < L; ++ell) { 56 int Bsz = S[ell]; // block size on original scale 57 int blocks = (n + Bsz - 1) / Bsz; // equals m[ell] 58 for (int blk = 0; blk < blocks; ++blk) { 59 int Lx = blk * Bsz; 60 int Rx = min(n, (blk + 1) * Bsz) - 1; 61 // prefix within block 62 T acc = a[Lx]; 63 pref[ell][Lx] = acc; 64 for (int i = Lx + 1; i <= Rx; ++i) { 65 acc = op(acc, a[i]); 66 pref[ell][i] = acc; 67 } 68 // suffix within block 69 acc = a[Rx]; 70 suff[ell][Rx] = acc; 71 for (int i = Rx - 1; i >= Lx; --i) { 72 acc = op(a[i], acc); 73 suff[ell][i] = acc; 74 } 75 } 76 } 77 78 // Build child structures on block aggregates B_ell 79 child.resize(L); 80 for (int ell = 0; ell < L; ++ell) { 81 int Bsz = S[ell]; 82 int blocks = (n + Bsz - 1) / Bsz; // m[ell] 83 if (blocks <= 1) { child[ell].reset(); continue; } 84 vector<T> B(blocks); 85 for (int j = 0; j < blocks; ++j) { 86 int Lx = j * Bsz; 87 // Aggregate of the whole block is precisely the suffix at its first position 88 B[j] = suff[ell][Lx]; 89 } 90 child[ell] = make_unique<SqrtTree<T, Op>>(B, op); 91 } 92 } 93 94 // Query on original indices [l, r], inclusive. Requires 0 <= l <= r < n. 95 T query(int l, int r) const { 96 if (l == r) return a[l]; 97 // Find the smallest layer ell such that l and r are in the same superblock at layer ell 98 // i.e., floor(l / S[ell+1]) == floor(r / S[ell+1]). By minimality, they are in different blocks at S[ell]. 99 int ell = 0; 100 while (ell + 1 <= L && (l / S[ell + 1]) != (r / S[ell + 1])) ++ell; 101 // Now split into three parts: suffix of left block, middle full blocks via child, prefix of right block 102 int bl = l / S[ell]; 103 int br = r / S[ell]; 104 T res = suff[ell][l]; 105 if (bl + 1 <= br - 1) { 106 // middle full blocks exist; child is defined because number of blocks m[ell] >= 3 107 T mid = child[ell]->query(bl + 1, br - 1); 108 res = op(res, mid); 109 } 110 res = op(res, pref[ell][r]); 111 return res; 112 } 113 }; 114 115 // Example operator: sum 116 struct SumOp { 117 long long operator()(long long a, long long b) const { return a + b; } 118 }; 119 120 // Example operator: minimum 121 struct MinOp { 122 long long operator()(long long a, long long b) const { return std::min(a, b); } 123 }; 124 125 int main() { 126 ios::sync_with_stdio(false); 127 cin.tie(nullptr); 128 129 // Demo: range sum and range min on the same array 130 int n = 10; 131 vector<long long> a = {5, -2, 7, 3, 9, 1, 4, -5, 6, 2}; 132 133 SqrtTree<long long, SumOp> st_sum(a, SumOp{}); 134 SqrtTree<long long, MinOp> st_min(a, MinOp{}); 135 136 // Sample queries 137 vector<pair<int,int>> qs = {{0,0},{0,3},{2,7},{4,9},{1,8}}; 138 cout << "Range Sum:\n"; 139 for (auto [l,r] : qs) cout << l << "," << r << ": " << st_sum.query(l,r) << "\n"; 140 141 cout << "\nRange Min:\n"; 142 for (auto [l,r] : qs) cout << l << "," << r << ": " << st_min.query(l,r) << "\n"; 143 144 return 0; 145 } 146
This is a complete, generic sqrt tree for any associative operation. It builds O(log log n) layers. Each layer β stores prefix/suffix aggregates for blocks of size S_β on the original array and creates a child sqrt tree on the array of block aggregates B_β. A query picks the smallest β where endpoints share a superblock (block at layer β+1) but lie in different blocks at layer β, then combines suffix(left), middle full blocks via child, and prefix(right). The example demonstrates both sum and min queries.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 template<typename T, class Op> 5 struct SqrtTree { 6 int n, L; vector<int> b, S, m; vector<T> a; Op op; vector<vector<T>> pref, suff; vector<unique_ptr<SqrtTree<T, Op>>> child; 7 SqrtTree() : n(0), L(0) {} 8 SqrtTree(const vector<T>& arr, Op oper){ init(arr, oper); } 9 void init(const vector<T>& arr, Op oper){ a=arr; op=oper; n=(int)a.size(); if(!n){L=0;return;} m.clear(); b.clear(); m.push_back(n); while(m.back()>1){ int cur=m.back(); int bl=(int)floor(sqrt((long double)cur)); if(1LL*bl*bl<cur) ++bl; bl=max(bl,1); b.push_back(bl); m.push_back((cur+bl-1)/bl);} L=(int)b.size(); S.assign(L+1,1); for(int i=1;i<=L;i++) S[i]=S[i-1]*b[i-1]; pref.assign(L, vector<T>(n)); suff.assign(L, vector<T>(n)); for(int ell=0;ell<L;++ell){ int Bsz=S[ell]; int blocks=(n+Bsz-1)/Bsz; for(int blk=0; blk<blocks; ++blk){ int Lx=blk*Bsz; int Rx=min(n,(blk+1)*Bsz)-1; T acc=a[Lx]; pref[ell][Lx]=acc; for(int i=Lx+1;i<=Rx;++i){ acc=op(acc,a[i]); pref[ell][i]=acc;} acc=a[Rx]; suff[ell][Rx]=acc; for(int i=Rx-1;i>=Lx;--i){ acc=op(a[i],acc); suff[ell][i]=acc; } } } child.resize(L); for(int ell=0;ell<L;++ell){ int Bsz=S[ell]; int blocks=(n+Bsz-1)/Bsz; if(blocks<=1){ child[ell].reset(); continue; } vector<T> B(blocks); for(int j=0;j<blocks;++j){ int Lx=j*Bsz; B[j]=suff[ell][Lx]; } child[ell]=make_unique<SqrtTree<T,Op>>(B, op);} } 10 T query(int l, int r) const { if(l==r) return a[l]; int ell=0; while(ell+1<=L && (l/S[ell+1])!=(r/S[ell+1])) ++ell; int bl=l/S[ell]; int br=r/S[ell]; T res=suff[ell][l]; if(bl+1<=br-1){ T mid=child[ell]->query(bl+1, br-1); res=op(res, mid);} return op(res, pref[ell][r]); } 11 }; 12 13 struct SumOp{ long long operator()(long long a, long long b) const { return a+b; } }; 14 15 int main(){ 16 ios::sync_with_stdio(false); 17 cin.tie(nullptr); 18 19 int n,q; if(!(cin>>n>>q)) return 0; vector<long long> a(n); for(int i=0;i<n;++i) cin>>a[i]; 20 SqrtTree<long long, SumOp> st(a, SumOp{}); 21 while(q--){ int l,r; cin>>l>>r; // 0-indexed inclusive 22 cout<<st.query(l,r) << '\n'; 23 } 24 return 0; 25 } 26
Minimal driver for competitive programming: read an array, build a sqrt tree for sum, and answer Q range-sum queries in O(1) each. Replace SumOp with MinOp (or any associative op) to change the operation.