Merge Sort Tree
Key Points
- A Merge Sort Tree is a segment tree where every node stores the sorted list of values in its segment.
- It supports static range queries, such as counting elements less than or equal to a key, in O(log^2 n) time per query.
- Building the tree takes O(n log n) time and O(n log n) space by merging children’s sorted lists.
- You can answer range frequency queries and compute the k-th smallest in a range by binary searching over the value domain.
- With per-node prefix sums, you can also compute the sum of elements up to a threshold in O(log^2 n).
- It is simpler than a persistent segment tree and often easier than a wavelet tree, though it uses more memory than a plain segment tree.
- It is best suited for static arrays because point updates are expensive or require advanced per-node structures.
- Careful handling of 0/1-based indices, inclusive ranges, and use of lower_bound/upper_bound avoids common bugs.
Prerequisites
- Binary Search — Queries rely on lower_bound/upper_bound within sorted node arrays.
- Segment Tree — Merge Sort Tree is a segment tree variant with sorted lists at nodes.
- Merge Sort — Building nodes uses the merge step repeatedly across levels.
- Asymptotic Notation (Big-O) — To analyze build and query complexities like O(n log n) and O(log^2 n).
- C++ STL (vectors and algorithms) — Efficient merging and binary searching use std::merge, lower_bound, and upper_bound.
- Coordinate Compression — Needed for efficient k-th queries via value-domain binary search and to reduce value range.
Detailed Explanation
01 Overview
A Merge Sort Tree (MST) is a segment-tree-based data structure designed for static arrays where we want fast range queries involving order statistics, such as “how many numbers in A[l..r] are less than k?” or “what is the k-th smallest in A[l..r]?”. Like a standard segment tree, it recursively divides the array into halves. Unlike a standard segment tree that stores an aggregate (like sum or min) at each node, an MST stores the entire subarray covered by the node in sorted order. The tree is built in O(n log n) time by merging the sorted lists of the two children at each node, mirroring the merge step in merge sort (hence the name). Queries traverse only O(log n) nodes, and at each node we can binary search in its sorted list. This produces O(log^2 n) time per query for counting-related operations. The memory cost is O(n log n) because each array element appears in O(log n) nodes. MSTs are particularly effective when the array is fixed and we need to answer many queries that depend on element ordering, including frequency in a value interval and k-th order statistics via binary search on the value domain.
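For a concrete picture, take A = [5, 2, 7, 3] (0-based indices). The root covers [0..3] and stores [2, 3, 5, 7]; its children cover [0..1] and [2..3] and store [2, 5] and [3, 7]; the leaves hold single elements. Counting elements ≤ 4 in A[0..3] is answered at the root alone: upper_bound(4) on [2, 3, 5, 7] lands at index 2, so the answer is 2 (the elements 2 and 3). Counting elements ≤ 4 in A[1..3] instead decomposes into the leaf [1..1] (list [2], contributing 1) and the node [2..3] (list [3, 7], contributing 1), again giving 2.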
02 Intuition & Analogies
Imagine you organize a bookshelf of n books (the array) by repeatedly splitting it into left and right halves (like shelves on a tree). On each shelf (node), you keep a list of the books that belong there, sorted by title (value). To find out how many books between positions l and r come before the title “K” in the alphabet, you only need to check the shelves that exactly cover the l..r range. Because each of those shelves already has its list sorted, counting how many are “before K” is just a quick lookup with a finger running until “K”—that is binary search. You don’t need to open every book; you use the sorted shelf lists to skip work. Building this bookshelf system isn’t free: for each shelf, you merge the already-sorted lists from its left and right subshelves, like the merge step in merge sort. That’s why the structure is called a Merge Sort Tree. This setup pays off when you’ll answer many questions: count elements below a threshold, count in a value interval, or even find the k-th smallest by guessing a candidate value and asking “how many are ≤ candidate?” repeatedly. Because you touch only O(log n) shelves and do a fast binary search on each shelf, your query is fast, albeit not as fast as specialized or persistent structures. But the trade-off is that the whole system is conceptually simple and robust for static data.
03 Formal Definition
Given a static array A[0..n−1], build a segment tree over index ranges. The node responsible for segment [L, R] stores the multiset {A[L], …, A[R]} as a sorted array, produced by merging the sorted arrays of its two children. A range query on [l, r] decomposes [l, r] into O(log n) canonical nodes, and order-based questions (count < k, count ≤ k, count in [a, b]) are answered by summing binary-search counts over those nodes’ sorted arrays.
04 When to Use
Use a Merge Sort Tree when: 1) The array is static (no or very infrequent updates). 2) You need order-based range queries: count elements < k, ≤ k, in [a, b]; find the k-th smallest by searching the value domain; compute the sum of elements ≤ k if you augment nodes with prefix sums. 3) Memory of O(n log n) is acceptable. 4) Simpler implementation than a persistent segment tree or wavelet tree is preferred. Typical problems include: number of elements less than k in subarray queries; frequency of a specific value within a subarray; k-th order statistic within a subarray (by value-domain binary search); sum or weighted count under a threshold (with augmented prefix sums); counting pairs or inversions restricted to a query range (often as a subroutine). Avoid using MST when you have many updates, especially insertions/deletions, or when memory is extremely tight. For dynamic scenarios, consider a Fenwick tree of ordered containers, policy-based ordered sets, a persistent segment tree, or a wavelet tree.
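One query from the list above, the frequency of a specific value v in A[l..r], reduces directly to the counting identity. Here is a minimal sketch, assuming the countLE/countLess interface of the MergeSortTree defined in the first code example below:

// Frequency of value v in A[l..r] (0-based, inclusive), expressed via the counting
// interface of the MergeSortTree from the first code example.
int frequencyOf(const MergeSortTree& mst, int l, int r, int v) {
    return mst.countLE(l, r, v) - mst.countLess(l, r, v);   // same as countInRange(l, r, v, v)
}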
⚠️ Common Mistakes
- Off-by-one and indexing: Mixing 0-based arrays with 1-based query indices causes incorrect segments; always normalize queries and document inclusivity (both ends inclusive for MST).
- Wrong comparator choice: Using upper_bound vs lower_bound incorrectly yields off-by-one errors in counts. Remember: lower_bound(k) counts < k; upper_bound(k) counts ≤ k (see the short sketch after this list).
- Forgetting to clear or resize the tree before rebuilding, leading to memory blow-up or old data mixed with new data.
- Inefficient merges: Repeatedly copying vectors or missing reserve/shrink_to_fit can cause timeouts or excessive memory. Merge children into the parent once and avoid unnecessary temporaries.
- Expecting fast updates: MST is mainly for static arrays. Point updates with vectors are not efficient; either rebuild or switch data structures if updates are frequent.
- Overflow in prefix sums: When augmenting with per-node prefix sums, use 64-bit integers (long long) if values can sum beyond the 32-bit range.
- Value compression errors: When performing k-th smallest via binary search over values, compress coordinates properly and keep a consistent mapping.
- Recursion depth and stack: On very large n, ensure the recursion is not too deep or use an iterative build if needed.
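To make the comparator rule concrete, here is a minimal, self-contained sketch (illustrative only, independent of the tree code) showing how lower_bound and upper_bound translate into "count < k" and "count ≤ k" on one sorted node list:

#include <bits/stdc++.h>
using namespace std;

int main() {
    // A node's sorted list, as it would appear inside a Merge Sort Tree node
    vector<int> node = {1, 3, 3, 5, 8};
    int k = 3;

    // lower_bound: iterator to the first element >= k, so its index equals the count of elements < k
    int cntLess = (int)(lower_bound(node.begin(), node.end(), k) - node.begin());   // 1

    // upper_bound: iterator to the first element > k, so its index equals the count of elements <= k
    int cntLE = (int)(upper_bound(node.begin(), node.end(), k) - node.begin());     // 3

    cout << cntLess << " " << cntLE << "\n";   // prints: 1 3
    return 0;
}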
Key Formulas
Build Complexity
O(n log n)
Explanation: Each level of the tree merges lists whose total length is n, and there are O(log n) levels. So the total time is proportional to n times log n.
Space Complexity
O(n log n)
Explanation: Every element appears in O(log n) nodes along the path from leaf to root. Summed across nodes, storage is n times log n up to constants.
Count Query Complexity
O(log^2 n) per query
Explanation: We visit O(log n) nodes to cover [l, r]. On each node, a binary search in the node’s sorted list takes O(log n), giving O(log^2 n) overall.
Merge Sort Recurrence
T(n) = 2T(n/2) + O(n)
Explanation: The time to build an internal node is the sum of times for two children plus linear time to merge their outputs. This standard recurrence solves to O(n log n).
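Unrolling the recurrence level by level (with c denoting the per-element merge cost) makes the bound explicit:

T(n) = 2T(n/2) + cn
     = 4T(n/4) + cn + cn
     = 8T(n/8) + cn + cn + cn
     = ...
     = n·T(1) + cn·log2 n
     = O(n log n)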
Segment Tree Size
Node count is O(n); arrays are typically sized 4n
Explanation: A full binary tree covering n leaves has a number of nodes linear in n. Implementations often size arrays to about 4n.
Interval Count Identity
count(l, r, [a, b]) = countLE(l, r, b) − countLess(l, r, a)
Explanation: The number of elements within a closed interval equals the number of elements not exceeding the upper bound minus those strictly less than the lower bound.
K-th Query via Value Search
O(log m · log^2 n) per query, where m is the number of distinct (compressed) values
Explanation: Binary search over the value domain (or compressed ranks). Each feasibility check counts how many elements are ≤ mid in O(log^2 n).
Binary Search Iterations
O(log m) steps for a sorted node list of length m
Explanation: Searching in a sorted list of length m takes logarithmic steps, which multiplies with the tree’s O(log n) visited nodes.
Complexity Analysis
In summary: building takes O(n log n) time and the structure uses O(n log n) memory, since every element is stored in O(log n) node lists. Counting queries (< k, ≤ k, or within [a, b]) visit O(log n) nodes and binary search each of them, for O(log^2 n) per query. The k-th smallest via binary search over m compressed values costs O(log m · log^2 n) per query, and thresholded sums with per-node prefix sums remain O(log^2 n) per query.
Code Examples
#include <bits/stdc++.h>
using namespace std;

struct MergeSortTree {
    int n;                      // size of the input array
    vector<vector<int>> st;     // tree: each node stores a sorted vector of values in its segment

    MergeSortTree() {}
    explicit MergeSortTree(const vector<int>& a) { build(a); }

    void build(const vector<int>& a) {
        n = (int)a.size();
        st.assign(4 * n, {});   // 4*n is a safe upper bound for segment tree node count
        build(1, 0, n - 1, a);
        // Optional: shrink memory of each node to fit its content tightly
        // for (auto &v : st) v.shrink_to_fit();
    }

    void build(int p, int L, int R, const vector<int>& a) {
        if (L == R) {
            st[p] = {a[L]};     // leaf: single element
            return;
        }
        int M = (L + R) >> 1;
        build(p << 1, L, M, a);
        build(p << 1 | 1, M + 1, R, a);
        // merge two sorted children into this node
        st[p].resize(st[p << 1].size() + st[p << 1 | 1].size());
        merge(st[p << 1].begin(), st[p << 1].end(),
              st[p << 1 | 1].begin(), st[p << 1 | 1].end(),
              st[p].begin());
    }

    // Count how many elements in A[l..r] are < k
    int countLess(int l, int r, int k) const {
        return countLess(1, 0, n - 1, l, r, k);
    }

    int countLess(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0;   // disjoint
        if (i <= L && R <= j) {
            // lower_bound gives first position of element >= k; its index equals count of < k
            return (int)(lower_bound(st[p].begin(), st[p].end(), k) - st[p].begin());
        }
        int M = (L + R) >> 1;
        return countLess(p << 1, L, M, i, j, k) + countLess(p << 1 | 1, M + 1, R, i, j, k);
    }

    // Count how many elements in A[l..r] are <= k
    int countLE(int l, int r, int k) const {
        return countLE(1, 0, n - 1, l, r, k);
    }

    int countLE(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0;   // disjoint
        if (i <= L && R <= j) {
            // upper_bound gives first position of element > k; index equals count of <= k
            return (int)(upper_bound(st[p].begin(), st[p].end(), k) - st[p].begin());
        }
        int M = (L + R) >> 1;
        return countLE(p << 1, L, M, i, j, k) + countLE(p << 1 | 1, M + 1, R, i, j, k);
    }

    // Count how many elements in A[l..r] fall in [a, b]
    int countInRange(int l, int r, int a, int b) const {
        if (a > b) return 0;
        return countLE(l, r, b) - countLess(l, r, a);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    MergeSortTree mst(a);

    int q; cin >> q;
    // Query format examples (1-based indices from input, converted to 0-based):
    //   type 1 l r k   : count elements < k in [l, r]
    //   type 2 l r a b : count elements in [a, b] within [l, r]
    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            int l, r, k; cin >> l >> r >> k; --l; --r;
            cout << mst.countLess(l, r, k) << "\n";
        } else if (type == 2) {
            int l, r, aV, bV; cin >> l >> r >> aV >> bV; --l; --r;
            cout << mst.countInRange(l, r, aV, bV) << "\n";
        }
    }
    return 0;
}
This program builds a Merge Sort Tree over an input array. Each node stores the sorted values for its segment. Queries decompose [l, r] into O(log n) nodes and use binary search per node: lower_bound for counting < k and upper_bound for counting ≤ k. The count in a closed value interval [a, b] is obtained by the identity #(≤ b) − #(< a). Indices are normalized to 0-based internally.
#include <bits/stdc++.h>
using namespace std;

struct MergeSortTree {
    int n; vector<vector<int>> st;
    MergeSortTree() {}
    explicit MergeSortTree(const vector<int>& a) { build(a); }
    void build(const vector<int>& a) {
        n = (int)a.size(); st.assign(4*n, {}); build(1, 0, n-1, a);
    }
    void build(int p, int L, int R, const vector<int>& a) {
        if (L == R) { st[p] = {a[L]}; return; }
        int M = (L+R) >> 1; build(p<<1, L, M, a); build(p<<1|1, M+1, R, a);
        st[p].resize(st[p<<1].size() + st[p<<1|1].size());
        merge(st[p<<1].begin(), st[p<<1].end(), st[p<<1|1].begin(), st[p<<1|1].end(), st[p].begin());
    }
    // count of elements <= k in [l,r]
    int countLE(int l, int r, int k) const { return countLE(1, 0, n-1, l, r, k); }
    int countLE(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0;
        if (i <= L && R <= j) return (int)(upper_bound(st[p].begin(), st[p].end(), k) - st[p].begin());
        int M = (L+R) >> 1; return countLE(p<<1, L, M, i, j, k) + countLE(p<<1|1, M+1, R, i, j, k);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n; vector<int> a(n); for (int i = 0; i < n; ++i) cin >> a[i];

    // Coordinate compress values to binary search over indices [0..m-1]
    vector<int> vals = a;
    sort(vals.begin(), vals.end());
    vals.erase(unique(vals.begin(), vals.end()), vals.end());
    vector<int> comp(n);
    for (int i = 0; i < n; ++i) comp[i] = (int)(lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin());

    MergeSortTree mst(comp);

    auto kth_in_range = [&](int l, int r, int k) -> int {  // 0-based l,r; k is 1-based order statistic
        int lo = 0, hi = (int)vals.size() - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >> 1;
            // Check how many elements <= vals[mid] in [l,r]
            int cnt = mst.countLE(l, r, mid);
            if (cnt >= k) { ans = mid; hi = mid - 1; } else { lo = mid + 1; }
        }
        return (ans == -1) ? INT_MIN : vals[ans];  // return actual value, or INT_MIN if k is invalid
    };

    int q; cin >> q;
    // Query: l r k => find k-th smallest in A[l..r]
    while (q--) {
        int l, r, k; cin >> l >> r >> k; --l; --r;
        if (l > r || k < 1 || k > r - l + 1) { cout << "INVALID\n"; continue; }
        cout << kth_in_range(l, r, k) << "\n";
    }
    return 0;
}
We compress values so that the value domain becomes [0..m−1] while preserving order. The Merge Sort Tree is built over compressed ranks. To find the k-th smallest in [l, r], we binary search over the compressed domain and use the MST to count how many elements are ≤ a candidate rank. If at least k elements are ≤ mid, we search left; otherwise, we search right. The final answer is mapped back to the original value using the compression array.
#include <bits/stdc++.h>
using namespace std;

struct Node {
    vector<int> vals;        // sorted values
    vector<long long> pref;  // prefix sums over vals (pref[i] = sum of vals[0..i-1], pref.size() = vals.size()+1)
};

struct MSTSum {
    int n; vector<Node> st;
    MSTSum() {}
    explicit MSTSum(const vector<int>& a) { build(a); }

    void build(const vector<int>& a) {
        n = (int)a.size(); st.assign(4*n, {}); build(1, 0, n-1, a);
    }

    void build(int p, int L, int R, const vector<int>& a) {
        if (L == R) {
            st[p].vals = {a[L]};
            st[p].pref = {0LL, (long long)a[L]};
            return;
        }
        int M = (L+R) >> 1; build(p<<1, L, M, a); build(p<<1|1, M+1, R, a);
        auto &A = st[p<<1].vals; auto &B = st[p<<1|1].vals;
        auto &V = st[p].vals; V.resize(A.size() + B.size());
        merge(A.begin(), A.end(), B.begin(), B.end(), V.begin());
        auto &P = st[p].pref; P.resize(V.size() + 1); P[0] = 0;
        for (size_t i = 0; i < V.size(); ++i) P[i+1] = P[i] + (long long)V[i];
    }

    // Sum of elements <= k in A[l..r]
    long long sumLE(int l, int r, int k) const { return sumLE(1, 0, n-1, l, r, k); }

    long long sumLE(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0LL;
        if (i <= L && R <= j) {
            const auto &V = st[p].vals; const auto &P = st[p].pref;
            int idx = (int)(upper_bound(V.begin(), V.end(), k) - V.begin());  // number of elements <= k
            return P[idx];  // prefix sum up to idx-1
        }
        int M = (L+R) >> 1; return sumLE(p<<1, L, M, i, j, k) + sumLE(p<<1|1, M+1, R, i, j, k);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n; vector<int> a(n); for (int i = 0; i < n; ++i) cin >> a[i];
    MSTSum tree(a);

    int q; cin >> q;
    // Query: l r k => sum of elements <= k in A[l..r]
    while (q--) {
        int l, r, k; cin >> l >> r >> k; --l; --r;
        cout << tree.sumLE(l, r, k) << "\n";
    }
    return 0;
}
Each node stores both the sorted values in its segment and a prefix sum array over those values. To compute the sum of elements ≤ k in [l, r], we visit O(log n) nodes and use upper_bound to find how many values are ≤ k at each node, then take the corresponding prefix sum in O(1). Summing across visited nodes gives the answer. This is useful for thresholded sums and can be extended to weighted counts.
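As one possible extension, a sum over a closed value interval [a, b] follows from sumLE and the same identity used for counts. The helper below is an illustrative sketch, not part of the code above; it assumes a > INT_MIN so that a − 1 does not underflow.

// Sum of the elements of A[l..r] whose values lie in [a, b] (illustrative helper).
// Uses: sum over [a, b] = (sum of elements <= b) - (sum of elements <= a - 1).
// Assumes a > INT_MIN so that a - 1 does not underflow.
long long sumInRange(const MSTSum& tree, int l, int r, int a, int b) {
    if (a > b) return 0LL;
    return tree.sumLE(l, r, b) - tree.sumLE(l, r, a - 1);
}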