🗂️ Data Structure · Intermediate

Merge Sort Tree

Key Points

  • A Merge Sort Tree is a segment tree where every node stores the sorted list of values in its segment.
  • It supports static range queries like counting elements less than or equal to a key in O(log^2 n) time.
  • Building the tree takes O(n log n) time and O(n log n) space by merging children’s sorted lists.
  • You can answer range frequency queries and compute k-th smallest in a range by binary searching over the value domain.
  • With per-node prefix sums, you can also compute the sum of elements up to a threshold in O(log^2 n).
  • It is simpler than a persistent segment tree and often easier than a wavelet tree, though it uses more memory than a plain segment tree.
  • It is best suited for static arrays because point updates are expensive or require advanced per-node structures.
  • Careful handling of 0/1-based indices, inclusive ranges, and use of lower_bound/upper_bound avoids common bugs.

Prerequisites

  • Binary Search: Queries rely on lower_bound/upper_bound within sorted node arrays.
  • Segment Tree: The Merge Sort Tree is a segment tree variant with sorted lists at nodes.
  • Merge Sort: Building nodes uses the merge step repeatedly across levels.
  • Asymptotic Notation (Big-O): To analyze build and query complexities like O(n log n) and O(log^2 n).
  • C++ STL (vectors and algorithms): Efficient merging and binary searching use std::merge, lower_bound, and upper_bound.
  • Coordinate Compression: Needed for efficient k-th queries via value-domain binary search and to reduce the value range.

Detailed Explanation


01 Overview

A Merge Sort Tree (MST) is a segment-tree-based data structure designed for static arrays where we want fast range queries involving order statistics, such as “how many numbers in A[l..r] are less than k?” or “what is the k-th smallest in A[l..r]?”. Like a standard segment tree, it recursively divides the array into halves. Unlike a standard segment tree that stores an aggregate (like sum or min) at each node, an MST stores the entire subarray covered by the node in sorted order. The tree is built in O(n log n) time by merging the sorted lists of the two children at each node, mirroring the merge step in merge sort (hence the name). Queries traverse only O(log n) nodes, and at each node we can binary search in its sorted list. This produces O(log^2 n) time per query for counting-related operations. The memory cost is O(n log n) because each array element appears in O(log n) nodes. MSTs are particularly effective when the array is fixed and we need to answer many queries that depend on element ordering, including frequency in a value interval and k-th order statistics via binary search on the value domain.

02 Intuition & Analogies

Imagine you organize a bookshelf of n books (the array) by repeatedly splitting it into left and right halves (like shelves on a tree). On each shelf (node), you keep a list of the books that belong there, sorted by title (value). To find out how many books between positions l and r come before the title “K” in the alphabet, you only need to check the shelves that exactly cover the l..r range. Because each of those shelves already has its list sorted, counting how many are “before K” is just a quick lookup with a finger running until “K”—that is binary search. You don’t need to open every book; you use the sorted shelf lists to skip work. Building this bookshelf system isn’t free: for each shelf, you merge the already-sorted lists from its left and right subshelves, like the merge step in merge sort. That’s why the structure is called a Merge Sort Tree. This setup pays off when you’ll answer many questions: count elements below a threshold, count in a value interval, or even find the k-th smallest by guessing a candidate value and asking “how many are ≤ candidate?” repeatedly. Because you touch only O(log n) shelves and do a fast binary search on each shelf, your query is fast, albeit not as fast as specialized or persistent structures. But the trade-off is that the whole system is conceptually simple and robust for static data.

03 Formal Definition

Given an array A of length n indexed from 0 to n−1, a Merge Sort Tree is a standard binary segment tree where each node v corresponds to a segment [L_v, R_v] and stores an array S_v containing exactly the multiset of values {A[i] | i ∈ [L_v, R_v]}, sorted in non-decreasing order. The leaves correspond to single positions i and store the singleton list [A[i]]. For an internal node v with children u and w covering [L_v, M] and [M+1, R_v], S_v is obtained as the stable merge of S_u and S_w. Build time is dominated by the merging across all levels. A typical query over an interval [l, r] decomposes [l, r] into O(log n) node segments that exactly partition [l, r]. Each step of the query visits only those nodes and performs binary search operations on their S_v arrays. For example, count_less(l, r, k) = Σ_{v covering parts of [l, r]} |{x ∈ S_v | x < k}|, computed with lower_bound(k) on each S_v. Storage is O(Σ_v |S_v|) = O(n log n). Query time is O(log n · log n) = O(log^2 n) because there are O(log n) visited nodes and binary search on each costs O(log |S_v|) = O(log n).

04 When to Use

Use a Merge Sort Tree when: 1) The array is static (no or very infrequent updates). 2) You need order-based range queries: count elements < k, ≤ k, in [a, b]; find the k-th smallest by searching the value domain; compute the sum of elements ≤ k if you augment nodes with prefix sums. 3) Memory of O(n log n) is acceptable. 4) Simpler implementation than a persistent segment tree or wavelet tree is preferred. Typical problems include: number of elements less than k in subarray queries; frequency of a specific value within a subarray; k-th order statistic within a subarray (by value-domain binary search); sum or weighted count under a threshold (with augmented prefix sums); counting pairs or inversions restricted to a query range (often as a subroutine). Avoid using MST when you have many updates, especially insertions/deletions, or when memory is extremely tight. For dynamic scenarios, consider a Fenwick tree of ordered containers, policy-based ordered sets, a persistent segment tree, or a wavelet tree.

⚠️Common Mistakes

  • Off-by-one and indexing: Mixing 0-based arrays with 1-based query indices causes incorrect segments; always normalize queries and document inclusivity (both ends inclusive for MST).
  • Wrong comparator choice: Using upper_bound vs lower_bound incorrectly yields off-by-one errors in counts. Remember: lower_bound(k) counts < k; upper_bound(k) counts ≤ k.
  • Forgetting to clear or resize the tree before rebuilding, leading to memory blow-up or old data mixed with new data.
  • Inefficient merges: Repeatedly copying vectors or missing reserve/shrink_to_fit can cause timeouts or excessive memory. Merge children into the parent once and avoid unnecessary temporaries.
  • Expecting fast updates: MST is mainly for static arrays. Point updates with vectors are not efficient; either rebuild or switch data structures if updates are frequent.
  • Overflow in prefix sums: When augmenting with per-node prefix sums, use 64-bit integers (long long) if values can sum beyond 32-bit range.
  • Value compression errors: When performing k-th smallest via binary search over values, compress coordinates properly and keep a consistent mapping.
  • Recursion depth and stack: On very large n, ensure recursion is not too deep or use an iterative build if needed.

Key Formulas

Build Complexity

T_build(n) = O(n log n)

Explanation: Each level of the tree merges lists whose total length is n, and there are O(log n) levels. So the total time is proportional to n times log n.

Space Complexity

S(n) = O(n log n)

Explanation: Every element appears in O(log n) nodes along the path from leaf to root. Summed across nodes, storage is n times log n up to constants.

Count Query Complexity

T_query(n) = O(log^2 n)

Explanation: We visit O(log n) nodes to cover [l,r]. On each node, a binary search in the node’s sorted list takes O(log n), giving O(log^2 n) overall.

Merge Sort Recurrence

T(n) = 2T(n/2) + O(n) ⇒ T(n) = O(n log n)

Explanation: The time to build an internal node is the sum of times for two children plus linear time to merge their outputs. This standard recurrence solves to n log n.

Segment Tree Size

number of nodes ≤ 4n = O(n)

Explanation: A full binary tree covering n leaves has a number of nodes linear in n. Implementations often size arrays to about 4n.

Interval Count Identity

count(l, r, [a, b]) = countLE(l, r, b) − countLess(l, r, a)

Explanation: The number of elements within a closed interval equals the number of elements not exceeding the upper bound minus those strictly less than the lower bound.

K-th Query via Value Search

T_kth = O(log |V| · log^2 n)

Explanation: Binary search over the value domain (or compressed ranks). Each feasibility check counts how many elements are ≤ mid in O(log^2 n).

Binary Search Iterations

O(log m) steps for a sorted list of length m

Explanation: Searching in a sorted list of length m takes logarithmic steps, which multiplies with the tree’s O(log n) nodes.

Complexity Analysis

Let n be the array length. The tree has O(n) nodes: in a binary segment tree covering n leaves, the total node count is at most 2·2^{⌈log2 n⌉} − 1 = O(n). Each element of the original array is stored at every node on the path from its leaf to the root, which contains O(log n) nodes. Therefore, the total number of stored elements across all nodes is O(n log n), establishing the space complexity.

Building the tree proceeds bottom-up. Leaves store singletons. For each internal node, we merge two sorted child vectors whose combined size equals the size of the node’s segment. Across any fixed level, the total length of merged arrays equals n, and there are O(log n) levels. Hence, the total build time is O(n log n).

For a count-less-than (or count-less-or-equal) query on [l, r], we decompose the query interval into O(log n) disjoint node segments. On each visited node v, we perform a binary search (lower_bound or upper_bound) in S_v, which costs O(log |S_v|) = O(log n). Summing over O(log n) nodes yields O(log^2 n) time per query. For k-th order statistics via value-domain binary search, if we compress values into a discrete domain of size m, we perform O(log m) feasibility checks, each taking O(log^2 n), for O(log m · log^2 n) per query. If nodes are augmented with prefix sums to compute sums under thresholds, the asymptotic complexities remain the same, with a constant-factor increase in space.

Note that point updates are not efficient with simple vectors; supporting them while maintaining sorted order typically requires heavier per-node structures, otherwise updates can degrade to linear time per affected node, making MST primarily suitable for static datasets.

Code Examples

Static Merge Sort Tree: count elements < k and in a value interval
#include <bits/stdc++.h>
using namespace std;

struct MergeSortTree {
    int n;                  // size of the input array
    vector<vector<int>> st; // tree: each node stores a sorted vector of values in its segment

    MergeSortTree() {}
    explicit MergeSortTree(const vector<int>& a) { build(a); }

    void build(const vector<int>& a) {
        n = (int)a.size();
        st.assign(4 * n, {}); // 4*n is a safe upper bound for segment tree node count
        build(1, 0, n - 1, a);
        // Optional: shrink memory of each node to fit its content tightly
        // for (auto &v : st) v.shrink_to_fit();
    }

    void build(int p, int L, int R, const vector<int>& a) {
        if (L == R) {
            st[p] = {a[L]}; // leaf: single element
            return;
        }
        int M = (L + R) >> 1;
        build(p << 1, L, M, a);
        build(p << 1 | 1, M + 1, R, a);
        // merge two sorted children into this node
        st[p].resize(st[p << 1].size() + st[p << 1 | 1].size());
        merge(st[p << 1].begin(), st[p << 1].end(),
              st[p << 1 | 1].begin(), st[p << 1 | 1].end(),
              st[p].begin());
    }

    // Count how many elements in A[l..r] are < k
    int countLess(int l, int r, int k) const {
        return countLess(1, 0, n - 1, l, r, k);
    }

    int countLess(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0; // disjoint
        if (i <= L && R <= j) {
            // lower_bound gives first position of element >= k; its index equals count of < k
            return (int)(lower_bound(st[p].begin(), st[p].end(), k) - st[p].begin());
        }
        int M = (L + R) >> 1;
        return countLess(p << 1, L, M, i, j, k) + countLess(p << 1 | 1, M + 1, R, i, j, k);
    }

    // Count how many elements in A[l..r] are <= k
    int countLE(int l, int r, int k) const {
        return countLE(1, 0, n - 1, l, r, k);
    }

    int countLE(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0; // disjoint
        if (i <= L && R <= j) {
            // upper_bound gives first position of element > k; index equals count of <= k
            return (int)(upper_bound(st[p].begin(), st[p].end(), k) - st[p].begin());
        }
        int M = (L + R) >> 1;
        return countLE(p << 1, L, M, i, j, k) + countLE(p << 1 | 1, M + 1, R, i, j, k);
    }

    // Count how many elements in A[l..r] fall in [a, b]
    int countInRange(int l, int r, int a, int b) const {
        if (a > b) return 0;
        return countLE(l, r, b) - countLess(l, r, a);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    MergeSortTree mst(a);

    int q; cin >> q;
    // Query format examples (1-based indices from input, converted to 0-based):
    //   type 1 l r k   : count elements < k in [l, r]
    //   type 2 l r a b : count elements in [a, b] within [l, r]
    while (q--) {
        int type; cin >> type;
        if (type == 1) {
            int l, r, k; cin >> l >> r >> k; --l; --r;
            cout << mst.countLess(l, r, k) << "\n";
        } else if (type == 2) {
            int l, r, aV, bV; cin >> l >> r >> aV >> bV; --l; --r;
            cout << mst.countInRange(l, r, aV, bV) << "\n";
        }
    }
    return 0;
}

This program builds a Merge Sort Tree over an input array. Each node stores the sorted values for its segment. Queries decompose [l, r] into O(log n) nodes and use binary search per node: lower_bound for counting < k and upper_bound for counting ≤ k. The count in a closed value interval [a, b] is obtained by the identity #(≤ b) − #(< a). Indices are normalized to 0-based internally.

Time: Build O(n log n); each query O(log^2 n). Space: O(n log n).
K-th smallest in a range using value-domain binary search with Merge Sort Tree
#include <bits/stdc++.h>
using namespace std;

struct MergeSortTree {
    int n;
    vector<vector<int>> st;

    MergeSortTree() {}
    explicit MergeSortTree(const vector<int>& a) { build(a); }

    void build(const vector<int>& a) {
        n = (int)a.size();
        st.assign(4 * n, {});
        build(1, 0, n - 1, a);
    }

    void build(int p, int L, int R, const vector<int>& a) {
        if (L == R) { st[p] = {a[L]}; return; }
        int M = (L + R) >> 1;
        build(p << 1, L, M, a);
        build(p << 1 | 1, M + 1, R, a);
        st[p].resize(st[p << 1].size() + st[p << 1 | 1].size());
        merge(st[p << 1].begin(), st[p << 1].end(),
              st[p << 1 | 1].begin(), st[p << 1 | 1].end(),
              st[p].begin());
    }

    // count of elements <= k in [l, r]
    int countLE(int l, int r, int k) const { return countLE(1, 0, n - 1, l, r, k); }

    int countLE(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0;
        if (i <= L && R <= j)
            return (int)(upper_bound(st[p].begin(), st[p].end(), k) - st[p].begin());
        int M = (L + R) >> 1;
        return countLE(p << 1, L, M, i, j, k) + countLE(p << 1 | 1, M + 1, R, i, j, k);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    // Coordinate-compress values to binary search over ranks [0..m-1]
    vector<int> vals = a;
    sort(vals.begin(), vals.end());
    vals.erase(unique(vals.begin(), vals.end()), vals.end());
    vector<int> comp(n);
    for (int i = 0; i < n; ++i)
        comp[i] = (int)(lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin());

    MergeSortTree mst(comp);

    auto kth_in_range = [&](int l, int r, int k) -> int { // 0-based l, r; k is a 1-based order statistic
        int lo = 0, hi = (int)vals.size() - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >> 1;
            // Check how many elements <= vals[mid] in [l, r]
            int cnt = mst.countLE(l, r, mid);
            if (cnt >= k) { ans = mid; hi = mid - 1; } else { lo = mid + 1; }
        }
        return (ans == -1) ? INT_MIN : vals[ans]; // return actual value, or INT_MIN if k is invalid
    };

    int q; cin >> q;
    // Query: l r k => find k-th smallest in A[l..r]
    while (q--) {
        int l, r, k; cin >> l >> r >> k; --l; --r;
        if (l > r || k < 1 || k > r - l + 1) { cout << "INVALID\n"; continue; }
        cout << kth_in_range(l, r, k) << "\n";
    }
    return 0;
}

We compress values so that the value domain becomes [0..m−1] while preserving order. The Merge Sort Tree is built over compressed ranks. To find the k-th smallest in [l, r], we binary search over the compressed domain and use the MST to count how many elements are ≤ a candidate rank. If at least k elements are ≤ mid, we search left; otherwise, we search right. The final answer is mapped back to the original value using the compression array.

Time: Build O(n log n); each k-th query O(log |V| · log^2 n), where |V| is the number of distinct values (each of the O(log |V|) feasibility checks is an O(log^2 n) count query). Space: O(n log n).
Merge Sort Tree with per-node prefix sums: sum of elements ≤ K in a range
#include <bits/stdc++.h>
using namespace std;

struct Node {
    vector<int> vals;       // sorted values
    vector<long long> pref; // prefix sums over vals (pref[i] = sum of vals[0..i-1], pref.size() = vals.size()+1)
};

struct MSTSum {
    int n;
    vector<Node> st;

    MSTSum() {}
    explicit MSTSum(const vector<int>& a) { build(a); }

    void build(const vector<int>& a) {
        n = (int)a.size();
        st.assign(4 * n, {});
        build(1, 0, n - 1, a);
    }

    void build(int p, int L, int R, const vector<int>& a) {
        if (L == R) {
            st[p].vals = {a[L]};
            st[p].pref = {0LL, (long long)a[L]};
            return;
        }
        int M = (L + R) >> 1;
        build(p << 1, L, M, a);
        build(p << 1 | 1, M + 1, R, a);
        auto &A = st[p << 1].vals; auto &B = st[p << 1 | 1].vals;
        auto &V = st[p].vals;
        V.resize(A.size() + B.size());
        merge(A.begin(), A.end(), B.begin(), B.end(), V.begin());
        auto &P = st[p].pref;
        P.resize(V.size() + 1);
        P[0] = 0;
        for (size_t i = 0; i < V.size(); ++i) P[i + 1] = P[i] + (long long)V[i];
    }

    // Sum of elements <= k in A[l..r]
    long long sumLE(int l, int r, int k) const { return sumLE(1, 0, n - 1, l, r, k); }

    long long sumLE(int p, int L, int R, int i, int j, int k) const {
        if (j < L || R < i) return 0LL;
        if (i <= L && R <= j) {
            const auto &V = st[p].vals;
            const auto &P = st[p].pref;
            int idx = (int)(upper_bound(V.begin(), V.end(), k) - V.begin()); // number of elements <= k
            return P[idx]; // sum of those elements
        }
        int M = (L + R) >> 1;
        return sumLE(p << 1, L, M, i, j, k) + sumLE(p << 1 | 1, M + 1, R, i, j, k);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];
    MSTSum tree(a);

    int q; cin >> q;
    // Query: l r k => sum of elements <= k in A[l..r]
    while (q--) {
        int l, r, k; cin >> l >> r >> k; --l; --r;
        cout << tree.sumLE(l, r, k) << "\n";
    }
    return 0;
}

Each node stores both the sorted values in its segment and a prefix sum array over those values. To compute the sum of elements ≤ k in [l, r], we visit O(log n) nodes and use upper_bound to find how many values are ≤ k at each node, then take the corresponding prefix sum in O(1). Summing across visited nodes gives the answer. This is useful for thresholded sums and can be extended to weighted counts.

Time: Build O(n log n); each query O(log^2 n). Space: O(n log n) for values plus O(n log n) for prefix sums (same order).