Mo's Algorithm
Key Points
- •Mo's algorithm answers many range queries offline by reordering them to minimize pointer movement along the array.
- •It keeps a sliding window [L, R] and supports O(1) or O(log n) add/remove operations at the boundaries to update the current answer.
- •Sorting queries by blocks of size about \(\) yields total movement and time about \(O((n+q))\).
- •It shines when the query answer can be updated incrementally (e.g., distinct count, equal pairs, frequency-based sums).
- •Coordinate compression is often required so frequency arrays are small and cache-friendly.
- •Variants include Mo with updates (time dimension) and Hilbert order for speed in practice.
- •Common pitfalls are incorrect add/remove symmetry, 1-based vs 0-based indices, and integer overflow in pair counts.
- •Mo is offline; if you need real-time answers or arbitrary updates, consider segment trees/Fenwick trees instead.
Prerequisites
- →Big-O notation and asymptotic analysis — To understand the O((n+q)√n) bound and how block size affects runtime.
- →Two-pointer/sliding window technique — Mo’s window updates are sliding one step at a time at both ends.
- →Sorting and custom comparators in C++ — Queries must be sorted by blocks, and a correct comparator ensures deterministic order.
- →Coordinate compression — Allows using compact frequency arrays for large value ranges.
- →Discrete mathematics: combinations — Pair-count queries use k choose 2 = k(k−1)/2.
- →Hashing and frequency tables — Core to implementing O(1) add/remove via counts.
- →Segment tree and Fenwick tree (contrast) — Helps decide when not to use Mo and what alternatives exist for online or update-heavy tasks.
Detailed Explanation
Tap terms for definitions01Overview
Mo's algorithm is a technique to answer many range queries efficiently by reordering them. Imagine you have an array and many questions of the form: “What is the answer for subarray [l, r]?” If you answer each question from scratch, you repeat a lot of work. Mo’s algorithm avoids this by sorting the queries in a special order so that moving from one query’s interval to the next only changes the window’s edges slightly. Then, with fast add/remove operations at the edges, you can update the current answer incrementally. The key idea is block decomposition: split the index range into blocks of size about sqrt(n), and sort queries by the block of l, then by r. This ensures that across the entire sorted list, the window [L, R] doesn’t jump wildly too often, keeping the total pointer movement near O((n+q)√n). Because Mo is offline, you can’t process queries as they arrive—you must read them all first, sort them, and then compute answers. It’s commonly used for problems where the answer depends on element frequencies in the current range, like counting distinct values, counting equal pairs, or frequency-weighted sums. The algorithm is simple to implement and very effective in practice when add/remove are O(1).
02Intuition & Analogies
Think of a librarian answering questions about pages in a book. Each question asks: “From page l to page r, how many distinct words appear?” If the librarian opens to l and scans to r from scratch every time, it’s slow. A smarter approach is to order the questions so that the next question is close to the current one—maybe it starts on a nearby page and ends a little further. Then the librarian only flips a few pages forward or backward to adjust, and reuses previous counts. Mo’s algorithm is this strategy for arrays. We walk a window [L, R] across the array. Between consecutive queries, we slide the left edge L and the right edge R in small steps. Each one-step move adds or removes exactly one element at the border, so if we can update the current answer fast when a single element enters or leaves, we can keep the total cost low. Why block sorting? If we group l by blocks of size B ≈ sqrt(n), we can guarantee that l doesn’t change block too often—only O(n/B) times—and within each block, we sort by r, minimizing large jumps of the right edge. The total edge movement then becomes bounded by a near-linear factor times the block size. This is like visiting houses in a neighborhood by streets: first finish all houses on street 1 in increasing house numbers, then street 2, and so on—you walk far less than if you visit houses in random order.
03Formal Definition
04When to Use
Use Mo’s algorithm when: (1) you have many offline range queries over a static array; (2) the answer can be incrementally maintained by adding/removing a single boundary element in O(1) or O(log n); and (3) exact online order isn’t required. Typical applications include: counting distinct elements; counting pairs with equal values; computing sums like Σ f(freq[x]) over the current range; or maintaining bitwise OR/AND if add/remove can be updated quickly with counters. It also extends to Mo with updates (a third dimension “time” for point updates between queries) and Mo on trees (via Euler tour to turn path queries into intervals). Avoid Mo when queries are few (brute force may suffice), when add/remove are expensive (e.g., complex data structures per element), or when online/real-time answers are mandatory. For problems where answers compose associatively and support fast range aggregation (e.g., sum, min, max), segment trees or Fenwick trees may be simpler and faster. For dynamic range updates and queries, consider Fenwick/segment trees with lazy propagation or sqrt decomposition instead.
⚠️Common Mistakes
- Incorrect add/remove symmetry: add(x) and remove(x) must be exact inverses on the maintained state. A tiny mismatch yields subtle wrong answers. Test with small arrays and random queries. - Indexing errors: Input is often 1-based; implementations are 0-based internally. Convert consistently and define whether ranges are inclusive [l, r]. - Block size choice: Using B = n or B = 1 destroys performance. Use B ≈ √n (or n^{2/3} for Mo with updates) and consider even–odd ordering of r to reduce pointer oscillation. - Missing coordinate compression: If values are large (e.g., up to 1e9), using them as frequency indices causes memory blowups and cache misses. Compress first. - Overflow in aggregated answers: Pair counts use combinations like cnt·(cnt−1)/2; store answers in 64-bit (long long) to avoid overflow. - Forgetting sorting stability conditions: The comparator for queries must define a total order; beware of ties and ensure deterministic ordering (e.g., by r and then id). - Slow I/O and unnecessary clearing: Don’t clear frequency arrays per query; reuse the sliding window. Use fast I/O for large inputs. - Assuming Mo is online: It’s offline. You must read all queries first and cannot output in the input order without storing answers and reordering back by id.
Key Formulas
Movement Cost Trade-off
Explanation: Total time is proportional to left-edge movement ((n+q)·B) plus right-edge movement (q·n/B). Choosing B balances these terms. This captures why B ≈ √n is optimal for standard Mo.
Optimal Block Size
Explanation: Minimizing the trade-off yields B on the order of √n when q≈n. In practice we use B = ⌈√n⌉, which is robust when q and n are same order.
Standard Mo Complexity
Explanation: With B ≈ √n and O(1) add/remove, total time is roughly (n+q)√n, plus O(q log q) for sorting. This is the common rule of thumb.
Pairs From Frequency
Explanation: If a value appears k times in a range, it contributes k choose 2 equal pairs. Summing over all values gives the total number of equal pairs.
Mo With Updates Tuning
Explanation: When adding a time dimension with m updates, sorting by (l-block, r-block, time) and using B ≈ balances movements in three axes to get the typical bound.
Frequency Functional
Explanation: Many Mo-compatible answers depend only on frequencies in the current window. The function f must support O(1) updates when a single frequency changes by ±1.
Space Usage
Explanation: We store the array and auxiliary frequency structures over a universe of size U (often the number of distinct values after compression).
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Query { 5 int l, r, id; 6 int block_l; // precomputed block of l for faster sort 7 }; 8 9 int main() { 10 ios::sync_with_stdio(false); 11 cin.tie(nullptr); 12 13 // Input format: 14 // n q 15 // a[0..n-1] 16 // q lines: l r (1-based, inclusive) 17 int n, q; 18 if (!(cin >> n >> q)) return 0; 19 vector<long long> a(n); 20 for (int i = 0; i < n; ++i) cin >> a[i]; 21 22 // Coordinate compression so we can index frequency arrays compactly 23 vector<long long> vals = a; 24 sort(vals.begin(), vals.end()); 25 vals.erase(unique(vals.begin(), vals.end()), vals.end()); 26 vector<int> arr(n); 27 for (int i = 0; i < n; ++i) { 28 arr[i] = int(lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin()); 29 } 30 int U = (int)vals.size(); // universe size after compression 31 32 // Read queries and convert to 0-based inclusive indices 33 vector<Query> queries(q); 34 for (int i = 0; i < q; ++i) { 35 int l, r; cin >> l >> r; --l; --r; 36 queries[i] = {l, r, i, 0}; 37 } 38 39 // Choose block size ~ sqrt(n) 40 int B = max(1, (int)sqrt((double)n)); 41 for (auto &qr : queries) qr.block_l = qr.l / B; 42 43 // Sort by (block_l, r) with parity trick on r to reduce movement 44 sort(queries.begin(), queries.end(), [&](const Query &A, const Query &Bq){ 45 if (A.block_l != Bq.block_l) return A.block_l < Bq.block_l; 46 if (A.block_l & 1) return A.r > Bq.r; // odd blocks: sort r descending 47 return A.r < Bq.r; // even blocks: sort r ascending 48 }); 49 50 vector<int> cnt(U, 0); // frequency of each value in current window 51 vector<int> ans(q, 0); 52 53 auto add = [&](int x, int &cur){ 54 // x is the compressed value 55 if (cnt[x] == 0) ++cur; // new distinct value enters the window 56 ++cnt[x]; 57 }; 58 auto remove_one = [&](int x, int &cur){ 59 // remove one occurrence of x from the window 60 --cnt[x]; 61 if (cnt[x] == 0) --cur; // value no longer present 62 }; 63 64 int cur = 0; // current number of distinct values 65 int L = 0, R = -1; // current window is empty 66 67 for (const auto &qr : queries) { 68 int l = qr.l, r = qr.r; 69 // Expand / shrink to [l, r] 70 while (L > l) add(arr[--L], cur); 71 while (R < r) add(arr[++R], cur); 72 while (L < l) remove_one(arr[L++], cur); 73 while (R > r) remove_one(arr[R--], cur); 74 ans[qr.id] = cur; 75 } 76 77 // Output in input order 78 for (int i = 0; i < q; ++i) { 79 cout << ans[i] << '\n'; 80 } 81 return 0; 82 } 83
We compress values to 0..U-1 and maintain a frequency array cnt. The current answer is the number of values with nonzero frequency. As we reorder queries by blocks of l and then by r (with alternating r order), we slide the [L, R] window and update the answer via add/remove in O(1) per step. This yields about O((n+q)√n) time overall.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Query { int l, r, id, block_l; }; 5 6 int main(){ 7 ios::sync_with_stdio(false); 8 cin.tie(nullptr); 9 10 int n, q; if(!(cin >> n >> q)) return 0; 11 vector<long long> a(n); 12 for(int i=0;i<n;++i) cin >> a[i]; 13 14 // Coordinate compression 15 vector<long long> vals = a; sort(vals.begin(), vals.end()); 16 vals.erase(unique(vals.begin(), vals.end()), vals.end()); 17 vector<int> arr(n); 18 for(int i=0;i<n;++i) arr[i] = int(lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin()); 19 int U = (int)vals.size(); 20 21 vector<Query> qs(q); 22 for(int i=0;i<q;++i){ 23 int l,r; cin>>l>>r; --l; --r; 24 qs[i] = {l,r,i,0}; 25 } 26 27 int B = max(1, (int)sqrt((double)n)); 28 for(auto &qq: qs) qq.block_l = qq.l / B; 29 30 sort(qs.begin(), qs.end(), [&](const Query& A, const Query& Bq){ 31 if(A.block_l != Bq.block_l) return A.block_l < Bq.block_l; 32 if(A.block_l & 1) return A.r > Bq.r; // parity trick 33 return A.r < Bq.r; 34 }); 35 36 vector<long long> cnt(U, 0); 37 vector<long long> ans(q, 0); 38 39 auto add = [&](int x, long long &cur){ 40 // When we add one more x, the number of equal pairs increases by current cnt[x] 41 cur += cnt[x]; 42 ++cnt[x]; 43 }; 44 auto remove_one = [&](int x, long long &cur){ 45 // When removing one x, decrement cnt first; pairs drop by new cnt[x] 46 --cnt[x]; 47 cur -= cnt[x]; 48 }; 49 50 long long cur = 0; 51 int L = 0, R = -1; 52 53 for(const auto &qq : qs){ 54 int l = qq.l, r = qq.r; 55 while(L > l) add(arr[--L], cur); 56 while(R < r) add(arr[++R], cur); 57 while(L < l) remove_one(arr[L++], cur); 58 while(R > r) remove_one(arr[R--], cur); 59 ans[qq.id] = cur; // cur equals sum over x of C(cnt[x], 2) 60 } 61 62 for(int i=0;i<q;++i) cout << ans[i] << '\n'; 63 return 0; 64 } 65
For each value x with frequency k in the current range, it contributes k choose 2 equal pairs. Adding an x creates new pairs equal to the current count of x; removing an x destroys pairs equal to the (updated) count. This leads to O(1) updates per element move and the same Mo complexity as before.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Query { int l, r, t, id; int bl, br; }; 5 struct Update { int pos, prev, next; }; 6 7 int main(){ 8 ios::sync_with_stdio(false); 9 cin.tie(nullptr); 10 11 // Input format: 12 // n m 13 // a[0..n-1] 14 // then m operations, each either: 15 // Q l r (1-based, inclusive) 16 // U p x (point update: set a[p]=x) 17 // We answer all Q after interleaving with U. 18 19 int n, m; if(!(cin >> n >> m)) return 0; 20 vector<long long> orig(n); 21 for(int i=0;i<n;++i) cin >> orig[i]; 22 23 struct Op { char type; int x, y; }; // 'Q' l r or 'U' p x 24 vector<Op> ops(m); 25 vector<long long> allVals = orig; 26 27 for(int i=0;i<m;++i){ 28 char tp; cin >> tp; 29 if(tp=='Q'){ 30 int l,r; cin>>l>>r; ops[i] = {tp, l-1, r-1}; 31 }else{ // 'U' 32 int p; long long x; cin>>p>>x; ops[i] = {tp, p-1, 0}; allVals.push_back(x); 33 } 34 } 35 36 // Coordinate compression over initial and update values 37 sort(allVals.begin(), allVals.end()); 38 allVals.erase(unique(allVals.begin(), allVals.end()), allVals.end()); 39 40 vector<int> arr(n); 41 for(int i=0;i<n;++i) arr[i] = int(lower_bound(allVals.begin(), allVals.end(), orig[i]) - allVals.begin()); 42 43 vector<Query> qs; qs.reserve(m); 44 vector<Update> us; us.reserve(m); 45 46 // Build queries with time index; record updates with prev/next 47 int t = 0; // number of updates processed so far 48 for(int i=0;i<m;++i){ 49 if(ops[i].type=='Q'){ 50 Query qy; qy.l = ops[i].x; qy.r = ops[i].y; qy.t = t; qy.id = (int)qs.size(); qy.bl = qy.br = 0; 51 qs.push_back(qy); 52 }else{ 53 int p = ops[i].x; // value in ops[i].y is unused placeholder 54 // next compressed value from allVals using the next read value stored earlier 55 // We cannot access the raw x now; but we pushed it to allVals and will reconstruct via input? 56 // Simpler: store next in a secondary pass. For clarity, rebuild updates: 57 } 58 } 59 // Re-parse to build updates with known compressed values 60 // We need to track current value to know prev and next 61 vector<int> cur = arr; // current compressed array during building 62 us.clear(); t = 0; qs.clear(); 63 64 // For reproducibility, read input again is not possible; instead, we stored ops but not x for updates. 65 // Let's enhance Op to store 64-bit x. We'll redo reading above accordingly. 66 // Restart program if not compiled with that; to keep this self-contained, we implement properly below. 67 return 0; 68 } 69
Mo with updates needs three kinds of moves: L, R, and time t. Each query is associated with a time—the number of updates that have occurred before it. We sort by (l-block, r-block, t) and, when moving in time, apply or rollback updates while adjusting the current window if the updated position lies inside. Due to input handling complexity, see the next corrected version below.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Query { int l, r, t, id; int bl, br; }; 5 struct UpRaw { int pos; long long x; }; // raw value for reading 6 struct Update { int pos, prev, next; }; // compressed values for processing 7 8 int main(){ 9 ios::sync_with_stdio(false); 10 cin.tie(nullptr); 11 12 int n, m; if(!(cin >> n >> m)) return 0; 13 vector<long long> a0(n); 14 for(int i=0;i<n;++i) cin >> a0[i]; 15 16 struct Op { char type; int p, l, r; long long x; }; 17 vector<Op> ops(m); 18 19 vector<long long> allVals = a0; 20 for(int i=0;i<m;++i){ 21 char tp; cin >> tp; ops[i].type = tp; 22 if(tp=='Q'){ 23 int l,r; cin>>l>>r; ops[i].l = l-1; ops[i].r = r-1; 24 }else{ // 'U' 25 int p; long long x; cin>>p>>x; ops[i].p = p-1; ops[i].x = x; allVals.push_back(x); 26 } 27 } 28 29 // Coordinate compression over initial and all update values 30 sort(allVals.begin(), allVals.end()); 31 allVals.erase(unique(allVals.begin(), allVals.end()), allVals.end()); 32 33 auto compress = [&](long long v){ 34 return (int)(lower_bound(allVals.begin(), allVals.end(), v) - allVals.begin()); 35 }; 36 37 vector<int> arr(n); 38 for(int i=0;i<n;++i) arr[i] = compress(a0[i]); 39 40 vector<Query> qs; qs.reserve(m); 41 vector<Update> us; us.reserve(m); 42 43 // Build query list with time indices and update list with prev/next (compressed) 44 int t = 0; // number of updates before current op 45 vector<int> cur = arr; // current compressed values while constructing updates 46 for(int i=0;i<m;++i){ 47 if(ops[i].type=='Q'){ 48 Query qy; qy.l = ops[i].l; qy.r = ops[i].r; qy.t = t; qy.id = (int)qs.size(); qy.bl = qy.br = 0; qs.push_back(qy); 49 }else{ // update 50 int p = ops[i].p; int nxt = compress(ops[i].x); int prv = cur[p]; 51 us.push_back({p, prv, nxt}); 52 cur[p] = nxt; ++t; 53 } 54 } 55 56 int q = (int)qs.size(); int U = (int)allVals.size(); 57 58 // Choose block size ~ n^(2/3) 59 int B = max(1, (int)pow(n, 2.0/3.0)); 60 for(auto &qr: qs){ qr.bl = qr.l / B; qr.br = qr.r / B; } 61 62 sort(qs.begin(), qs.end(), [&](const Query &A, const Query &Bq){ 63 if (A.bl != Bq.bl) return A.bl < Bq.bl; 64 if (A.br != Bq.br) return A.br < Bq.br; 65 return A.t < Bq.t; 66 }); 67 68 vector<int> cnt(U, 0); 69 vector<int> ans(q, 0); 70 71 auto add = [&](int x, int &curAns){ if(cnt[x]==0) ++curAns; ++cnt[x]; }; 72 auto rem = [&](int x, int &curAns){ --cnt[x]; if(cnt[x]==0) --curAns; }; 73 74 int L = 0, R = -1, T = 0; // current window [L,R] and time index T 75 int curAns = 0; 76 vector<int> A = arr; // mutable array for applying updates 77 78 auto apply = [&](int tIdx){ // apply us[tIdx]: set A[pos] from prev -> next 79 int p = us[tIdx].pos; int from = us[tIdx].prev; int to = us[tIdx].next; 80 if(L <= p && p <= R){ 81 rem(from, curAns); 82 add(to, curAns); 83 } 84 A[p] = to; 85 }; 86 auto rollback = [&](int tIdx){ // rollback us[tIdx-1]: set A[pos] from next -> prev 87 int p = us[tIdx-1].pos; int from = us[tIdx-1].next; int to = us[tIdx-1].prev; 88 if(L <= p && p <= R){ 89 rem(from, curAns); 90 add(to, curAns); 91 } 92 A[p] = to; 93 }; 94 95 for(const auto &qr : qs){ 96 int l = qr.l, r = qr.r, targetT = qr.t; 97 while (T < targetT) { apply(T); ++T; } 98 while (T > targetT) { rollback(T); --T; } 99 while (L > l) add(A[--L], curAns); 100 while (R < r) add(A[++R], curAns); 101 while (L < l) rem(A[L++], curAns); 102 while (R > r) rem(A[R--], curAns); 103 ans[qr.id] = curAns; 104 } 105 106 for(int i=0;i<q;++i) cout << ans[i] << '\n'; 107 return 0; 108 } 109
We read a mix of queries and point updates. After compressing all values, we build queries annotated with their time (number of updates before them) and a list of updates that know prev and next compressed values. We sort queries by (l-block, r-block, t). While processing, we move L, R, and T; moving in time applies or rolls back updates and adjusts the answer if the updated position is within [L, R]. Distinct count uses O(1) add/remove/apply/rollback.