Coordinate Compression
Key Points
- Coordinate compression replaces large, sparse, or arbitrary values with small consecutive integers while preserving relative order.
- You collect all values that will be referenced, sort them, remove duplicates, and use binary search to map each original value to its index.
- It enables array-based data structures like Fenwick Trees and Segment Trees to work efficiently on sparse domains.
- Compression is typically an offline step, so you must include every value that will ever be queried or updated.
- Equal original values map to the same compressed index, guaranteeing consistent treatment of duplicates.
- The core operations are O(n log n) for building the compressor and O(log n) per lookup via lower_bound.
- When intervals matter, store the sorted coordinates so their gaps (differences) can serve as weights in segment trees.
- Always keep a reverse map (the unique sorted values) if you need to recover original values.
Prerequisites
- Sorting and Comparators — Compression relies on sorting values and removing duplicates.
- Binary Search / lower_bound — Mapping a value to its compressed index uses binary search on the unique array.
- Dynamic Arrays (std::vector) — Storing collected coordinates and building the unique array requires comfortable use of vectors.
- Maps and Hash Maps — Optional optimization for O(1) expected-time lookups if you prebuild a value->index map.
- Fenwick Tree (Binary Indexed Tree) — A primary consumer of compressed indices for frequency and prefix-sum queries.
- Segment Tree — Commonly paired with compression for range queries/updates and length-aware computations.
- Prefix Sums / Difference Arrays — Useful with compressed coordinates for interval coverage and sweep-line style problems.
Detailed Explanation
01 Overview
Coordinate compression is a preprocessing technique that converts a set of possibly large or sparse values into a compact range of consecutive integers [0, u), where u is the number of unique values. The fundamental idea is to preserve ordering—if a < b originally, then comp(a) < comp(b) after compression—while eliminating unnecessary gaps. In practice, you gather every value that a data structure will need to access (array values, event times, interval endpoints, query points), sort them, and remove duplicates. Each original value is then mapped to its index in this sorted unique array using binary search. This approach is a cornerstone for competitive programming and systems where memory locality and array indexing matter, such as Fenwick Trees (Binary Indexed Trees), Segment Trees, sweep-line algorithms, and difference arrays. Because many problems reference only a small subset of a massive value space (e.g., timestamps up to 10^18), compression allows you to simulate operations as if they were on the full space but pay only for the unique values actually used. The technique is lightweight, language-agnostic, and easy to implement, yet it often turns an otherwise infeasible solution into an efficient one.
02 Intuition & Analogies
Imagine numbering houses along a very long street where house numbers range from 1 to 10^12, but only 20 houses are actually occupied. If you want to deliver mail efficiently, it's wasteful to prepare a huge array with a slot for every possible house number. Instead, you could put all occupied house numbers in a list, sort them, and then call the smallest one 0, the next one 1, and so on. You will still deliver to houses in the right order, and you can store your delivery data in a compact array of size 20. That's coordinate compression. Another analogy: think of a subway map; actual geographic distances might be large and irregular, but the map compresses stations to evenly spaced dots that keep the correct left-to-right or up-down order. Navigation becomes simpler while the relative structure remains intact. In algorithms, this "map" lets you index by position quickly (like looking up a station's row on the map) and use contiguous arrays where raw values would be too big or sparse. When we do interval operations (like painting road segments), we sometimes also need to remember the real distances between consecutive compressed points (the gaps), just like knowing the real distance between two consecutive subway stations if we care about total travel length.
03 Formal Definition
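The body of this section appears to have been lost in extraction; the following is a standard statement consistent with the mappings used elsewhere in this article:

```latex
\text{Let } S = \{x_1, \dots, x_n\} \text{ be the collected values and }
U = (u_0 < u_1 < \dots < u_{k-1}) \text{ the distinct values of } S \text{ in sorted order.}

\mathrm{comp}(x) = \bigl|\{\, u \in U : u < x \,\}\bigr| \quad \text{for } x \in S

\text{Properties:}\quad
a < b \iff \mathrm{comp}(a) < \mathrm{comp}(b) \ \text{ for } a, b \in S,
\qquad \mathrm{comp}^{-1}(i) = u_i .
```

In words: the compressed index of x is its rank among the distinct collected values, the mapping is strictly order-preserving on S, and the sorted unique array U doubles as the inverse map.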
04 When to Use
Use coordinate compression whenever you need to index or query a data structure by values drawn from a huge or sparse domain, but only a small subset of distinct values occurs. Classic applications include: 1) Fenwick Tree/Segment Tree over values up to 10^9 when only n ≪ 10^9 positions are touched; 2) Inversion counting or frequency queries on large integers; 3) Sweep-line algorithms for events happening at many distinct timestamps; 4) Interval problems where endpoints are large but few, such as painting segments, counting coverage, or computing the union length of intervals; 5) Offline query processing where you can collect all update and query coordinates first; 6) Ranking values while preserving order (e.g., compressing ratings, scores, or coordinates in 2D/3D by compressing each dimension independently). If your algorithm must handle online inputs with unseen coordinates arriving later, either redesign to collect them first (offline) or choose a dynamic order-statistics structure (like an order_of_key tree) instead of static compression. Compression is especially helpful in competitive programming when memory limits and time constraints make direct indexing impossible.
⚠️Common Mistakes
- Forgetting to include coordinates that appear only in queries: you must collect values from both updates and queries before building the compressor; otherwise, lookups may fail or map wrongly.
- Using upper_bound instead of lower_bound: the compressed index must be the first position where the value could be inserted, i.e., its rank; upper_bound shifts past equal elements and breaks consistency.
- Assuming equal spacing after compression: compressed indices are consecutive but real gaps between original values can vary; use the differences U[i+1] - U[i] when lengths matter (e.g., union of segments).
- Mixing 0-based and 1-based indexes with Fenwick/Segment Trees: compress to 0-based, then convert carefully when building 1-based trees.
- Rebuilding compression per operation: compression is an offline preprocessing step; repeatedly sorting per query is inefficient.
- Not storing the reverse map: if you need to output or interpret answers in original coordinates, keep U so you can map back.
- Overflow in differences: when coordinates are up to 10^18, store U in 64-bit types (long long) and compute gaps in long long.
- Attempting to use compression for online streams with unseen values: either buffer inputs to compress later or use balanced trees/ordered sets that support order statistics dynamically.
Key Formulas
Unique Sorted Set
U = sort(unique(S))
Explanation: Build the sorted array U of all distinct values you will reference. This is the backbone for mapping values to ranks.
Compression Mapping
comp(x) = |{ u ∈ U : u < x }|
Explanation: The compressed index of x is the number of unique values strictly less than x. This is exactly lower_bound(U, x).
Reverse Mapping
comp⁻¹(i) = U[i]
Explanation: To recover the original value of a compressed index i, look up U[i]. Keep U if you need to print or compute in original units.
Build Time
O(n log n)
Explanation: Sorting n values dominates the construction of the compression mapping. The big-O means time grows on the order of n log n.
Lookup Time
O(log u)
Explanation: Each value-to-index mapping uses binary search on the sorted unique array, which is logarithmic in the number of unique values u.
Space Usage
O(u)
Explanation: You store only the u unique values and optional hash maps; space scales with the number of distinct coordinates.
Gap Weights
w_i = U[i+1] − U[i]
Explanation: When measuring lengths across segments, weight each elementary segment by its original coordinate gap. This restores real distances after compression.
Union Length by Gaps
length = Σ over i with covered[i] > 0 of (U[i+1] − U[i])
Explanation: The total covered length equals the sum of gap lengths over covered elementary segments. A segment tree can maintain covered[i] efficiently.
Inversion Count
inv = Σ_i |{ j < i : A[j] > A[i] }|
Explanation: Count how many earlier elements are greater than the current element. Compression lets a Fenwick Tree work when A[i] values are large.
Complexity Analysis
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Generic compression for a vector of values T (supports long long, int, string, etc. if comparable)
template <class T>
struct Compressor {
    vector<T> vals; // sorted unique values (reverse map)

    Compressor() {}
    explicit Compressor(const vector<T>& v) { build(v); }

    void build(const vector<T>& v) {
        vals = v;
        sort(vals.begin(), vals.end());
        vals.erase(unique(vals.begin(), vals.end()), vals.end());
    }

    // Map a value x to its compressed index in [0, vals.size())
    int index(const T& x) const {
        auto it = lower_bound(vals.begin(), vals.end(), x);
        if (it == vals.end() || *it != x) {
            // If x was not collected beforehand, this is an error in offline setup.
            // In production, you might choose to insert or handle separately.
            throw runtime_error("Value not found in compressor (did you collect all values?)");
        }
        return int(it - vals.begin());
    }

    // Try to map x; returns -1 if not present (safer alternative to index())
    int index_or_neg1(const T& x) const {
        auto it = lower_bound(vals.begin(), vals.end(), x);
        return (it != vals.end() && *it == x) ? int(it - vals.begin()) : -1;
    }

    // Reverse map: from compressed index to original value
    const T& value(int idx) const { return vals[idx]; }

    // Size of the compressed domain
    int size() const { return (int)vals.size(); }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: compress possibly large or negative integers
    vector<long long> a = {1000000000000LL, -5, 7, -5, 7, 42};

    Compressor<long long> C(a);

    cout << "Unique sorted values (reverse map):\n";
    for (auto v : C.vals) cout << v << ' ';
    cout << "\n";

    cout << "Compressed indices for original array:\n";
    for (auto v : a) cout << C.index(v) << ' ';
    cout << "\n";

    // Demonstrate reverse mapping
    int idx = C.index(42);
    cout << "Index of 42: " << idx << ", reverse value: " << C.value(idx) << "\n";

    // Safe lookup that won't throw
    cout << "Index of missing value 123 (or -1): " << C.index_or_neg1(123) << "\n";

    return 0;
}
This program defines a reusable Compressor that builds the sorted unique array and provides forward (value to index) and reverse (index to value) mapping. It demonstrates compression on a list with duplicates and large/negative values, shows how duplicates map to the same index, and how to recover original values.
#include <bits/stdc++.h>
using namespace std;

struct Fenwick {
    int n; vector<long long> bit; // 1-based Fenwick Tree
    Fenwick(int n = 0): n(n), bit(n + 1, 0) {}
    void add(int idx, long long val) { // add val at idx (1-based)
        for (; idx <= n; idx += idx & -idx) bit[idx] += val;
    }
    long long sum_prefix(int idx) const { // sum [1..idx]
        long long res = 0;
        for (; idx > 0; idx -= idx & -idx) res += bit[idx];
        return res;
    }
    long long sum_range(int l, int r) const { // sum [l..r]
        if (l > r) return 0;
        return sum_prefix(r) - sum_prefix(l - 1);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    // Example input: 5\n 8 4 2 1 3
    if (!(cin >> n)) return 0;
    vector<long long> a(n);
    for (int i = 0; i < n; ++i) cin >> a[i];

    // 1) Compress values so we can index a Fenwick Tree
    vector<long long> vals = a;
    sort(vals.begin(), vals.end());
    vals.erase(unique(vals.begin(), vals.end()), vals.end());
    auto get = [&](long long x) {
        return int(lower_bound(vals.begin(), vals.end(), x) - vals.begin());
    };

    int u = (int)vals.size();
    Fenwick fw(u); // will use 1-based indices, so add +1 when accessing

    // 2) Count inversions: for each a[i], count how many previous elements > a[i]
    long long inv = 0;
    for (int i = 0; i < n; ++i) {
        int r = get(a[i]) + 1;             // to 1-based
        long long leq = fw.sum_prefix(r);  // number of previous elements <= a[i]
        long long seen = fw.sum_prefix(u); // total previous elements
        inv += (seen - leq);               // those strictly greater than a[i]
        fw.add(r, 1);
    }

    cout << inv << "\n";
    return 0;
}
We compress the array values to ranks so a Fenwick Tree can index them. Iterating from left to right, the tree stores counts of seen elements by rank. For each a[i], the number of previous elements greater than a[i] is total_seen minus count of elements <= a[i]. The Fenwick Tree keeps operations at O(log u) even if values are up to 10^9.
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int M, Q;
    // Input format:
    // M Q
    // M lines: L R (cover interval [L, R], inclusive)
    // Q lines: X (query coverage at point X)
    // Example:
    // 3 3
    // 1 5
    // 10 20
    // 7 10
    // 1
    // 10
    // 15
    if (!(cin >> M >> Q)) return 0;

    vector<long long> L(M), R(M), X(Q);
    vector<long long> all; all.reserve(2*M + Q);
    for (int i = 0; i < M; ++i) {
        cin >> L[i] >> R[i];
        if (L[i] > R[i]) swap(L[i], R[i]);
        all.push_back(L[i]);
        all.push_back(R[i] + 1); // use R+1 for inclusive intervals in diff array
    }
    for (int i = 0; i < Q; ++i) {
        cin >> X[i];
        all.push_back(X[i]);
    }

    // Coordinate compression
    sort(all.begin(), all.end());
    all.erase(unique(all.begin(), all.end()), all.end());
    auto get = [&](long long v){ return int(lower_bound(all.begin(), all.end(), v) - all.begin()); };

    // Difference array over compressed axis
    vector<long long> diff(all.size() + 1, 0);
    for (int i = 0; i < M; ++i) {
        int li = get(L[i]);
        int ri = get(R[i] + 1); // end at R+1 to make [L, R] inclusive
        diff[li] += 1;
        diff[ri] -= 1;
    }

    // Prefix sum to get coverage counts at every compressed coordinate
    for (size_t i = 1; i < all.size(); ++i) diff[i] += diff[i-1];

    // Answer queries: for a point X, its coverage is diff[ index(X) ]
    for (int i = 0; i < Q; ++i) {
        int xi = get(X[i]);
        cout << diff[xi] << "\n";
    }
    return 0;
}
We must include all relevant coordinates: interval starts L, R+1 (for inclusive ranges in a difference array), and query points X. After compression, we mark +1 at L and −1 at R+1 in a diff array and take a prefix sum. Each query point’s compressed index gives its coverage count. Compression makes the memory proportional to the number of unique coordinates instead of the raw coordinate bounds.
#include <bits/stdc++.h>
using namespace std;

// Segment tree that maintains the total covered length of a union of intervals.
// We maintain a cover count per node; if count > 0, the node's segment is fully covered.
// Otherwise, covered length is the sum of children's covered lengths. We use compressed
// coordinates and the real gaps (U[i+1]-U[i]) to accumulate actual lengths.

struct SegTree {
    struct Node { int cover = 0; long long len = 0; }; // cover count, covered length
    int n;                      // number of elementary segments = U.size()-1
    vector<Node> st;            // segment tree
    const vector<long long>& U; // reference to coordinate array

    SegTree(int n, const vector<long long>& U) : n(n), st(4*n), U(U) {}

    void pull(int p, int L, int R) {
        if (st[p].cover > 0) {
            st[p].len = U[R+1] - U[L]; // fully covered
        } else if (L == R) {
            st[p].len = 0; // leaf, uncovered
        } else {
            st[p].len = st[p<<1].len + st[p<<1|1].len;
        }
    }

    // Update [ql, qr] over elementary segments (each represents [U[i], U[i+1]))
    void update(int p, int L, int R, int ql, int qr, int delta) {
        if (qr < L || R < ql) return;
        if (ql <= L && R <= qr) {
            st[p].cover += delta;
            pull(p, L, R);
            return;
        }
        int M = (L + R) >> 1;
        update(p<<1, L, M, ql, qr, delta);
        update(p<<1|1, M+1, R, ql, qr, delta);
        pull(p, L, R);
    }

    // Convenience wrapper
    void add_interval(int l, int r, int delta) { // add delta to [l, r-1] elementary segments
        if (l >= r) return;
        update(1, 0, n-1, l, r-1, delta);
    }

    long long covered_length() const { return st[1].len; }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int m;
    // Input: m intervals [L, R) in long long, compute union length
    // Example:
    // 3
    // 1 5
    // 2 7
    // 10 12
    if (!(cin >> m)) return 0;
    vector<long long> L(m), R(m);
    vector<long long> U; U.reserve(2*m);
    for (int i = 0; i < m; ++i) {
        cin >> L[i] >> R[i];
        if (L[i] > R[i]) swap(L[i], R[i]);
        U.push_back(L[i]);
        U.push_back(R[i]);
    }

    sort(U.begin(), U.end());
    U.erase(unique(U.begin(), U.end()), U.end());

    // Fewer than two distinct coordinates means no positive-length segment
    if ((int)U.size() < 2) { cout << 0 << "\n"; return 0; }

    auto get = [&](long long x){ return int(lower_bound(U.begin(), U.end(), x) - U.begin()); };

    int nSeg = (int)U.size() - 1; // number of elementary segments
    SegTree st(nSeg, U);

    for (int i = 0; i < m; ++i) {
        int l = get(L[i]);
        int r = get(R[i]);
        st.add_interval(l, r, +1); // cover [L[i], R[i])
    }

    cout << st.covered_length() << "\n";
    return 0;
}
We compress interval endpoints and build a segment tree over elementary segments [U[i], U[i+1]). Each node stores a cover count and the total covered length. If a node’s cover count is positive, it is fully covered and contributes its full real length U[R+1] − U[L]; otherwise it combines children. This computes the union length of many large-coordinate intervals in O(m log m).