Parallel Binary Search
Key Points
- Parallel Binary Search (PBS) lets you binary-search the answers of many queries at once by batching them by their current mid value.
- It relies on a monotone predicate: for each query q and answer candidate x, feasibility P(q, x) must be monotonic in x.
- Each round groups queries by mid, sweeps updates up to mid once, evaluates all queries in that bucket, and refines their search intervals.
- With an add-only data structure per round, the total work is roughly O((U + Q) × log A × per-operation cost), instead of O(Q × U × log A).
- Classic use cases include earliest time of connectivity with incremental edges and offline k-th number in a subarray using a Fenwick tree.
- PBS is an offline technique; you must rebuild or reset the data structure each round, or carefully support rollbacks.
- Correctness hinges on monotonicity, consistent interval updates, and clean data structure state between rounds.
- Use PBS when a per-query binary search is too slow but all queries share the same ordered answer domain or timeline.
Prerequisites
- Binary search and monotonic predicates: PBS generalizes classic binary search by applying it to many queries at once; understanding monotonicity is essential.
- Disjoint Set Union (Union-Find): Many PBS problems use DSU for connectivity checks when sweeping time or thresholds.
- Fenwick tree / Binary Indexed Tree: Counting and coverage checks during sweeps are often implemented with Fenwick trees.
- Coordinate compression: PBS commonly searches over large value domains; compression keeps buckets and arrays small.
- Offline algorithms: PBS requires batching and reordering queries by current mid, which is inherently offline.
- Sweep-line / pointer techniques: Efficient PBS relies on incrementally applying updates as the mid increases.
Detailed Explanation
01 Overview
Parallel Binary Search (PBS) is an offline algorithmic technique to solve many binary-searchable queries simultaneously. Instead of running a separate binary search for each query (which multiplies work), PBS batches queries that are considering the same mid value and answers them together using a single data structure sweep. The key assumption is monotonicity: each query has an answer over an ordered domain (like time, value, or index), and the feasibility predicate P(q, x) toggles from false to true once and never flips back. In each iteration (round), PBS assigns every active query a current mid = floor((lo + hi)/2), partitions queries by mid, and then processes mids in sorted order. While the mid increases, we incrementally apply all necessary updates to a data structure (e.g., DSU for connectivity, Fenwick tree for counts). When we reach a particular mid, we answer all queries in that bucket using the current data structure state and shrink each query's [lo, hi] interval appropriately. After O(log A) rounds, where A is the answer domain size, all queries converge to their final answers. PBS is especially effective when the per-mid data structure can be built incrementally (add-only) and reset inexpensively per round, yielding substantial speedups on large batches of queries.
02 Intuition & Analogies
Imagine 1000 friends each playing the same 'guess my number' game on different hidden numbers between 1 and 1,000,000. If you try to guess each person's number independently with classic binary search, you will ask roughly 20 questions per person, totaling ~20,000 questions. But what if you could ask the same question to everyone at once: 'Is your number ≤ 500,000?' Everyone answers simultaneously, and you split the group into two halves. Next, you ask the left half '≤ 250,000?' and the right half '≤ 750,000?'. With only ~20 rounds overall, everyone finishes at once. PBS is this collective version of binary search for algorithmic queries. The 'question' you ask in each round is your predicate P(q, mid): 'If I apply all updates up to mid, is query q satisfied?' For connectivity over time, your question is 'Are u and v connected after the first mid edges?' For k-th smallest in a subarray, your question is 'Are there at least k elements ≤ mid in [l, r]?' Crucially, you process mids in increasing order in a single sweep, updating a shared data structure (like adding edges to DSU or inserting array elements into a Fenwick tree) so you don't rebuild from scratch for each query. After answering all queries at the current mid, you refine their intervals (go left or right) and repeat. This way, you convert many tiny binary searches into a small number of synchronized rounds with broad reuse of work.
03 Formal Definition
Let A be a totally ordered answer domain and, for each query q, let P(q, ·) be a monotone predicate on A: if P(q, x) is true, then P(q, y) is true for every y ≥ x. The answer to q is ans(q) = min{ x ∈ A : P(q, x) }, or a sentinel such as -1 if no such x exists. PBS maintains, for each query, an interval [lo, hi] guaranteed to contain ans(q); each round evaluates P(q, mid) for every unresolved query during one ordered sweep of the domain and halves the corresponding interval, so after O(log |A|) rounds every interval collapses and lo is the answer.
04 When to Use
Use PBS when you have many queries sharing the same ordered answer domain and a monotone feasibility predicate, especially when incremental updates are natural. Typical scenarios include: (1) Earliest time when a property holds, with time-ordered updates, like dynamic connectivity under edge additions only; here P(q, t) asks if the property is satisfied after the first t updates. (2) Threshold queries over values, like 'k-th smallest in [l, r]' or 'Is count of elements ≤ x in [l, r] at least k?' processed with Fenwick/segment trees while sweeping x. (3) Coverage thresholds: given a stream of range-add updates over time, find earliest time a point's coverage reaches a target; P(q, t) checks coverage after t updates. (4) Feasibility tests parameterized by a single numeric parameter: for capacity, budget, or distance thresholds where enabling all items with key ≤ x or ≥ x makes P monotone. PBS is most valuable when a naive solution would run a separate binary search per query, rebuilding the data structure each time. If you can incrementally add updates in an ordered sweep per round and re-use the same data structure state across many queries, PBS typically reduces an O(|Q| × log |A|) multiplier on the full build cost down to O(log |A|). If the predicate is not monotone or the data structure must support deletions between mids within a round (without rollback), PBS may be unsuitable.
⚠️ Common Mistakes
1) Violating monotonicity: If P(q, x) is not monotone (e.g., toggles true/false multiple times as x increases), PBS yields incorrect answers. Always prove monotonicity and pick the correct direction (≤ vs ≥).
2) Mishandling '<' vs '≤': Decide whether P(q, mid) sends the query left or right and keep it consistent. A common off-by-one error is mapping 'count ≥ k' incorrectly to bound updates.
3) Data structure contamination between rounds: Each PBS round assumes a fresh data structure state as you sweep mids. Either rebuild/clear the DS per round or use a persistent/rollback-capable DS. Forgetting to reset leads to subtle bugs.
4) Inefficient sweep order: Not iterating mids in sorted order removes the incremental-add advantage and can degrade performance to near per-query rebuilds.
5) Over-allocating buckets: Re-creating huge vectors every round can be costly; prefer reusing memory or using an array of vectors sized to the answer domain.
6) Early termination logic: Ensure the outer loop stops when all queries are resolved (lo > hi). An incorrect stopping condition can cause infinite loops or missed answers.
7) Ignoring unreachable cases: Some queries never become feasible; in that case, lo exits the domain range. Decide and document what you return (e.g., -1).
8) Forgetting coordinate compression: When the domain is large (values up to 1e9), compress to the set of distinct candidates so buckets are manageable and updates are meaningful.
Key Formulas
Query answer via monotone predicate
ans(q) = min{ x ∈ A : P(q, x) = true }
Explanation: For each query q, the answer is the smallest x in the domain where the predicate becomes true. This relies on monotonicity so that once true, it stays true for all larger x.
Binary search rounds
rounds = O(log2 |A|)
Explanation: Binary search over an ordered domain of size |A| halves the interval each time. After about log2(|A|) rounds, each query's interval collapses to a single value.
PBS time complexity
T = O((U + Q) × C × log |A|), where C is the data structure's per-operation cost
Explanation: Each round processes all updates in a sweep (U) and answers all active queries (Q); this is multiplied by the number of rounds and the per-operation cost of the data structure.
DSU union-find cost
O(m × α(n)) for m union/find operations on n elements
Explanation: Using union by rank and path compression, DSU supports m union/find operations in near-linear time, with the inverse Ackermann function α behaving like a constant.
Prefix sum
prefix(i) = a_1 + a_2 + ... + a_i
Explanation: A prefix sum up to i is the sum of the first i elements. Fenwick trees maintain these sums to allow O(log n) updates and queries.
Count elements up to threshold
count(l, r, x) = sum over i in [l, r] of [A_i ≤ x]
Explanation: The number of elements ≤ x in a subarray is the sum of indicator variables. With a Fenwick tree, insert each element at its position and take a range sum.
Range add, point query via difference array
D[l] += v, D[r+1] -= v; value(x) = D[1] + D[2] + ... + D[x]
Explanation: To add v to a range [l, r], update the difference array at l and r+1. The current value at x is the prefix sum of D up to x, implementable with a Fenwick tree.
Total predicate evaluations
O(Q × log |A|) predicate checks overall
Explanation: Each query is evaluated at most once per round. Over all rounds, each query incurs O(log |A|) predicate checks.
Complexity Analysis
Let U be the number of updates, Q the number of queries, |A| the answer domain size, and C the data structure's per-operation cost. Each of the O(log |A|) rounds rebuilds the data structure and sweeps all U updates once while answering each unresolved query once, for O((U + Q) × C × log |A|) total time. Memory is O(U + Q + |A|) for the updates, the per-round buckets, and the data structure. Compare with a naive per-query binary search, which pays a full O(U × C) rebuild for each of the Q × O(log |A|) probes.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// DSU with path compression and union by size
struct DSU {
    int n;
    vector<int> p, sz;
    DSU(int n = 0) { init(n); }
    void init(int n_) {
        n = n_;
        p.resize(n + 1);
        sz.assign(n + 1, 1);
        iota(p.begin(), p.end(), 0);
    }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    void unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return;
        if (sz[a] < sz[b]) swap(a, b);
        p[b] = a; sz[a] += sz[b];
    }
    bool same(int a, int b) { return find(a) == find(b); }
};

// Problem: m edges are added one by one to a graph with n nodes.
// For each query (u, v), find the earliest time t (1..m) such that u and v are connected
// after adding the first t edges. Return -1 if they never become connected.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example input (you can replace with cin >> ...):
    // n=5, m=5 edges, q=3 queries
    int n = 5, m = 5, q = 3;
    vector<pair<int,int>> edges = {
        {1,2}, {2,3}, {4,5}, {3,4}, {1,5}
    }; // 1-based times: edges[0] at t=1, ..., edges[4] at t=5
    vector<pair<int,int>> queries = { {1,3}, {2,5}, {1,4} };

    // Parallel Binary Search setup
    vector<int> lo(q, 1), hi(q, m), ans(q, -1);
    vector<char> active(q, 1);

    DSU dsu(n);

    // Temporary buckets per round: bucket[t] holds indices of queries with mid = t
    vector<vector<int>> bucket(m + 2);

    while (true) {
        bool any = false;
        for (int i = 0; i <= m + 1; i++) bucket[i].clear();
        for (int i = 0; i < q; i++) if (active[i]) {
            if (lo[i] <= hi[i]) {
                any = true;
                int mid = (lo[i] + hi[i]) >> 1;
                bucket[mid].push_back(i);
            } else {
                // interval collapsed: finalize answer
                ans[i] = (lo[i] <= m ? lo[i] : -1);
                active[i] = 0;
            }
        }
        if (!any) break;

        // Reset DSU and sweep times in increasing order
        dsu.init(n);
        for (int t = 1; t <= m; t++) {
            // apply edge t
            auto [u, v] = edges[t - 1];
            dsu.unite(u, v);
            // answer all queries with mid = t
            for (int idx : bucket[t]) {
                auto [qu, qv] = queries[idx];
                if (dsu.same(qu, qv)) {
                    // predicate true at mid => go left
                    hi[idx] = t - 1;
                } else {
                    // predicate false => go right
                    lo[idx] = t + 1;
                }
            }
        }
        // Note: queries with mid > m are impossible here since the domain is [1..m]
    }

    // Print results
    // Expected for the sample: (1,3)->2 (via edges 1-2-3), (2,5)->4, (1,4)->4
    for (int i = 0; i < q; i++) {
        cout << ans[i] << (i + 1 == q ? '\n' : ' ');
    }
    return 0;
}
```
We binary-search, in parallel, the earliest edge index t for which each pair (u, v) becomes connected. In each round, we bucket queries by mid time, rebuild the DSU from scratch, sweep t from 1 to m, and after each union, answer all queries whose mid equals t. If connected, we shrink to the left half; otherwise to the right. When a query's interval collapses, we finalize its answer (or -1 if it never becomes connected).
```cpp
#include <bits/stdc++.h>
using namespace std;

// Fenwick tree for point updates and prefix sums
struct Fenwick {
    int n;
    vector<int> bit;
    Fenwick(int n = 0) { init(n); }
    void init(int n_) { n = n_; bit.assign(n + 1, 0); }
    void add(int idx, int val) { for (; idx <= n; idx += idx & -idx) bit[idx] += val; }
    int sumPrefix(int idx) { int r = 0; for (; idx > 0; idx -= idx & -idx) r += bit[idx]; return r; }
    int sumRange(int l, int r) { if (r < l) return 0; return sumPrefix(r) - sumPrefix(l - 1); }
};

// Problem: Given array A[1..n] and q queries (l, r, k), find the k-th smallest in A[l..r].
// PBS over the value domain: compress distinct values; for each mid (value rank), insert all
// elements with rank <= mid and count how many fall in [l, r]. If count >= k, go left; else go right.

struct Query { int l, r, k, id; };

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example
    int n = 8;
    vector<int> A = {0, 5, 1, 2, 6, 3, 7, 4, 8}; // 1-based; A[0] unused
    int q = 3;
    vector<Query> qs = { {2,7,3,0}, {1,8,5,1}, {3,5,2,2} };

    // Coordinate compression of values
    vector<int> sorted_vals;
    sorted_vals.reserve(n);
    for (int i = 1; i <= n; i++) sorted_vals.push_back(A[i]);
    sort(sorted_vals.begin(), sorted_vals.end());
    sorted_vals.erase(unique(sorted_vals.begin(), sorted_vals.end()), sorted_vals.end());
    int V = (int)sorted_vals.size();

    // Pairs (rank, position), sorted by rank
    vector<pair<int,int>> elems;
    elems.reserve(n);
    for (int i = 1; i <= n; i++) {
        int rank = int(lower_bound(sorted_vals.begin(), sorted_vals.end(), A[i]) - sorted_vals.begin()) + 1; // 1..V
        elems.emplace_back(rank, i);
    }
    sort(elems.begin(), elems.end());

    // PBS intervals over value ranks
    vector<int> lo(q, 1), hi(q, V), ansRank(q, -1);
    vector<char> active(q, 1);

    // Buckets per mid rank
    vector<vector<int>> bucket(V + 2);
    Fenwick fw(n);

    while (true) {
        bool any = false;
        for (int i = 0; i <= V + 1; i++) bucket[i].clear();
        for (int i = 0; i < q; i++) if (active[i]) {
            if (lo[i] <= hi[i]) {
                any = true;
                int mid = (lo[i] + hi[i]) >> 1;
                bucket[mid].push_back(i);
            } else {
                ansRank[i] = (lo[i] >= 1 && lo[i] <= V) ? lo[i] : -1;
                active[i] = 0;
            }
        }
        if (!any) break;

        fw.init(n);
        int ptr = 0; // how many elements inserted (by rank)
        for (int mid = 1; mid <= V; mid++) {
            // Insert all elements with rank <= mid
            while (ptr < n && elems[ptr].first <= mid) {
                fw.add(elems[ptr].second, 1);
                ++ptr;
            }
            // Answer all queries aiming at this mid
            for (int qi : bucket[mid]) {
                const Query &qu = qs[qi];
                int cnt = fw.sumRange(qu.l, qu.r);
                if (cnt >= qu.k) {
                    // enough elements <= mid in [l, r] => go left
                    hi[qi] = mid - 1;
                } else {
                    lo[qi] = mid + 1;
                }
            }
        }
    }

    // Materialize answers: map final ranks back to values
    vector<int> ans(q, -1);
    for (int i = 0; i < q; i++) if (ansRank[i] != -1) ans[i] = sorted_vals[ansRank[i] - 1];

    // Expected:
    // Query (2,7,3): values in [2..7] are {1,2,6,3,7,4} -> sorted {1,2,3,4,6,7}, 3rd = 3
    // Query (1,8,5): full array {5,1,2,6,3,7,4,8} sorted -> 5th = 5
    // Query (3,5,2): {2,6,3} sorted -> 2nd = 3
    for (int i = 0; i < q; i++) cout << ans[i] << (i + 1 == q ? '\n' : ' ');
    return 0;
}
```
We binary-search, in parallel, the answer value for each (l, r, k) query. Compress values to 1..V, and in each round sweep mid from 1..V while inserting all elements with rank ≤ mid into a Fenwick tree. For each query in bucket[mid], we count how many inserted elements lie in [l, r]; if at least k, the k-th smallest is ≤ mid (go left), otherwise go right. After convergence, mapping the final rank back yields the k-th smallest value.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Fenwick supporting point updates and prefix sums
// (we simulate range-add / point-query via the difference-array technique)
struct Fenwick {
    int n;
    vector<int> bit;
    Fenwick(int n = 0) { init(n); }
    void init(int n_) { n = n_; bit.assign(n + 2, 0); } // n+1 is safe for r+1
    void add(int idx, int val) { for (; idx <= n + 1; idx += idx & -idx) bit[idx] += val; }
    int sumPrefix(int idx) { int r = 0; for (; idx > 0; idx -= idx & -idx) r += bit[idx]; return r; }
};

// Problem: Positions 1..N. We receive M intervals [L_i, R_i], added one per time step i.
// Let coverage_t(x) be how many intervals among 1..t cover position x. For each query (x, k),
// find the earliest time t (1..M) such that coverage_t(x) >= k, else -1 if never.

struct Interval { int l, r; };
struct Query { int x, k; };

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N = 10; // positions 1..N
    vector<Interval> intervals = { {1,3}, {2,5}, {4,6}, {7,9}, {3,8} }; // M=5
    int M = (int)intervals.size();
    vector<Query> qs = { {3,2}, {5,3}, {7,1}, {10,1} }; // Q=4
    int Q = (int)qs.size();

    vector<int> lo(Q, 1), hi(Q, M), ans(Q, -1);
    vector<char> active(Q, 1);

    vector<vector<int>> bucket(M + 2);
    Fenwick fw(N + 2);

    while (true) {
        bool any = false;
        for (int i = 0; i <= M + 1; i++) bucket[i].clear();
        for (int i = 0; i < Q; i++) if (active[i]) {
            if (lo[i] <= hi[i]) {
                any = true;
                int mid = (lo[i] + hi[i]) >> 1;
                bucket[mid].push_back(i);
            } else {
                // interval collapsed: finalize answer (-1 if lo left the domain)
                ans[i] = (lo[i] <= M ? lo[i] : -1);
                active[i] = 0;
            }
        }
        if (!any) break;

        fw.init(N + 2);
        // Sweep time t=1..M, maintaining coverage via difference add: +1 at l, -1 at r+1
        for (int t = 1; t <= M; t++) {
            auto [l, r] = intervals[t - 1];
            fw.add(l, +1);
            fw.add(r + 1, -1);
            for (int qi : bucket[t]) {
                int x = qs[qi].x;
                int k = qs[qi].k;
                int cover = fw.sumPrefix(x); // coverage at x after t intervals
                if (cover >= k) hi[qi] = t - 1; else lo[qi] = t + 1;
            }
        }
    }
    // Every query is finalized inside the loop before it exits.

    // Expected (intuitively):
    // t=1: cover {1..3}; t=2 adds {2..5}; t=3 adds {4..6}; t=4 adds {7..9}; t=5 adds {3..8}
    // x=3,k=2 -> earliest t=2; x=5,k=3 -> earliest t=5; x=7,k=1 -> earliest t=4; x=10,k=1 -> -1
    for (int i = 0; i < Q; i++) cout << ans[i] << (i + 1 == Q ? '\n' : ' ');
    return 0;
}
```
We add one interval per time step and want the earliest time a position reaches a target coverage. Using a Fenwick as a difference array, adding [l, r] is +1 at l and -1 at r+1, and coverage at x is the prefix sum at x. PBS groups queries by time mid; sweeping t in increasing order updates coverage and simultaneously answers all queries bucketed at t.