Closest Pair of Points
Key Points
- •The closest pair of points problem asks for the minimum Euclidean distance between any two points in the plane.
- •A divide-and-conquer algorithm solves it in O(n n) time by splitting points by median x, solving each half, and checking only a thin strip near the split.
- •In the strip, you need to compare each point with at most a constant number (≤ 7–8) of subsequent points when the strip is sorted by y, thanks to a geometric packing argument.
- •You should compute squared distances to avoid expensive square roots inside loops and take a single at the end.
- •A plane-sweep algorithm with a balanced BST also achieves O(n n) time and is easier to implement but requires careful active-set maintenance.
- •A randomized incremental grid-based algorithm can run in expected near-linear time by checking only O(1) neighbor cells per insertion.
- •Be careful with duplicates and numerical precision; identical points imply distance 0 immediately.
- •Stable preprocessing (sorting, consistent tie-breaking) and preserving the y-sorted order during merging are crucial for the O(n n) divide-and-conquer bound.
Prerequisites
- →Euclidean geometry basics — Understanding distance formulas and coordinate reasoning is essential to follow the algorithm’s correctness.
- →Sorting and stability — Both divide-and-conquer and plane sweep rely on initial sorting and consistent tie-breaking.
- →Recursion and divide-and-conquer — The optimal deterministic algorithm relies on recursive splitting and merging.
- →Master Theorem — Used to analyze the T(n)=2T(n/2)+O(n) recurrence and conclude O(n log n) time.
- →Balanced BSTs / ordered sets (C++) — The plane-sweep implementation uses std::set for efficient ordered queries.
- →Hash tables (unordered_map) — The randomized grid needs a hash map from grid cells to buckets of points.
- →Floating-point precision — Squared distances and comparisons within strips require careful numeric handling.
- →Asymptotic complexity (Big-O) — To compare O(n^2) brute force vs O(n log n) optimal methods and understand trade-offs.
- →Randomization basics — Explains expected-time guarantees for the grid-based algorithm.
- →C++ STL algorithms and containers — Implementations rely on sort, set, unordered_map, and standard idioms.
Detailed Explanation
Tap terms for definitions01Overview
The closest pair of points problem is a foundational task in computational geometry: given n points on a 2D plane, find two points with the smallest Euclidean distance between them. A naive approach checks every pair, taking O(n^2) time, which becomes impractical even for moderately large n. The classical optimal solution for general inputs uses a divide-and-conquer strategy that achieves O(n \log n) time. The idea is to sort points by their x-coordinates, split them around the median, compute the closest pair in each half recursively, and then consider only points lying in a vertical strip of width 2d around the split line, where d is the best distance from the two halves. A key geometric insight shows that, within this strip, each point needs to be compared with only a constant number of neighbors when the strip is sorted by y, yielding linear work in the merge step. Besides divide-and-conquer, there are alternative methods: a plane-sweep algorithm with a balanced BST and a randomized incremental grid method with expected near-linear time. These algorithms not only provide fast solutions but also illustrate powerful techniques—like recursion with careful ordering, geometric packing arguments, and data structures for efficiently filtering candidates. The problem appears frequently in contests and real systems (graphics, clustering, nearest-neighbor preprocessing) and is a great vehicle for learning algorithm design and analysis.
02Intuition & Analogies
Imagine placing several pins on a corkboard. To find the closest two pins, you could measure every pair with a ruler—accurate but slow. Instead, try a smarter plan. First, line up the pins by how far they are from the left edge (sort by x). Split the board down the middle and find the closest pair in the left half and the right half separately—like having two helpers. Suppose their best distance is d. Now, could the global best cross the middle line? If so, it must involve pins that are within distance d of the split—otherwise, they can’t be closer than the current best. This creates a narrow vertical strip in the center that we need to check. Here’s the magic: if you walk up the strip from bottom to top (sorting by y), you don’t need to check every pin against every other. Picture placing coins of diameter d around each pin—geometry says only a handful of pins can fit in a small neighborhood without violating the current best distance. Practically, that means for each pin you only compare it to the next few pins above it (about 7). This turns what could have been a quadratic search into a linear pass during the merge. Another viewpoint is sweeping a vertical line from left to right (plane sweep). Keep only those pins whose x is within d of the sweep line in a small active set ordered by height (y). When a new pin arrives, only a few nearby neighbors in y can possibly be closer than d—again a small, local check. Or, randomize the order and drop pins into a grid of boxes of size about d; only the pin’s cell and a tiny number of neighboring cells can contain a closer partner. All three perspectives capitalize on the same idea: most pairs are obviously too far apart, and geometry lets us ignore them safely.
03Formal Definition
04When to Use
Use the divide-and-conquer closest-pair algorithm when you need the exact nearest-neighbor distance for a static set of 2D points and n is large enough that O(n^2) is too slow (typically n ≥ 5e4 in practice). It is well-suited for offline computations like preprocessing in clustering (single-linkage step), collision detection prefilters, graphics (finding minimal feature separation), and computational geometry tasks that require exactness. The plane-sweep variant is preferable for simpler implementations when you are comfortable with balanced BSTs; it often runs fast in practice and is concise. If you expect very large inputs and can accept randomized expected bounds, the randomized incremental grid algorithm is attractive, especially if coordinates are floating-point and well-distributed; it can approach linear expected time and is easy to parallelize by chunks. Avoid these methods if you need dynamic updates (frequent insertions/deletions)—consider k-d trees or Delaunay triangulations for dynamic nearest-neighbor queries. If the metric is not Euclidean (e.g., Manhattan distance), the structure of the argument changes; specialized transforms or algorithms exist. For very small n (say n ≤ 1000), a well-optimized O(n^2) brute force may be simpler and fast enough.
⚠️Common Mistakes
- Taking square roots in the inner loops. Use squared distances to compare and only take one \sqrt{} at the end to avoid both numeric noise and unnecessary cost.
- Re-sorting by y inside every recursive call. The optimal divide-and-conquer runs in O(n \log n) only if you preserve y-order with a linear-time merge per level; otherwise, the complexity can degrade to O(n \log^2 n).
- Forgetting to handle duplicates. If two identical points exist, the answer is 0. Detect this early by checking during sorting or via a hash set when reading input.
- Using double improperly for huge coordinates. Squaring 64-bit integers can overflow 64-bit; either cast to long double or use 128-bit integers (builtins) for squared distances.
- In the strip scan, comparing against too many neighbors (or all of them). The proof guarantees a constant bound (typically ≤ 7) if the strip is sorted by y; implement the y-sorted strip and a bounded neighbor loop.
- Plane sweep without purging old points. You must remove points whose x is more than current best away from the sweep line, or the set grows and performance degrades.
- Unstable tie-breaking around the median. When splitting by x, be consistent handling equal x to avoid missing cross-half pairs or creating imbalanced recursion.
- Floating-point comparisons without epsilon care. When checking |x - x_m| < d or y-d ≤ y' ≤ y+d, be robust to tiny rounding errors (e.g., add a tiny epsilon).
Key Formulas
Euclidean Distance
Explanation: This is the straight-line distance between two points in 2D. In algorithms, compare squared distances (omit the square root) and take one square root at the end to get the final answer.
Divide-and-Conquer Recurrence
Explanation: The closest-pair divide-and-conquer splits the points into two halves, solves both, and performs O(n) work to merge. By the Master Theorem, this solves to O(n log n).
Asymptotic Time Complexity
Explanation: The overall time for the deterministic optimal algorithms (divide-and-conquer or plane sweep) grows near-linearly with an additional logarithmic factor. Doubling n increases running time by about an additive n log 2 term, not by a factor of n.
Brute-Force Pair Count
Explanation: Checking all pairs among n points takes about n(n-1)/2 comparisons, which is O(). This is too slow for large n compared to O(n log n).
Grid Cell Size for Uniqueness
Explanation: If grid cell side length is d/, at most one point can occupy a cell when the current best distance is d. This supports constant-time neighbor checks in randomized grid methods.
Strip Definition
Explanation: Only points within distance d of the median x-coordinate can form the cross-partition closest pair. This filters candidates to a narrow vertical strip.
Packing Bound (Informal)
Explanation: A geometric packing lemma shows that within the strip, each point needs to be compared to only a small constant number of following points in y-sorted order, , making the merge linear.
Linear Work per Level
Explanation: Across each level of recursion, work sums to a linear amount due to partitioning. With log n levels, total work is O(n log n).
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Point { 5 long double x, y; 6 int id; // unique index to break ties if needed 7 }; 8 9 static inline long double sqr(long double v) { return v * v; } 10 static inline long double dist2(const Point& a, const Point& b) { 11 return sqr(a.x - b.x) + sqr(a.y - b.y); 12 } 13 14 // Recursively compute closest pair on pts[l..r), 15 // maintaining that before the call the segment is sorted by x. 16 // On return, pts[l..r) will be sorted by y to enable linear merges up the recursion. 17 long double closest_pair_rec(int l, int r, vector<Point>& pts, vector<Point>& buf) { 18 int n = r - l; 19 if (n <= 40) { 20 long double best = numeric_limits<long double>::infinity(); 21 for (int i = l; i < r; ++i) { 22 for (int j = i + 1; j < r; ++j) { 23 best = min(best, dist2(pts[i], pts[j])); 24 } 25 } 26 // sort by y for parent merge 27 sort(pts.begin() + l, pts.begin() + r, [](const Point& a, const Point& b){ 28 if (a.y != b.y) return a.y < b.y; 29 if (a.x != b.x) return a.x < b.x; 30 return a.id < b.id; 31 }); 32 return best; 33 } 34 35 int m = l + n / 2; 36 long double midx = pts[m].x; // median x before we reorder by y 37 38 long double dl = closest_pair_rec(l, m, pts, buf); 39 long double dr = closest_pair_rec(m, r, pts, buf); 40 long double d = min(dl, dr); 41 42 // Merge by y: pts[l..m) and pts[m..r) are y-sorted now; merge to buf, copy back. 43 int i = l, j = m, k = 0; 44 while (i < m && j < r) { 45 if (pts[i].y < pts[j].y || (pts[i].y == pts[j].y && (pts[i].x < pts[j].x || (pts[i].x == pts[j].x && pts[i].id < pts[j].id)))) 46 buf[k++] = pts[i++]; 47 else 48 buf[k++] = pts[j++]; 49 } 50 while (i < m) buf[k++] = pts[i++]; 51 while (j < r) buf[k++] = pts[j++]; 52 for (int t = 0; t < k; ++t) pts[l + t] = buf[t]; 53 54 // Build the strip: points within distance d from the vertical line x = midx 55 static const long double EPS = 1e-18L; 56 vector<Point> strip; 57 strip.reserve(n); 58 for (int t = l; t < r; ++t) { 59 if (fabsl(pts[t].x - midx) <= d + EPS) strip.push_back(pts[t]); 60 } 61 62 // Since pts[l..r) is y-sorted, 'strip' is also y-sorted. 63 // Compare each point to the next few (<= 7) points. 64 for (size_t a = 0; a < strip.size(); ++a) { 65 // check following up to 7 points 66 for (size_t b = a + 1; b < strip.size() && b <= a + 8; ++b) { 67 // Early break on y difference if it already exceeds current d 68 if (sqr(strip[b].y - strip[a].y) > d + EPS) break; 69 d = min(d, dist2(strip[a], strip[b])); 70 } 71 } 72 73 return d; 74 } 75 76 long double closest_pair(vector<Point> pts) { 77 // Sort by x once before recursion. Use stable tie-breaking. 78 sort(pts.begin(), pts.end(), [](const Point& a, const Point& b){ 79 if (a.x != b.x) return a.x < b.x; 80 if (a.y != b.y) return a.y < b.y; 81 return a.id < b.id; 82 }); 83 vector<Point> buf(pts.size()); 84 long double best2 = closest_pair_rec(0, (int)pts.size(), pts, buf); 85 return sqrt((double)best2); 86 } 87 88 int main() { 89 ios::sync_with_stdio(false); 90 cin.tie(nullptr); 91 92 int n; 93 if (!(cin >> n)) return 0; 94 vector<Point> pts(n); 95 for (int i = 0; i < n; ++i) { 96 long double x, y; cin >> x >> y; 97 pts[i] = {x, y, i}; 98 } 99 100 if (n < 2) { cout << 0 << "\n"; return 0; } 101 // Optional early duplicate check could set answer to 0. 102 103 cout.setf(std::ios::fixed); cout << setprecision(10) << closest_pair(pts) << "\n"; 104 return 0; 105 } 106
Implements the optimal O(n log n) divide-and-conquer solution. Points are initially sorted by x. The recursive function returns the best squared distance and ensures the current segment is sorted by y upon return. The merge step builds the strip in O(n) and compares each point to at most a constant number of successors in y-order. Using squared distances avoids sqrt in the hot loop.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Point { long double x, y; int id; }; 5 static inline long double sqr(long double v){ return v*v; } 6 static inline long double dist2(const Point& a, const Point& b){ return sqr(a.x-b.x) + sqr(a.y-b.y); } 7 8 int main(){ 9 ios::sync_with_stdio(false); 10 cin.tie(nullptr); 11 12 int n; if(!(cin>>n)) return 0; 13 vector<Point> p(n); 14 for(int i=0;i<n;++i){ long double x,y; cin>>x>>y; p[i]={x,y,i}; } 15 if(n<2){ cout<<0<<"\n"; return 0; } 16 17 sort(p.begin(), p.end(), [](const Point& a, const Point& b){ 18 if(a.x!=b.x) return a.x<b.x; 19 if(a.y!=b.y) return a.y<b.y; 20 return a.id<b.id; 21 }); 22 23 long double INF = numeric_limits<long double>::infinity(); 24 long double best2 = INF; 25 26 // Active set ordered by y. Store (y, id) -> index in p for lookup. 27 set<pair<long double,int>> active; 28 int left = 0; // leftmost index still in 'active' 29 30 for(int i=0;i<n;++i){ 31 Point cur = p[i]; 32 long double d = (best2==INF? INF : sqrt((double)best2)); 33 34 // Remove points whose x is too far from current point 35 while(left<i){ 36 if (best2==INF) break; // nothing to remove yet 37 if (cur.x - p[left].x > d) { 38 active.erase({p[left].y, p[left].id}); 39 ++left; 40 } else break; 41 } 42 43 // Query candidates within [y-d, y+d] 44 long double ylow = cur.y - (best2==INF? INF : d); 45 long double yhigh = cur.y + (best2==INF? INF : d); 46 auto itlow = active.lower_bound({ylow, INT_MIN}); 47 auto ithigh = active.upper_bound({yhigh, INT_MAX}); 48 49 for(auto it = itlow; it != ithigh; ++it){ 50 const Point& q = p[ lower_bound(p.begin(), p.end(), it->second, [](const Point& a, int id){ return a.id < id; }) - p.begin() ]; 51 best2 = min(best2, dist2(cur, q)); 52 } 53 54 active.insert({cur.y, cur.id}); 55 } 56 57 cout.setf(std::ios::fixed); cout<<setprecision(10)<<sqrt((double)best2)<<"\n"; 58 return 0; 59 } 60
The plane-sweep algorithm sorts points by x and maintains an active set of points whose x is within the current best distance of the sweep line. The set is ordered by y so that, for the current point, only a small y-range needs to be scanned. Each point is inserted once and eventually removed; per point we do O(log n) set operations and a constant number of distance checks.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Point { long double x, y; int id; }; 5 static inline long double sqr(long double v){ return v*v; } 6 static inline long double dist2(const Point& a, const Point& b){ return sqr(a.x-b.x) + sqr(a.y-b.y); } 7 8 // Hash for pair<long long,long long> 9 struct PairHash { 10 size_t operator()(const pair<long long,long long>& p) const noexcept { 11 return std::hash<long long>()((p.first<<32) ^ (p.second + 0x9e3779b97f4a7c15ULL + (p.first<<6) + (p.first>>2))); 12 } 13 }; 14 15 pair<long long,long long> cellOf(const Point& a, long double cell) { 16 // Use floor division to handle negatives correctly 17 long double cx = floor(a.x / cell); 18 long double cy = floor(a.y / cell); 19 return { (long long)cx, (long long)cy }; 20 } 21 22 int main(){ 23 ios::sync_with_stdio(false); 24 cin.tie(nullptr); 25 26 int n; if(!(cin>>n)) return 0; 27 vector<Point> p(n); 28 for(int i=0;i<n;++i){ long double x,y; cin>>x>>y; p[i]={x,y,i}; } 29 if(n<2){ cout<<0<<"\n"; return 0; } 30 31 // Randomize order to ensure expected guarantees 32 mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count()); 33 shuffle(p.begin(), p.end(), rng); 34 35 long double best2 = dist2(p[0], p[1]); 36 if(best2 == 0) { cout<<0<<"\n"; return 0; } 37 38 long double cell = max((long double)1e-18L, sqrt((double)best2)); // grid cell size ~ current best 39 40 // Hash grid: map cell -> list of point indices already inserted 41 unordered_map<pair<long long,long long>, vector<int>, PairHash> grid; 42 auto rebuild = [&](int upto){ 43 grid.clear(); 44 for(int i=0;i<upto;++i){ 45 auto c = cellOf(p[i], cell); 46 grid[c].push_back(i); 47 } 48 }; 49 50 // Insert first two points 51 grid[cellOf(p[0], cell)].push_back(0); 52 grid[cellOf(p[1], cell)].push_back(1); 53 54 for(int i=2;i<n;++i){ 55 // If best improved sufficiently (cell too big), shrink cell and rebuild 56 long double newCell = max((long double)1e-18L, sqrt((double)best2)); 57 if (newCell < cell * 0.5L) { cell = newCell; rebuild(i); } 58 59 auto c = cellOf(p[i], cell); 60 // Check neighbor cells in a 5x5 neighborhood (covers d-sized neighborhood when cell ~ d) 61 for(long long dx=-2; dx<=2; ++dx){ 62 for(long long dy=-2; dy<=2; ++dy){ 63 auto it = grid.find({c.first+dx, c.second+dy}); 64 if(it==grid.end()) continue; 65 for(int j : it->second){ 66 best2 = min(best2, dist2(p[i], p[j])); 67 if(best2 == 0) { cout<<0<<"\n"; return 0; } 68 } 69 } 70 } 71 grid[c].push_back(i); 72 } 73 74 cout.setf(std::ios::fixed); cout<<setprecision(10)<<sqrt((double)best2)<<"\n"; 75 return 0; 76 } 77
Randomly permutes the points, then inserts them into a spatial hash grid whose cell size tracks the current best distance. For each new point, only a small constant neighborhood of grid cells is checked. When the best distance shrinks substantially, the grid is rebuilt at a smaller cell size. With random order, the expected total rebuild and neighbor-check work is near-linear. This approach is simple and fast on well-distributed inputs but retains worst-case guarantees only in expectation.