Finding The Closest Pair Of Points

Finding the closest pair of points: We now consider the problem of finding the closest pair of points in a set Q of n ≥ 2 points. "Closest" refers to the usual euclidean distance: the distance between points p₁ = (x₁, y₁) and p₂ = (x₂, y₂) is . Two points in set Q may be coincident, in which case the distance between them is zero. This problem has applications in, for example, traffic-control systems. A system for controlling air or sea traffic might need to know which are the two closest vehicles in order to detect potential collisions.

A brute-force closest-pair algorithm simply looks at all the pairs of points. In this section, we shall describe a divide-and-conquer algorithm for this problem whose running time is described by the familiar recurrence T (n) = 2T(n/2) O(n). Thus, this algorithm uses only O(n lg n) time.

The divide-and-conquer algorithm: Each recursive invocation of the algorithm takes as input a subset P ⊆ Q and arrays X and Y, each of which contains all the points of the input subset P. The points in array X are sorted so that their x-coordinates are monotonically increasing. Similarly, array Y is sorted by monotonically increasing y-coordinate. Note that in order to attain the O(n lg n) time bound, we cannot afford to sort in each recursive call; if we did, the recurrence for the running time would be T(n) = 2T(n/2) O(n lg n), whose solution is T(n) = O(n lg² n). We shall see a little later how to use "presorting" to maintain this sorted property without actually sorting in each recursive call.

A given recursive invocation with inputs P, X, and Y first checks whether |P| ≤ 3. If so, the invocation simply performs the brute-force method described above: try all pairs of points and return the closest pair. If |P| > 3, the recursive invocation carries out the divide-and-conquer paradigm as follows.

Divide: It finds a vertical line l that bisects the point set P into two sets P_L and P_R such that |P_L| = ⌈|P|/2⌉, |P_R| = ⌊|P|/2⌋, all points in P_L are on or to the left of line l, and all points in P_R are on or to the right of l. The array X is divided into arrays X_L and X_R, which contain the points of P_L and P_R respectively, sorted by monotonically increasing x-coordinate. Similarly, the array Y is divided into arrays Y_L and Y_R, which contain the points of P_L and P_R respectively, sorted by monotonically increasing y-coordinate.
Conquer: Having divided P into P_L and P_R, it makes two recursive calls, one to find the closest pair of points in P_L and the other to find the closest pair of points in P_R. The inputs to the first call are the subset P_L and arrays X_L and Y_L; the second call receives the inputs P_R, X_R, and Y_R. Let the closest-pair distances returned for P_L and P_R be δ_L and δ_R, respectively, and let δ = min(δ_L, δ_R).
Combine: The closest pair is either the pair with distance δ found by one of the recursive calls, or it is a pair of points with one point in P_L and the other in P_R. The algorithm determines if there is such a pair whose distance is less than δ. Observe that if there is a pair of points with distance less than δ, both points of the pair must be within δ units of line l. Thus, as Figure 33.11(a) shows, they both must reside in the 2δ-wide vertical strip centered at line l. To find such a pair, if one exists, the algorithm does the following.

Figure 33.11: Key concepts in the proof that the closest-pair algorithm needs to check only 7 points following each point in the array Y′. (a) If p_L ∈ P_L and p_R ∈ P_R are less than δ units apart, they must reside within a δ × 2δ rectangle centered at line l. (b) How 4 points that are pairwise at least δ units apart can all reside within a δ × δ square. On the left are 4 points in P_L, and on the right are 4 points in P_R. There can be 8 points in the δ × 2δ rectangle if the points shown on line l are actually pairs of coincident points with one point in P_L and one in P_R.

It creates an array Y′, which is the array Y with all points not in the 2δ-wide vertical strip removed. The array Y′ is sorted by y-coordinate, just as Y is.
For each point p in the array Y′, the algorithm tries to find points in Y′ that are within δ units of p. As we shall see shortly, only the 7 points in Y′ that follow p need be considered. The algorithm computes the distance from p to each of these 7 points and keeps track of the closest-pair distance δ′ found over all pairs of points in Y′.
If δ′ < δ, then the vertical strip does indeed contain a closer pair than was found by the recursive calls. This pair and its distance δ′ are returned. Otherwise, the closest pair and its distance δ found by the recursive calls are returned.

The above description omits some implementation details that are necessary to achieve the O(n lg n) running time. After proving the correctness of the algorithm, we shall show how to implement the algorithm to achieve the desired time bound.

Correctness: The correctness of this closest-pair algorithm is obvious, except for two aspects. First, by bottoming out the recursion when |P| ≤ 3, we ensure that we never try to solve a subproblem consisting of only one point. The second aspect is that we need only check the 7 points following each point p in array Y′; we shall now prove this property.

Suppose that at some level of the recursion, the closest pair of points is p_L ∈ P_L and p_R ∈ P_R. Thus, the distance δ′ between p_L and p_R is strictly less than δ. Point p_L must be on or to the left of line l and less than δ units away. Similarly, p_R is on or to the right of l and less than δ units away. Moreover, p_L and p_R are within δ units of each other vertically. Thus, as Figure 33.11(a) shows, p_L and p_R are within a δ × 2δ rectangle centered at line l. (There may be other points within this rectangle as well.)