Finding The Closest Pair Of Points
Finding the closest pair of points: We now consider the problem of finding the closest pair of points in a set Q of n ≥ 2 points. "Closest" refers to the usual euclidean distance: the distance between points p1 = (x1, y1) and p2 = (x2, y2) is . Two points in set Q may be coincident, in which case the distance between them is zero. This problem has applications in, for example, traffic-control systems. A system for controlling air or sea traffic might need to know which are the two closest vehicles in order to detect potential collisions.
A brute-force closest-pair algorithm simply looks at all the pairs of points. In this section, we shall describe a divide-and-conquer algorithm for this problem whose running time is described by the familiar recurrence T (n) = 2T(n/2) O(n). Thus, this algorithm uses only O(n lg n) time.
The divide-and-conquer algorithm: Each recursive invocation of the algorithm takes as input a subset P ⊆ Q and arrays X and Y, each of which contains all the points of the input subset P. The points in array X are sorted so that their x-coordinates are monotonically increasing. Similarly, array Y is sorted by monotonically increasing y-coordinate. Note that in order to attain the O(n lg n) time bound, we cannot afford to sort in each recursive call; if we did, the recurrence for the running time would be T(n) = 2T(n/2) O(n lg n), whose solution is T(n) = O(n lg2 n). We shall see a little later how to use "presorting" to maintain this sorted property without actually sorting in each recursive call.
A given recursive invocation with inputs P, X, and Y first checks whether |P| ≤ 3. If so, the invocation simply performs the brute-force method described above: try all pairs of points and return the closest pair. If |P| > 3, the recursive invocation carries out the divide-and-conquer paradigm as follows.
-
Divide: It finds a vertical line l that bisects the point set P into two sets PL and PR such that |PL| = ⌈|P|/2⌉, |PR| = ⌊|P|/2⌋, all points in PL are on or to the left of line l, and all points in PR are on or to the right of l. The array X is divided into arrays XL and XR, which contain the points of PL and PR respectively, sorted by monotonically increasing x-coordinate. Similarly, the array Y is divided into arrays YL and YR, which contain the points of PL and PR respectively, sorted by monotonically increasing y-coordinate.
-
Conquer: Having divided P into PL and PR, it makes two recursive calls, one to find the closest pair of points in PL and the other to find the closest pair of points in PR. The inputs to the first call are the subset PL and arrays XL and YL; the second call receives the inputs PR, XR, and YR. Let the closest-pair distances returned for PL and PR be δL and δR, respectively, and let δ = min(δL, δR).
-
Combine: The closest pair is either the pair with distance δ found by one of the recursive calls, or it is a pair of points with one point in PL and the other in PR. The algorithm determines if there is such a pair whose distance is less than δ. Observe that if there is a pair of points with distance less than δ, both points of the pair must be within δ units of line l. Thus, as Figure 33.11(a) shows, they both must reside in the 2δ-wide vertical strip centered at line l. To find such a pair, if one exists, the algorithm does the following.
-
Figure 33.11: Key concepts in the proof that the closest-pair algorithm needs to check only 7 points following each point in the array Y′. (a) If pL ∈ PL and pR ∈ PR are less than δ units apart, they must reside within a δ × 2δ rectangle centered at line l. (b) How 4 points that are pairwise at least δ units apart can all reside within a δ × δ square. On the left are 4 points in PL, and on the right are 4 points in PR. There can be 8 points in the δ × 2δ rectangle if the points shown on line l are actually pairs of coincident points with one point in PL and one in PR.
-
It creates an array Y′, which is the array Y with all points not in the 2δ-wide vertical strip removed. The array Y′ is sorted by y-coordinate, just as Y is.
-
For each point p in the array Y′, the algorithm tries to find points in Y′ that are within δ units of p. As we shall see shortly, only the 7 points in Y′ that follow p need be considered. The algorithm computes the distance from p to each of these 7 points and keeps track of the closest-pair distance δ′ found over all pairs of points in Y′.
-
If δ′ < δ, then the vertical strip does indeed contain a closer pair than was found by the recursive calls. This pair and its distance δ′ are returned. Otherwise, the closest pair and its distance δ found by the recursive calls are returned.
The above description omits some implementation details that are necessary to achieve the O(n lg n) running time. After proving the correctness of the algorithm, we shall show how to implement the algorithm to achieve the desired time bound.
Correctness: The correctness of this closest-pair algorithm is obvious, except for two aspects. First, by bottoming out the recursion when |P| ≤ 3, we ensure that we never try to solve a subproblem consisting of only one point. The second aspect is that we need only check the 7 points following each point p in array Y′; we shall now prove this property.
Suppose that at some level of the recursion, the closest pair of points is pL ∈ PL and pR ∈ PR. Thus, the distance δ′ between pL and pR is strictly less than δ. Point pL must be on or to the left of line l and less than δ units away. Similarly, pR is on or to the right of l and less than δ units away. Moreover, pL and pR are within δ units of each other vertically. Thus, as Figure 33.11(a) shows, pL and pR are within a δ × 2δ rectangle centered at line l. (There may be other points within this rectangle as well.)