Design Analysis of Algorithm

Selection In Expected Linear Time

Selection in expected linear time: The general selection problem appears more difficult than the simple problem of finding a minimum. Yet, surprisingly, the asymptotic running time for both problems is the same: Θ(n). In this section, we present a divide-and-conquer algorithm for the selection problem. As in quicksort, the idea is to partition the input array recursively. But unlike quicksort, which recursively processes both sides of the partition, RANDOMIZED-SELECT only works on one side of the partition. This difference shows up in the analysis: whereas quicksort has an expected running time of Θ(n lg n), the expected time of RANDOMIZED-SELECT is Θ(n).

RANDOMIZED-SELECT uses the procedure RANDOMIZED-PARTITION introduced in Section 7.3. Thus, like RANDOMIZED-QUICKSORT, it is a randomized algorithm, since its behavior is determined in part by the output of a random-number generator. The following code for RANDOMIZED-SELECT returns the ith smallest element of the array A[p .. r].

	RANDOMIZED-SELECT(A, p, r, i)
1  if p = r
2      then return A[p]
3  q  RANDOMIZED-PARTITION(A, p, r)
4  k  q - p   1
5  if i = k           the pivot value is the answer
6      then return A[q]
7  elseif i < k
8      then return RANDOMIZED-SELECT(A, p, q - 1, i)
9  else return RANDOMIZED-SELECT(A, q   1, r, i - k)

After RANDOMIZED-PARTITION is executed in line 3 of the algorithm, the array A[p .. r] is partitioned into two (possibly empty) subarrays A[p .. q - 1] and A[q 1 .. r] such that each element of A[p .. q - 1] is less than or equal to A[q], which in turn is less than each element of A[q 1 .. r]. As in quicksort, we will refer to A[q] as the pivot element. Line 4 of RANDOMIZED-SELECT computes the number k of elements in the subarray A[p .. q], that is, the number of elements in the low side of the partition, plus one for the pivot element. Line 5 then checks whether A[q] is the ith smallest element. If it is, then A[q] is returned. Otherwise, the algorithm determines in which of the two subarrays A[p .. q - 1] and A[q 1 .. r] the ith smallest element lies. If i < k, then the desired element lies on the low side of the partition, and it is recursively selected from the subarray in line 8. If i > k, however, then the desired element lies on the high side of the partition. Since we already know k values that are smaller than the ith smallest element of A[p .. r]-namely, the elements of A[p .. q]-the desired element is the (i - k)th smallest element of A[q 1 .. r], which is found recursively in line 9.

The worst-case running time for RANDOMIZED-SELECT is Θ(n2), even to find the minimum, because we could be extremely unlucky and always partition around the largest remaining element, and partitioning takes Θ(n) time. The algorithm works well in the average case, though, and because it is randomized, no particular input elicits the worst-case behavior.

The time required by RANDOMIZED-SELECT on an input array A[p .. r] of n elements is a random variable that we denote by T(n), and we obtain an upper bound on E [T(n)] as follows. Procedure RANDOMIZED-PARTITION is equally likely to return any element as the pivot. Therefore, for each k such that 1 k n, the subarray A[p .. q] has k elements (all less than or equal to the pivot) with probability 1/n. For k = 1, 2,..., n, we define indicator random variables Xk where

Xk = I{the subarray A[p .. q] has exactly k elements} ,

and so we have

When we call RANDOMIZED-SELECT and choose A[q] as the pivot element, we do not know, a priori, if we will terminate immediately with the correct answer, recurse on the subarray A[p .. q - 1], or recurse on the subarray A[q 1 .. r]. This decision depends on where the ith smallest element falls relative to A[q]. Assuming that T(n) is monotonically increasing, we can bound the time needed for the recursive call by the time needed for the recursive call on the largest possible input. In other words, we assume, to obtain an upper bound, that the ith element is always on the side of the partition with the greater number of elements. For a given call of RANDOMIZED-SELECT, the indicator random variable Xk has the value 1 for exactly one value of k, and it is 0 for all other k. When Xk = 1, the two subarrays on which we might recurse have sizes k - 1 and n - k. Hence, we have the recurrence

Taking expected values, we have