The Divide-and-conquer Approach
The divide-and-conquer approach: Many useful algorithms are recursive in structure: to solve a given problem, they call themselves recursively one or more times to deal with closely related subproblems. These algorithms typically follow a divide-and-conquer approach: they break the problem into several subproblems that are similar to the original problem but smaller in size, solve the subproblems recursively, and then combine these solutions to create a solution to the original problem.
The divide-and-conquer paradigm involves three steps at each level of the recursion:
- Dividethe problem into a number of subproblems.
- Conquerthe subproblems by solving them recursively. If the subproblem sizes are small enough, however, just solve the subproblems in a straightforward manner.
- Combinethe solutions to the subproblems into the solution for the original problem.
The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it operates as follows.
- Divide:Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
- Conquer:Sort the two subsequences recursively using merge sort.
- Combine:Merge the two sorted subsequences to produce the sorted answer.
The recursion "bottoms out" when the sequence to be sorted has length 1, in which case there is no work to be done, since every sequence of length 1 is already in sorted order.
The key operation of the merge sort algorithm is the merging of two sorted sequences in the "combine" step. To perform the merging, we use an auxiliary procedure MERGE(A, p, q, r), where A is an array and p, q, and r are indices numbering elements of the array such that p ≤ q < r. The procedure assumes that the subarrays A[p ‥q] and A[q 1 ‥r] are in sorted order. It merges them to form a single sorted subarray that replaces the current subarray A[p ‥r].
Our MERGE procedure takes time Θ(n), where n = r - p 1 is the number of elements being merged, and it works as follows. Returning to our card-playing motif, suppose we have two piles of cards face up on a table. Each pile is sorted, with the smallest cards on top. We wish to merge the two piles into a single sorted output pile, which is to be face down on the table. Our basic step consists of choosing the smaller of the two cards on top of the face-up piles, removing it from its pile (which exposes a new top card), and placing this card face down onto the output pile. We repeat this step until one input pile is empty, at which time we just take the remaining input pile and place it face down onto the output pile. Computationally, each basic step takes constant time, since we are checking just two top cards. Since we perform at most n basic steps, merging takes Θ(n) time.
The following pseudocode implements the above idea, but with an additional twist that avoids having to check whether either pile is empty in each basic step. The idea is to put on the bottom of each pile a sentinel card, which contains a special value that we use to simplify our code. Here, we use ∞ as the sentinel value, so that whenever a card with ∞ is exposed, it cannot be the smaller card unless both piles have their sentinel cards exposed. But once that happens, all the nonsentinel cards have already been placed onto the output pile. Since we know in advance that exactly r - p 1 cards will be placed onto the output pile, we can stop once we have performed that many basic steps.
In detail, the MERGE procedure works as follows. Line 1 computes the length n1 of the subarray A[p ‥q], and line 2 computes the length n2 of the subarray A[q 1 ‥r]. We create arrays L and R ("left" and "right"), of lengths n1 1 and n2 1, respectively, in line 3. The for loop of lines 4-5 copies the subarray A[p ‥q] into L[1 ‥n1], and the for loop of lines 6-7 copies the subarray A[q 1 ‥r] into R[1 ‥n2]. Lines 8-9 put the sentinels at the ends of the arrays L and R. Lines 10-17, illustrated in Figure 2.3, perform the r - p 1 basic steps by maintaining the following loop invariant:
At the start of each iteration of the for loop of lines 12-17, the subarray A[p ‥k - 1] contains the k - p smallest elements of L[1 ‥n1 1] and R[1 ‥n2 1], in sorted order. Moreover, L[i] and R[j] are the smallest elements of their arrays that have not been copied back into A.
Figure 2.3: The operation of lines 10-17 in the call MERGE(A, 9, 12, 16), when the subarray A[9 ‥16] contains the sequence 〈2, 4, 5, 7, 1, 2, 3, 6〉. After copying and inserting sentinels, the array L contains 〈2, 4, 5, 7, ∞〉, and the array R contains 〈1, 2, 3, 6, ∞〉. Lightly shaded positions in A contain their final values, and lightly shaded positions in L and R contain values that have yet to be copied back into A. Taken together, the lightly shaded positions always comprise the values originally in A[9 ‥16], along with the two sentinels. Heavily shaded positions in A contain values that will be copied over, and heavily shaded positions in L and R contain values that have already been copied back into A. (a)-(h) The arrays A, L, and R, and their respective indices k, i, and j prior to each iteration of the loop of lines 12-17. (i) The arrays and indices at termination. At this point, the subarray in A[9 ‥16] is sorted, and the two sentinels in L and R are the only two elements in these arrays that have not been copied into A.