Linked-list Representation Of Disjoint Sets

Linked-list representation of disjoint sets: A simple way to implement a disjoint-set data structure is to represent each set by a linked list. The first object in each linked list serves as its set's representative. Each object in the linked list contains a set member, a pointer to the object containing the next set member, and a pointer back to the representative. Each list maintains pointers head, to the representative, and tail, to the last object in the list. Figure 21.2(a) shows two sets. Within each linked list, the objects may appear in any order (subject to our assumption that the first object in each list is the representative).

Figure 21.2: (a) Linked-list representations of two sets. One contains objects b, c, e, and h, with c as the representative, and the other contains objects d, f, and g, with f as the representative. Each object on the list contains a set member, a pointer to the next object on the list, and a pointer back to the first object on the list, which is the representative. Each list has pointers head and tail to the first and last objects, respectively. (b) The result of UNION(e, g). The representative of the resulting set is f.

With this linked-list representation, both MAKE-SET and FIND-SET are easy, requiring O(1) time. To carry out MAKE-SET(x), we create a new linked list whose only object is x. For FIND-SET(x), we just return the pointer from x back to the representative.

A simple implementation of union

The simplest implementation of the UNION operation using the linked-list set representation takes significantly more time than MAKE-SET or FIND-SET. As Figure 21.2(b) shows, we perform UNION(x, y) by appending x's list onto the end of y's list. We use the tail pointer for y's list to quickly find where to append x's list. The representative of the new set is the element that was originally the representative of the set containing y. Unfortunately, we must update the pointer to the representative for each object originally on x's list, which takes time linear in the length of x's list.

In fact, it is not difficult to come up with a sequence of m operations on n objects that requires Θ(n²) time. Suppose that we have objects x₁, x₂, ..., x_n. We execute the sequence of n MAKE-SET operations followed by n - 1 UNION operations shown in Figure 21.3, so that m = 2n - 1. We spend Θ(n) time performing the n MAKE-SET operations. Because the ith UNION operation updates i objects, the total number of objects updated by all n - 1 UNION operations is

Operation	Number of objects updated

MAKE-SET(x₁)	1
MAKE-SET(x₂)	1
⋮	⋮
MAKE-SET(x_n	1
UNION(x₁, x₂)	1
UNION(x₂, x₃)	2
UNION(x₃, x₄)	3
⋮	⋮
UNION(x_n-1, x_n)	n - 1

Figure 21.3: A sequence of 2n - 1 operations on n objects that takes Θ(n²) time, or Θ(n) time per operation on average, using the linked-list set representation and the simple implementation of UNION.

The total number of operations is 2n - 1, and so each operation on average requires Θ(n) time. That is, the amortized time of an operation is Θ(n).

A weighted-union heuristic: In the worst case, the above implementation of the UNION procedure requires an average of Θ(n) time per call because we may be appending a longer list onto a shorter list; we must update the pointer to the representative for each member of the longer list. Suppose instead that each list also includes the length of the list (which is easily maintained) and that we always append the smaller list onto the longer, with ties broken arbitrarily. With this simple weighted-union heuristic, a single UNION operation can still take Ω(n) time if both sets have Ω(n) members. As the following theorem shows, however, a sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, takes O(m n lg n) time.