Open Addressing

Open addressing: In open addressing, all elements are stored in the hash table itself. That is, each table entry contains either an element of the dynamic set or NIL. When searching for an element, we systematically examine table slots until the desired element is found or it is clear that the element is not in the table. There are no lists and no elements stored outside the table, as there are in chaining. Thus, in open addressing, the hash table can "fill up" so that no further insertions can be made; the load factor α can never exceed 1.

Of course, we could store the linked lists for chaining inside the hash table, in the otherwise unused hash-table slots, but the advantage of open addressing is that it avoids pointers altogether. Instead of following pointers, we compute the sequence of slots to be examined. The extra memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory, potentially yielding fewer collisions and faster retrieval.

To perform insertion using open addressing, we successively examine, or probe, the hash table until we find an empty slot in which to put the key. Instead of being fixed in the order 0, 1, ..., m - 1 (which requires Θ(n) search time), the sequence of positions probed depends upon the key being inserted. To determine which slots to probe, we extend the hash function to include the probe number (starting from 0) as a second input. Thus, the hash function becomes

h : U × {0, 1, ..., m - 1} → {0, 1, ..., m - 1}.

With open addressing, we require that for every key k, the probe sequence

〈h(k,0),h(k,1), ..., h(k,m - 1)〉

be a permutation of 〈0, 1, ..., m -1〉, so that every hash-table position is eventually considered as a slot for a new key as the table fills up. In the following pseudocode, we assume that the elements in the hash table T are keys with no satellite information; the key k is identical to the element containing key k. Each slot contains either a key or NIL (if the slot is empty).

	HASH-INSERT(T, k)
1  i ← 0
2  repeat j ← h(k, i)
3         if T[j] = NIL
4            then T[j] ← k
5                 return j
6            else i ← i   1
7   until i = m
8 error "hash table overflow"

The algorithm for searching for key k probes the same sequence of slots that the insertion algorithm examined when key k was inserted. Therefore, the search can terminate (unsuccessfully) when it finds an empty slot, since k would have been inserted there and not later in its probe sequence. (This argument assumes that keys are not deleted from the hash table.) The procedure HASH-SEARCH takes as input a hash table T and a key k, returning j if slot j is found to contain key k, or NIL if key k is not present in table T.

	HASH-SEARCH(T, k)
1  i ← 0
2  repeat j ← h(k, i)
3         if T[j] = k
4            then return j
5         i ← i   1
6    until T[j] = NIL or i = m
7  return NIL

Deletion from an open-address hash table is difficult. When we delete a key from slot i, we cannot simply mark that slot as empty by storing NIL in it. Doing so might make it impossible to retrieve any key k during whose insertion we had probed slot i and found it occupied. One solution is to mark the slot by storing in it the special value DELETED instead of NIL. We would then modify the procedure HASH-INSERT to treat such a slot as if it were empty so that a new key can be inserted. No modification of HASH-SEARCH is needed, since it will pass over DELETED values while searching. When we use the special value DELETED, however, search times are no longer dependent on the load factor α, and for this reason chaining is more commonly selected as a collision resolution technique when keys must be deleted.

In our analysis, we make the assumption of uniform hashing: we assume that each key is equally likely to have any of the m! permutations of 〈0, 1, ..., m - 1〉 as its probe sequence. Uniform hashing generalizes the notion of simple uniform hashing defined earlier to the situation in which the hash function produces not just a single number, but a whole probe sequence. True uniform hashing is difficult to implement, however, and in practice suitable approximations (such as double hashing, defined below) are used.

Three techniques are commonly used to compute the probe sequences required for open addressing: linear probing, quadratic probing, and double hashing. These techniques all guarantee that 〈h(k, 0), h(k, 1), ..., h(k, m - 1)〉 is a permutation of 〈0, 1, ..., m - 1〉 for each key k. None of these techniques fulfills the assumption of uniform hashing, however, since none of them is capable of generating more than m² different probe sequences (instead of the m! that uniform hashing requires). Double hashing has the greatest number of probe sequences and, as one might expect, seems to give the best results.

Linear probing

Given an ordinary hash function h' : U → {0, 1, ..., m - 1}, which we refer to as an auxiliary hash function, the method of linear probing uses the hash function

h(k, i) = (h'(k) i) mod m

for i = 0, 1, ..., m - 1. Given key k, the first slot probed is T[h'(k)], i.e., the slot given by the auxiliary hash function. We next probe slot T[h'(k) 1], and so on up to slot T[m - 1]. Then we wrap around to slots T[0], T[1], ..., until we finally probe slot T[h'(k) - 1]. Because the initial probe determines the entire probe sequence, there are only m distinct probe sequences.

Linear probing is easy to implement, but it suffers from a problem known as primary clustering. Long runs of occupied slots build up, increasing the average search time. Clusters arise since an empty slot preceded by i full slots gets filled next with probability (i 1)/m. Long runs of occupied slots tend to get longer, and the average search time increases.