Perfect Hashing
Perfect hashing: Although hashing is most often used for its excellent expected performance, hashing can be used to obtain excellent worst-case performance when the set of keys is static: once the keys are stored in the table, the set of keys never changes. Some applications naturally have static sets of keys: consider the set of reserved words in a programming language, or the set of file names on a CD-ROM. We call a hashing technique perfect hashing if the worst-case number of memory accesses required to perform a search is O(1).
The basic idea to create a perfect hashing scheme is simple. We use a two-level hashing scheme with universal hashing at each level. Figure 11.6 illustrates the approach.
The first level is essentially the same as for hashing with chaining: the n keys are hashed into m slots using a hash function h carefully selected from a family of universal hash functions.
Instead of making a list of the keys hashing to slot j, however, we use a small secondary hash table Sj with an associated hash function hj. By choosing the hash functions hj carefully, we can guarantee that there are no collisions at the secondary level.
In order to guarantee that there are no collisions at the secondary level, however, we will need to let the size mj of hash table Sj be the square of the number nj of keys hashing to slot j. While having such a quadratic dependence of mj on nj may seem likely to cause the overall storage requirements to be excessive, we shall show that by choosing the first level hash function well, the expected total amount of space used is still O(n).
We shall proceed in two steps. First, we shall determine how to ensure that the secondary tables have no collisions. Second, we shall show that the expected amount of memory used overall-for the primary hash table and all the secondary hash tables-is O(n).