|Written by Mike James|
|Thursday, 22 April 2021|
Page 3 of 3
This use of a table to construct a hash function produces excellent hash function behaviour but it also opens up another possibility.
As the table determines where any particular key will be hashed to and the table is something that we create why not try to create tables with advantageous properties. For example, why not test the quality of the hashing function by trying it out on a random selection of keys and see where they are hashed to. If the hash function produces a lot of collisions then you can scrap it and try again.
Of course being programmers the obvious thing to do is to write a program that generates hash functions and tests them automatically - a sort of designer hash function factory!
Now take this search for a good hashing function one tiny step further. Let's look for a perfect hash function! If you know that there is a particular set of data that you want to work with you could use this as your test set of keys and leave the hash function factory searching until it found a table based hash function that produces zero collisions! There is no guarantee that such a table will be found in a reasonable amount of time but this is a gamble worth taking.
A perfect hash function can be used to store the test set of keys without collision and so you can find them again with a single lookup.
Of course as soon as you move beyond the test set of keys collisions will happen but in some applications this might not be important.
For example, if you were writing a compiler then you could search for a perfect hash function for the keywords of the language as part of the tokenisation pass. In this case it would be even better if the hash function hashed each keyword onto consecutive values without gaps i.e. the first keyword hashed to 1, the second to 2 and so on.
Minimal perfect hash function
Such a hash function is called a minimal perfect hash function for the set of keys.
The uses of a minimal perfect hash function aren't that wide but they are important.
For example, in the case of the compiler tokenisation routine each identifier could be hashed in turn. The hash values of the keywords could be arranged to lie in a known range by using a minimal perfect hash function constructed using them. The non-keyword identifiers would be stored in the table in the usual way with collision handling provided.
Consider the problem of detecting particular key words within an input stream of characters. A minimal perfect hash function constructed for the words that you are trying to detect in the input stream would work very well as an alarm system. Simply hash the data stream and look out for the particular range of values that the keywords occupy. Notice that using this method you are testing for the presence of all the keywords with a single hash function!
Perfect hash functions sound useful but the difficulty is in constructing them. The crude trial and error method is very slow and to do the job properly you need to invent a few heuristics to guide the search by modifying the existing table, rather than starting with a completely fresh table each time.
What Programmers Know
* Recently revised
or email your comment to: firstname.lastname@example.org
|Last Updated ( Thursday, 22 April 2021 )|