Advanced Hashing
Written by Mike James   
Thursday, 22 April 2021
Article Index
Advanced Hashing
Extensible Hashing
Perfect Hashing

Perfect hashing

This use of a table to construct a hash function produces excellent hash function behaviour but it also opens up another possibility.

As the table determines where any particular key will be hashed to and the table is something that we create why not try to create tables with advantageous properties. For example, why not test the quality of the hashing function by trying it out on a random selection of keys and see where they are hashed to. If the hash function produces a lot of collisions then you can scrap it and try again.

Of course being programmers the obvious thing to do is to write a program that generates hash functions and tests them automatically - a sort of designer hash function factory!

Now take this search for a good hashing function one tiny step further. Let's look for a perfect hash function! If you know that there is a particular set of data that you want to work with you could use this as your test set of keys and leave the hash function factory searching until it found a table based hash function that produces zero collisions! There is no guarantee that such a table will be found in a reasonable amount of time but this is a gamble worth taking.

A perfect hash function can be used to store the test set of keys without collision and so you can find them again with a single lookup.

Of course as soon as you move beyond the test set of keys collisions will happen but in some applications this might not be important.

For example, if you were writing a compiler then you could search for a perfect hash function for the keywords of the language as part of the tokenisation pass. In this case it would be even better if the hash function hashed each keyword onto consecutive values without gaps i.e. the first keyword hashed to 1, the second to 2 and so on.

Minimal perfect hash function

Such a hash function is called a minimal perfect hash function for the set of keys.

The uses of a minimal perfect hash function aren't that wide but they are important.

For example, in the case of the compiler tokenisation routine each identifier could be hashed in turn. The hash values of the keywords could be arranged to lie in a known range by using a minimal perfect hash function constructed using them. The non-keyword identifiers would be stored in the table in the usual way with collision handling provided.

Consider the problem of detecting particular key words within an input stream of characters. A minimal perfect hash function constructed for the words that you are trying to detect in the input stream would work very well as an alarm system. Simply hash the data stream and look out for the particular range of values that the keywords occupy. Notice that using this method you are testing for the presence of all the keywords with a single hash function!

Perfect hash functions sound useful but the difficulty is in constructing them. The crude trial and error method is very slow and to do the job properly you need to invent a few heuristics to guide the search by modifying the existing table, rather than starting with a completely fresh table each time. 

 

 

hash3

Related Articles

Hashing - The Greatest Idea In Programming

The Bloom Filter

The Invertible Bloom Filter

Universal Hashing

Storage Mapping Functions       

Inside Bitcoin - virtual currency

Assemblers and assembly language      

 

What Programmers Know

knowcover

Contents

  1. The Computer - What's The Big Idea?*
  2. The Memory Principle - Computer Memory and Pigeonholes*
  3. Principles of Execution - The CPU
  4. The Essence Of Programming
  5. Variables - Scope, Lifetime And More*
  6. Binary Arithmetic
  7. Hexadecimal*
  8. Binary - Negative Numbers*
  9. Floating Point Numbers*
  10. Inside the Computer - Addressing
  11. The Mod Function
  12. Recursion
  13. The Lost Art Of The Storage Mapping Function *
  14. Hashing - The Greatest Idea In Programming
  15. Advanced Hashing
  16. XOR - The Magic Swap*
  17. Programmer's Introduction to XML
  18. From Data To Objects*
  19. What Exactly Is A First Class Function - And Why You Should Care*
  20. Stacks And Trees*
  21. The LIFO Stack - A Gentle Guide*
  22. Data Structures - Trees
  23. Inside Random Numbers
  24. The Monte Carlo Method
  25. Cache Memory And The Caching Principle
  26. Data Compression The Dictionary Way
  27. Dates Are Difficult*
  28. Sequential Storage*
  29. Magic of Merging*
  30. Power of Operators
  31. The Heart Of A Compiler*
  32. The Fundamentals of Pointers
  33. Functional And Dysfunctional Programming*

* Recently revised

raspberry pi books

 

Comments




or email your comment to: comments@i-programmer.info

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Hexadecimal

Hexadecimal is the most common way of displaying the raw data sitting in a machine's memory, but if you are not familiar with it you might ask "What the hex..?"



Power of Operators

This article  is more or less everything that the working programmer should know about operators and their associated expressions and, of course, the use of parentheses.


Other Articles

 

<ASIN:0071460519>

<ASIN:3540431012>

<ASIN:1848000693>



Last Updated ( Thursday, 22 April 2021 )