The Lost Art Of The Storage Mapping Function
Written by Harry Fairhead   
Thursday, 04 October 2018

Hashing - Just Another SMF

If you are still unimpressed by the idea of an SMF then perhaps my last example will please you. The whole idea of an SMF can be generalised to include a function that maps some elements into the same storage location. 

This may seem like a crazy idea but you might have come across it before under the names "scatter storage" or "hash functions". Whatever you call it, it has to be one of the nicest ideas in the whole of computing - see:
Hashing - The Greatest Idea In Programming.

The principle of an SMF is that given one data value, the key, you can find the location of an associated data value using f(key).

All of the SMFs we have looked at so far have been very regular. They make use of the regularity of the data to map it to one-after-the-other sequential storage - but sometimes this isn't necessary. 

Suppose you can find a function, any old function, f(key), that gives you a location for every possible value of the key and in most, but not all, cases gives you different locations for different keys - then why not use it?

For example, suppose you want to store words in an array. You could use the SMF given by adding together the ASCII codes of the first two letters and subtracting 128.

For example, f(CAT) would be 67+65-128 (ASCII codes of C and A minus 128) or 4. This means that you could store CAT in location 4. In the same way f(DOG) is 68+79-128 = 19 and so DOG would be stored in location 19. 

This works just as well as a regular SMF, but with one problem: sometimes two different keys will be mapped to the same location. For example, as only the first two letters are used, f(CAR) is the same as f(CAT) and we would attempt to store the two words in the same place.

This is called a collision and different scatter storage or hashing schemes deal with the problem in different ways. The easiest thing to do is to check whether the location given by f(key) has already been used and, if it has, check f(key)+1, f(key)+2 and so on until you find a free location. There are lots of variations on collision management, but this scheme, usually called linear probing, is simple and fairly efficient.

If you don't see the point of hashing functions then try the problem of storing and subsequently finding names in an array. Without hash functions you either have to perform a sequential search of the array, or sort the array and perform a binary search. The former is inefficient and the latter is more complex and has the overhead of a complete sort. A hashing function gives you the location of any word in a single evaluation and, even if there is a collision, you should find the word after a short linear probe.

There is a lot more to be said about hash functions, but the main thing is to see them as nothing more than slightly odd SMFs.

You can think of hashing as using a chaotic or pseudo random SMF. 

SMFs Hidden Rather Than Forgotten

So it looks as if Storage Mapping Functions are alive and well after all. Perhaps the sub-title of this article should have been the "hidden" rather than "lost" art.

Related Articles

Hashing - The Greatest Idea In Programming

Advanced Hashing 

The Bloom Filter

The Invertible Bloom Filter

Universal Hashing

Inside Bitcoin - virtual currency

Assemblers and assembly language

      






Last Updated ( Thursday, 04 October 2018 )