The Lost Art Of The Storage Mapping Function
Written by Harry Fairhead   
Thursday, 04 October 2018
Article Index
The Lost Art Of The Storage Mapping Function
SMF For Trees!?
Hashing is just an SMF!

Hashing - Just Another SMF

If you are still unimpressed by the idea of an SMF then perhaps my last example will please you. The whole idea of an SMF can be generalised to include a function that maps some elements into the same storage location. 

This may seem like a crazy idea but you might have come across it before under the names "scatter storage" or "hash functions". Whatever you call it, it has to be one of the nicest ideas in the whole of computing - see:
Hashing - The Greatest Idea In Programming.

The principle of an SMF is that given one data value, the key, you can find the location of an associated data value using f(key).

All of the SMFs we have looked at so far have been very regular. They make use of the regularity of the data to map it to a one-after the other sequential storage - but sometimes this isn't necessary. 

Suppose you can find a function, any old function, f(key) that gives you a location for all possible values of the key and in most but not all cases gives you different locations for different keys - then why not use it?

For example, suppose you want to store words in an array. You could use the SMF given by adding together the ASCII codes of the first two letters minus 128.

For example, f(CAT) would be 67+65-128 (ASCII codes of C and A minus 128) or 4. This means that you could store CAT in location 4. In the same way f(DOG) is 68+79-128 = 19 and so DOG would be stored in location 19. 

This works just as well as a regular SMF but with one problem sometimes two different keys will be mapped to the same location. For example as only the first two letters are used f(CAR) is the same as f(CAT) and we would attempt to store the two at the same place.

This is called a collision and different scatter storage or hashing schemes deal with the problem in different ways. The easiest thing to do is to check to see if the location given by f(key) has been used and if it has check f(key)+1, f(key)+2 and so on until you find a free location. There are lots of variations on collision management but this linear search is simple and fairly efficient.

If you don't see the point of hashing functions then try the problem of storing and subsequently finding names in an array. Without hash functions you either have to perform a sequential search of the array or sort the array and perform a quadratic (binary) search. The former is inefficient and the latter complex and has the overhead of a complete sort. A hashing function gives you the location of any word in one evaluation and even if there is a collision you should find the word after a short linear search.

There is a lot more to be said about hash functions but the main thing is that you see them as nothing more than slightly odd SMFs.

You can think of hashing as using a chaotic or pseudo random SMF. 

SMFs Hidden Rather Than Forgotten

So it looks as if Storage Mapping Functions are alive and well after all. Perhaps the sub-title of this article should have been the "hidden" rather than "lost" art.

Related Articles

Hashing - The Greatest Idea In Programming

Advanced Hashing 

The Bloom Filter

The Invertible Bloom Filter

Universal Hashing

Inside Bitcoin - virtual currency

Assemblers and assembly language

      

What Programmers Know

knowcover

Contents

  1. The Computer - What's The Big Idea?*
  2. The Memory Principle - Computer Memory and Pigeonholes*
  3. Principles of Execution - The CPU
  4. The Essence Of Programming
  5. Variables - Scope, Lifetime And More*
  6. Binary Arithmetic
  7. Hexadecimal*
  8. Binary - Negative Numbers*
  9. Floating Point Numbers*
  10. Inside the Computer - Addressing
  11. The Mod Function
  12. Recursion
  13. The Lost Art Of The Storage Mapping Function *
  14. Hashing - The Greatest Idea In Programming
  15. Advanced Hashing
  16. XOR - The Magic Swap*
  17. Programmer's Introduction to XML
  18. From Data To Objects*
  19. What Exactly Is A First Class Function - And Why You Should Care*
  20. Stacks And Trees*
  21. The LIFO Stack - A Gentle Guide*
  22. Data Structures - Trees
  23. Inside Random Numbers
  24. The Monte Carlo Method
  25. Cache Memory And The Caching Principle
  26. Data Compression The Dictionary Way
  27. Dates Are Difficult*
  28. Sequential Storage*
  29. Magic of Merging*
  30. Power of Operators
  31. The Heart Of A Compiler*
  32. The Fundamentals of Pointers
  33. Functional And Dysfunctional Programming*

* Recently revised

espbook

 

Comments




or email your comment to: comments@i-programmer.info

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Programmer's Guide To Theory - Gödel And All That

Given infinite computing power surely there cannot be any problem or puzzle that is incapable of solution? The famous, or infamous, incompleteness theory of Kurt Gödel says different, but what d [ ... ]



Programmer's Guide To Theory - Splitting the Bit

Information theory – perhaps one of the most remarkable inventions of the twentieth century - naturally leads on to the consideration of how information can be coded and hence coding theory.


Other Articles

 

 

<ASIN:1584504951>

<ASIN:0201000237>



Last Updated ( Thursday, 04 October 2018 )