|Floating Point Numbers|
|Written by Mike James|
|Sunday, 17 September 2023|
Page 1 of 3
Inconvenient though they may be, fractions are the real stuff of numbers and to work with them we need to know about floating point numbers ...
What Programmers Know
* Recently revised
Binary numbers are almost boring!
Well that’s a common sentiment among high-level language programmers but then they don’t have to go delving into the depths of exactly how the machine does arithmetic. What usually happens is that their disinterest comes back to haunt them at the point when it is most inconvenient, i.e when something doesn't work.
You need to know a little about how the simple binary counting numbers can be turned into the sort of numbers every banker, scientist and engineer knows and loves.
Other articles cover how to do arithmetic in simple binary and how to represent negative numbers using one's or two's complement so that they don’t need special treatment when performing arithmetic. One of the things I haven't already mentioned is that with only a little extra trouble we can make two's complement work with multiplication and division so that the usual rules of arithmetic, i.e. a minus times a minus is a plus, work automatically.
However, if you use the obvious algorithms that work for positive numbers the extra trouble that you have to go to seems very complex and involves adding different correction factors depending on the signs of the values involved in the result.
A much simpler method exists. Booth’s algorithm, invented in 1951, always gives you the correct answers no matter what the sign. I say simple but while the algorithm may be simple to perform it isn’t easy to see how it actually works. see Computer Arithmetic Algorithms (2001) by Israel Koren (see the sidebar for details) which also goes into more detail about other concepts introduced here.
Andrew Booth invented an easy way to multiply and divide, even if the numbers were negative
So with this complication out of the way it looks as though the matter is closed – but what about fractions?
Inconvenient though they may be, fractions are the real stuff of number. After you have mastered the art of counting whole things and adding, subtracting, multiplying and dividing whole things then you have to move on to arithmetic that involves fractions.
I could at this point tell you all about the mathematical theory of rational and irrational numbers and tell you how they were invented and why, but this would be just a diversion from the main problem. Very interesting from a theoretical point of view but in most cases the ideas go well beyond what is needed for practical calculations.
Put simply a rational number is what you usually refer to as a fraction i.e. a ratio of two integers, hence the name. It might come as something of a shock to discover that there are numbers that you cannot represent as a ratio of two integers. One of the earliest to be discovered was the square root of 2. It was proved by the early Greek mathematicians that there was no ratio like a/b that was the square root of 2 by an argument that involves the even or odd-ness of a and b and a contradiction. It turns out that there are lots of such numbers, the irrationals, in fact a whole order of infinity more than the rationals.
If you can't write down an irrational as a ratio - how do you write it down?
This is a very deep question and it was only at the start of the 20th century that mathematicians got the details sorted out. Put simply, an irrational number corresponds to a non-terminating and non-repeating decimal. Terminating and repeating decimal values correspond to rational numbers. When we do calculations in the real world we make use of rational approximations to irrational numbers - i.e. we use a rational that is so close to the square root of 2 that the error doesn't make any important difference.
As far as a computer is concerned there are only rational numbers and rational approximations to irrational numbers so let’s stay simple and worry about the representation of rational fractions.
The first notation that we use for fractions is based on the operation of division. We write a/b to stand for the fraction that you get when you divide a by b.
As this is a “ratio” this sort of fraction is referred to as “rational”.
There is an alternative way of representing a rational fraction using the place value system that we have come to rely on in both decimal and binary. What is surprising is that this well-known and almost obvious method took a great deal of time to invent and catch on.
The first mention of the idea of a decimal point occurs in the 10th century but it didn’t prove popular and it was forgotten. Decimals appeared again in the 15th century as “the Turkish” method. It took until the 16th century and many re-inventions of the idea before it started to make sense to the people who had to use it.
But it wasn’t until the 17th century and the invention of logarithms that the decimal point became at all common.
Enter the radix
The place value system uses increasing powers of the “radix” with each place you move to the left, e.g. if the radix is ten then the powers are:
and the number 1234, for example, is just:
To extend this system to fractions we need to invent the decimal point – or more generally the “radix” point.
Places to the right of the decimal point correspond to negative powers of the radix. That is:
and the number 1234.56 is
You can see that you need to know the mathematical facts that 100 is 1 and 10-x is 1/10x for this to make perfect sense.
Factions are easy enough to write down as long as you know how to find the sum of the negative radix powers, i.e. 1/10, 1/100, 1/1000, 1/1000 and so on, that equal the fraction.
It is sometimes the case that it is impossible to find an exact place value representation of a rational fraction.
For example, 1/3 is 0.3333r, where the r means “recurring” and the threes go on forever. Clearly though, as the negative radix powers get smaller and smaller, we very quickly get as close as we like to any rational fraction.
The same ideas apply to fractions in any radix place value system. In binary for example everything works in the same way but we call the “point” a binary point and the powers to the right are powers of two, e.g.
Notice that now the rational fractions have to be expressed as sums of 1/2, 1/4, 1/8, 1/16 and so on.
What this means is that the rational fractions that don’t have exact representations in binary are different to those in decimal. For example, in decimal 1/5 is exactly 0.2 but in binary it is 0.0011r where the “r” once again means recurring.
|Last Updated ( Sunday, 17 September 2023 )|