Applying C - Floating Point
Written by Harry Fairhead
Tuesday, 21 May 2024
Page 4 of 4
Floating Point Reconsidered

This is by no means all you need to know about floating point arithmetic. What really matters is that you don't take it for granted that you will get the right answer when you make use of it. It should now be obvious why:

    float a = 0.1f;
    float total = 0;
    for (int i = 0; i < 1000; i++) {
        total = total + a;
    }
    printf("%.7f\n", total);
    printf("%d\n", total == 100.0);

prints 99.9990463 and 0. The reason is that 0.1 isn't exactly representable as a binary fraction; it is the recurring value 0.0001100110011...

This is the reason for the usual advice of "don't test floating point numbers for equality". However, in this case the problem is more general, in that it also arises with the corresponding fixed point value. That is, it is more to do with binary fractions than it is to do with floating point representation.

If you think that such a small error could never make a difference, consider the error in the Patriot missile system. The system used an integer timing register which was incremented at intervals of 0.1 seconds. However, the integers were converted to seconds by multiplying by the binary approximation of 0.1. After 100 hours, an error of approximately 0.3433 seconds was present in the conversion. As a result, an Iraqi Scud missile could not be accurately targeted and was allowed to detonate on a barracks, killing 28 people.

The recommended way of testing for equality between floating point values is to use something like:

    if (fabsf(total - 100.0f) / 100.0f <= FLT_EPSILON) ...

FLT_EPSILON is a macro, defined in float.h, that gives you the accuracy of a float: it is the difference between 1 and the next representable float. There are other useful constants defined in float.h. The idea is that if two numbers differ by less than the accuracy of the representation then they can be considered equal. Of course, in practice numbers computed in different ways accumulate errors that are larger than the representational error. In the case of the example above with a = 0.1, the two numbers are very much further apart than FLT_EPSILON due to the inability to represent 0.1 in binary.
In practice it is usual to include a factor that summarizes the errors in the computation, something like:

    if (fabsf(total - 100.0f) / 100.0f <= K * FLT_EPSILON) ...

To get our example to test equal, K has to be 80 or more. However, a small change and K has to be bigger. Run the loop ten thousand times and compare the result to 1000 and K has to be even bigger. The point is that there is no single way of setting a reasonable interval that works for a range of computations. You have to analyze the computation to find out what it is safe to regard as being equal.

This leads us into the realm of numerical analysis. If you are applying any formula then it is always worth checking what the best way to compute it is. It is rare that the form given in a textbook is the best way to compute a quantity. For example, the mean is traditionally computed using:

    float total = 0;
    int n = 1000000;
    for (int i = 0; i < n; i++) {
        total = total + (float)i;
    }
    total = total / (float)n;
    printf("%f\n", total);

This forms a total and then divides by the number of items. The problem with this is that the total gets very big and we lose precision by adding comparatively small values to it. If you try it, you will discover that instead of the exact mean of the values 0 to 999999, which is 499999.5, the result is 499940.375000. Using the alternative iterative method, which keeps the size of the running estimate down:

    total = 0;
    for (int i = 0; i < n; i++) {
        total = total + ((float)i - total) / (i + 1);
    }
    printf("%f\n", total);

gives a result of 499999.500000, which agrees with the exact mean. There are even better methods of computing the mean - see Kahan Summation and Pairwise Summation.

In many cases you can't avoid a detailed analysis of a calculation, but it helps to have an idea of why things go wrong when you are using floating point. Imagine that you are working with three significant digits. For addition everything is fine as long as the exponents allow the digits to interact.
For example, consider 1.23 x 10^2 + 4.67 x 10^3, written out like this:

       123
    + 4670
      ----
      4793

Normalizing this gives 4.79 x 10^3 and you can see that, ignoring rounding etc, only two digits of each value "overlapped" in the sum. If the exponents differ by 4 then none of the digits are involved in the sum. For example:

    1.23 x 10^2 + 4.67 x 10^6 = 123 + 4670000 = 4670123

and after normalizing the result we have 4.67 x 10^6.

Clearly, for addition and subtraction, if you are working with floating point numbers with a precision of d digits then the accuracy of adding and subtracting goes down as the difference between the exponents approaches d. This is the sense in which you need to be careful about floating point arithmetic involving large and small numbers. There are no similar problems with multiplication and division, apart from the accumulation of errors if operations are performed in succession.

Finally, if possible always use double or larger floating point types. Whereas float has about 7 decimal digits of precision, double has about 15 digits and this provides useful latitude.