Undefined Behavior Begone!
Written by Harry Fairhead
Wednesday, 02 April 2025
C++ guru Herb Sutter has a new take on taming the UB monsters in C++, but there is a sense in which the monster is of our own creation and slaying it isn't essential - just tell it to begone.

The demon inside C++, and to a lesser extent in C, is undefined behavior or UB. As you probably know, UB is just a language construct that has an undefined interpretation. If you write such code then you have just written a non-program, because UB should never occur in a runnable program. This, of course, is complete nonsense. We seem to be the victims of a misunderstanding of an initial attempt at machine independence, one that has been commandeered by a group of programmers intent on something quite different from creating runnable programs in C/C++.

Back in the days when C was being formalized, it was thought to be OK to leave some constructs unspecified so as to let the hardware the program was running on "decide" what they meant. For example, the format of negative numbers wasn't included in the language specification - it was UB - but if you did some arithmetic you expected to get the right answer whether the machine used one's complement, two's complement, sign magnitude or something else. However, many bitwise operations give different results depending on the representation, and these too are UB. This is fine because C programmers generally know the architecture of the target machine and will adjust what they do according to how the UB pans out.

The point is that the behavior is only undefined in the specification or standard; it isn't undefined in any sense in the real world. The hardware always makes UB very defined.

Now the problem is that a very large group of programmers misunderstood UB. The compiler writers, in particular, noticed that if UB was really intended to be undefined then they could treat it as defining anything, and any program that contained UB could be compiled to anything - or even nothing - all in the spirit of optimization.
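To make the compiler writers' reading of UB concrete, here is a small C++ sketch. The function names are illustrative choices of ours, not anything from Sutter's post. The first function relies on signed overflow being undefined, so an optimizing compiler is entitled to fold it to `return true` (GCC and Clang typically do so at -O2). The second shows what a "machine defined" result looks like on two's complement hardware, and the last two test for overflow before it can happen. The sketch assumes a C++20 compiler, where the conversion back to `int` in `wrapping_add` is defined to be modular; before C++20 that conversion was implementation-defined.

```cpp
#include <cassert>
#include <climits>

// Relies on UB: if a == INT_MAX, a + 1 overflows. Because signed overflow is
// undefined, an optimizing compiler may assume it never happens and compile
// this whole function to "return true".
bool successor_is_larger(int a) {
    return a + 1 > a;
}

// A "machine defined" alternative on two's complement hardware: unsigned
// arithmetic wraps modulo 2^N by definition, and since C++20 converting the
// out-of-range result back to int is also defined as modular.
int wrapping_add(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

// A portable, well-defined overflow test done *before* the addition, so the
// overflowing operation itself is never executed.
bool add_overflows(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

// Adds only when the result is representable; the assert fires otherwise.
int checked_add(int a, int b) {
    assert(!add_overflows(a, b) && "signed overflow would be undefined behavior");
    return a + b;
}
```

Note that `checked_add` uses `assert`, so its check disappears when `NDEBUG` is defined; a production version might report the error some other way.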
This is simply crazy, and deep down we all know it is a joke. Of course, UB in C++ is a little more subtle, as many C++ programmers don't know the architecture of the machine they are targeting and hence UB should be avoided, but when it does occur it should still be left to the machine, not the compiler, to decide what happens.

The solution is obvious - get rid of UB by finding all instances of it and renaming them "machine defined" or something similar. This is so simple that everyone seems to think it's impossible, and indeed Sutter's new post starts off from this position.

He notes that there has been a lot of progress in removing UB from the language libraries. He credits the use of constexpr to detect UB at compile time rather than at runtime - but this only works if the code is evaluated at compile time, and much (most?) isn't. Then there is a list of UBs that have been eliminated from language features - uninitialized variables, adding bounds checking to data types and so on. Not much of this is new, but his final point offers some hope: there are proposers and volunteers to
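The constexpr point can be illustrated with a minimal C++ sketch; the names are hypothetical. Core-language UB such as signed overflow disqualifies an expression from being a constant expression, so when `add` is evaluated at compile time the compiler has to reject an overflowing call. The same call made at runtime gets no such check, which is the limitation noted above.

```cpp
#include <climits>

// An ordinary function that is also usable in constant expressions.
constexpr int add(int a, int b) { return a + b; }

// Evaluated at compile time: no overflow, so this is a valid constant.
constexpr int answer = add(40, 2);
static_assert(answer == 42, "computed during compilation");

// Uncommenting the next line makes the program ill-formed: INT_MAX + 1
// overflows, overflow is UB, and UB is not allowed during constant
// evaluation, so the compiler must issue a diagnostic instead of a value.
// constexpr int overflowed = add(INT_MAX, 1);

// The same function called with a runtime value gets no such check:
// runtime_successor(INT_MAX) would be UB and nothing is required to catch it.
int runtime_successor(int x) { return add(x, 1); }
```

The exact wording of the diagnostic for the commented-out line varies between compilers, but a conforming compiler cannot accept it silently.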
So at last we have a sane solution to an insane problem. There never should have been UB in C or C++, and now the compiler writers and language standards writers need to get the work done. The machine has no undefined behavior - programming, barring hardware problems and issues of timing, is always deterministic. My final word: read Sutter's post. It has a lot more subtlety, humor and information than my take on the matter.
More Information

Crate-training Tiamat, un-calling Cthulhu: Taming the UB monsters in C++

Related Articles

C Undefined Behavior - Depressing and Terrifying (Updated)

C Pointer Declaration And Dereferencing

GCC Gets An Award From ACM And A Blast From Linus
Last Updated (Wednesday, 02 April 2025)