C Undefined Behavior - Depressing and Terrifying (Updated)
Written by Harry Fairhead
Wednesday, 06 June 2018
Even if you are a fairly expert C user, you may not know about undefined behavior, and if you do, you might not realize what a problem it has become. A new critique suggests it is every bit as bad as you might expect once you know what it is and what use is being made of it.

I'm a heavy C user and I've been using it for a long time, mostly for low-level tasks that are close to the metal. I'm not an academic user of C, in the sense that I don't spend a lot of time worrying about the theory of the language or how it might be made better. I see it as a low-level assembler that allows me to write programs that are close to the metal without being in touch with the metal - if you see what I mean. A few years ago I started to hear about "undefined behavior" and at first I dismissed it as academic nonsense; interesting academic nonsense, but not something I needed to worry about. I was wrong.

I've been trying to work out how to explain undefined behavior and its effects for some time, but the difficulty is that this is an ongoing problem and not really news in the sense of something that has just happened. So when I read an excellent blog post by Victor Yodaiken, C Standard Undefined Behavior Versus Wittgenstein, I decided it was time to do something to help.

Undefined behavior is used to mark syntactically correct C code as being outside of the standard. This seems like a reasonable idea at first, and can only help to make it clear what is and what is not covered by the standard. The problem is that undefined behavior seems to cover things that it shouldn't. What is worse, it has given compiler writers permission to do almost anything when they encounter it, even to the point of changing the meaning of parts of the program well away from the undefined behavior.

For example, signed overflow is undefined. On any real machine, however, it is very much well-defined, typically as two's-complement wraparound.
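As a minimal sketch of what is at stake (the function names are hypothetical and not taken from any library), the first function below depends on the wraparound that real hardware provides, which is exactly the kind of test a compiler is entitled to treat as dead code. The other two ask the same question using only operations the standard defines.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper. Relies on wraparound: when a == INT_MAX, a + 1
   overflows, which is undefined, so an optimizer may treat the
   comparison as always false and delete it. */
bool next_wraps_fragile(int a) {
    return a + 1 < a;
}

/* Hypothetical helper. Asks the same question without ever overflowing,
   so there is nothing for the optimizer to remove. */
bool next_wraps_checked(int a) {
    return a == INT_MAX;
}

/* Hypothetical helper. Stores a + b in *sum and returns true, or returns
   false (leaving *sum untouched) if the addition would overflow. */
bool checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

The checked versions do the comparison before the arithmetic, so the overflow never happens. With GCC or Clang, compiling with -fwrapv is another route: it asks the compiler to treat signed overflow as wrapping, at the cost of depending on a compiler flag rather than on the standard.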
Any compiler optimization that removes a test that is based on signed overflow is going to break a program that is running on a real machine with defined behavior. To be clear, if I write a loop that tests for a value that can only occur by a signed overflow, then the test for the overflow and the code that would be executed can be removed in the interests of optimization. This is not optimization, this is vandalism.

Of course, well before undefined behavior became a thing, low-level programmers had to fight the compiler optimizer; for example, having to mark a variable as volatile to stop busy-wait loops, i.e. empty loops, being optimized away.

To quote Dennis Ritchie, undefined behavior is:

"a license for the compiler to undertake aggressive optimizations that are completely legal by the committee’s rules, but make hash of apparently safe programs"

What is worse is that the standards people seem to be deaf to complaints from sources you might think they should take notice of. Take Linus Torvalds' comment:

"Yeah, let's just say that the original C designers were better at their job than a gaggle of standards people who were making bad crap up to make some Fortran-style programs go faster. They don't speed up normal code either, they just introduce undefined behavior in a lot of code. And deleting NULL pointer checks because somebody made a mistake, and then turning that small mistake into a real and exploitable security hole? Not so smart either."

It is not just the standards people we need to worry about. The compiler writers seem to have come off the rails as well. Once upon a time a compiler writer's aim was to translate the higher-level code into low-level code as faithfully as possible. Now we seem to have entered an era where compiler writing is seen as getting to the top of the leaderboard in the optimization game. Don't worry if your compiler wrecks a program that has some undefined behavior, just look at how much faster it runs.
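The NULL-check deletion that Torvalds complains about can be sketched as follows. This is only an illustration under stated assumptions: struct device and both function names are hypothetical, and the code is not taken from the kernel.

```c
#include <stddef.h>

/* Hypothetical type, used only for this illustration. */
struct device {
    int flags;
};

/* Risky order: dev is dereferenced before it is tested. Dereferencing a
   NULL pointer is undefined, so a compiler may conclude that dev cannot
   be NULL here and remove the check that follows. */
int read_flags_risky(struct device *dev) {
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Safer order: test first, then dereference. The check is reached
   before any undefined behavior, so it cannot be optimized away. */
int read_flags_checked(struct device *dev) {
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

The small mistake is the ordering in the first function. On a system where address zero can be mapped, losing the check is what turns that mistake into something exploitable. GCC provides -fno-delete-null-pointer-checks to disable this particular inference.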
To quote Victor Yodaiken:

"Under pressure from, particularly Linux, the GCC compiler has introduced a number of flags to turn off UB behavior - essentially forking the language."

What is the solution to the problem? Assuming that the standards people were listening, one possible solution is to change undefined behavior into machine-defined behavior. After all, a signed overflow might not be defined behavior in the abstract, but it usually is in the concrete. Writing a C program that works on any hardware isn't really the name of the game - although you can get close to it even if you have to incorporate machine dependencies.

Sometimes undefined behavior is jokingly referred to as nasal demons because of a posting on comp.std.c saying that when the compiler finds undefined behavior it can do anything it likes, including "to make demons fly out of your nose". This is silly beyond reason. As using an uninitialized automatic variable is undefined behavior, a program that does nothing worse than read one could be compiled to "format the hard disk".

Compiler writers - No more nasal demons!

Update: Soon after this was written, Linus hit his stride with another blast against the C standards people. Perhaps it isn't just undefined behavior we have to worry about.

"...So I want to see actual real arguments, not "the standard is unclear". When documented gcc behavior says one thing, and the standard might be unclear, we really don't care one whit about the lack of clarity in some standard. So what's the _real_ reason for avoiding union aliasing? There are competent people on standards bodies. But they [...] Standards too need to be questioned."

More Information
C Standard Undefined Behavior Versus Wittgenstein

Related Articles
C Pointer Declaration And Dereferencing
Why Is C Top Language In IEEE Ranking?
Last Updated (Wednesday, 06 June 2018)