|Microsoft Making C Safe - Checked C|
|Written by Harry Fairhead|
|Wednesday, 12 September 2018|
We all know that C gives you so much freedom that it is easy to make big mistakes. Usually the call is to abandon C and adopt something more modern like Rust, but why not add to C to make it safe?
This is the aim of Checked C. It isn't the first attempt to correct the language, but it is being promoted by Microsoft Research in a new paper by Archibald Samuel Elliott, University of Washington; Andrew Ruef and Michael Hicks, University of Maryland; and David Tarditi, Microsoft Research. The language has a longer history than this might suggest and it was open sourced back in 2016.
The basic idea is to introduce new pointer types and the concept of checked and unchecked areas of the program. The two new pointer types are _Ptr<type> and _Array_ptr<type>. The difference is that _Array_ptr allows pointer arithmetic and _Ptr doesn't. The compiler checks that the new pointers are valid when they are dereferenced.
Array pointers are bounds checked, but the compiler will remove the check if it can be deduced that the bounds cannot be exceeded.
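To make the idea of a removable check concrete, here is a minimal sketch in standard C, not Checked C itself, of what a bounds check on each access amounts to and why it can be dropped when the loop condition already proves the index is in range. The helper names in_bounds, sum_checked and sum_proved are invented for illustration, and returning false stands in for the run-time failure that a checked dereference would produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the test a checked access p[i] amounts to
   when p is known to refer to n elements. */
bool in_bounds(size_t i, size_t n)
{
    return i < n;
}

/* Every access carries its own check. The loop limit is independent of n,
   so nothing proves in advance that i stays in range. */
bool sum_checked(const int *p, size_t n, size_t limit, long *out)
{
    long total = 0;
    for (size_t i = 0; i < limit; i++) {
        if (!in_bounds(i, n))
            return false;   /* assumption: a checked build would stop here */
        total += p[i];
    }
    *out = total;
    return true;
}

/* Here the loop condition is i < n, so the per-access check would always
   succeed and can be left out without losing safety. */
long sum_proved(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

Calling sum_checked with a limit larger than n reports failure before the out-of-range element is read, while sum_proved does the same work as an ordinary C loop.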
In other improved versions of C, the bounds are stored along with the pointer, making it a so-called fat pointer. In Checked C the pointer keeps its ordinary representation and you attach a bounds expression to the declaration indicating where the bounds are kept.
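As a rough illustration of bounds kept somewhere other than in the pointer, the following sketch uses standard C to emulate two forms of bounds expression, a count held in a neighbouring variable and an explicit lower and upper address. The Checked C spellings appear only in comments. The struct int_buf and the helpers buf_get and range_get are assumptions made for this example, and returning false again stands in for the run-time failure of a real checked access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* In Checked C the data member would be declared roughly as
 *     _Array_ptr<int> data : count(len);
 * The pointer is still an ordinary one-word pointer. The annotation only
 * names where its bounds live, here the neighbouring len member. */
struct int_buf {
    int   *data;
    size_t len;
};

/* Hypothetical helper emulating the check derived from count(len). */
bool buf_get(const struct int_buf *b, size_t i, int *out)
{
    assert(b != NULL);
    if (b->data == NULL || i >= b->len)
        return false;   /* assumption: a checked build would stop here */
    *out = b->data[i];
    return true;
}

/* Hypothetical helper emulating a range form such as bounds(lo, hi):
 * an access through p is valid only when lo <= p < hi. */
bool range_get(const int *lo, const int *hi, const int *p, int *out)
{
    if (p < lo || p >= hi)
        return false;
    *out = *p;
    return true;
}
```

Because the length is read from len rather than from the pointer, code compiled without the checks can still pass and receive the same pointer value unchanged.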
There are several ways to specify array bounds. You will have to learn how best to use them, and you will have to add them to existing code.
There is also a new checked array type, written by placing _Checked before the array dimension (N stands for the array size):
int buf _Checked[N];
and a type that handles null-terminated arrays, aka strings.
This approach is not unlike the idea of gradual type annotation used in, say, TypeScript. The "gradual" approach to improvement is also helped by the ability to mark sections of code as unchecked. In an unchecked portion of C you can do anything you like and there are no compiled-in checks. As the paper says, this may not be 100% safe, but at least it highlights the unsafe portions of the code.
There are many other checks performed by Checked C to ensure that pointers are not dangerous. In particular, variables have to be initialized. There are also restrictions on taking the address of variables and structs used in bounds.
The problem with any modification to C is that efficiency is paramount and additions tend to make things slower and bigger. At the moment, Checked C is available only as an extension to Clang/LLVM. The compiler was tested on two pointer benchmark libraries. The changes to make the code safe amounted to around 17.5% of the lines. Most of these changes were declarations, initializers and so on, not changes to the core logic. On average, just under 10% of the code was left unsafe, surprisingly due to the use of variable-argument printf statements. The average run-time overhead was 8.6%, which is good compared to other attempts at "safe" C, but you still might have to use an unchecked section for really critical code.
The paper concludes:
We have presented Checked C, an extension to C to help ensure spatial safety. Checked C’s design is focused on interoperability with legacy C, usability, and high performance. Any part of a program may contain, and benefit from, checked pointers. Such pointers are binary-compatible with legacy, unchecked pointers but have explicitly annotated and enforced bounds. Code units annotated as checked regions provide guaranteed safety: The code within may not use unchecked pointers or unsafe casts that could result in spatial safety violations. Checked C’s bounds-safe interfaces provide checked types to unchecked code, which is useful for retrofitting third party and standard libraries. Together, these features permit incrementally adding safety to a legacy program, rather than making it an all-or-nothing proposition. Our implementation of Checked C as an LLVM extension enjoys good performance, with relatively low run-time and compilation overheads. It is freely available at https://github.com/Microsoft/checkedc and continues to be actively developed.
Checked C does a lot for bounds checking, but C still suffers from fundamental problems with undefined behavior and, until this is settled in some way, it cannot be safe in the sense of a "higher" level language.
To appear in IEEE Cybersecurity Development Conference 2018
|Last Updated ( Wednesday, 12 September 2018 )|