Microsoft Making C Safe - Checked C
Written by Harry Fairhead   
Wednesday, 12 September 2018

We all know that C gives you so much freedom that it is easy to make big mistakes. Usually the call is to abandon C and adopt something more modern like Rust, but why not add to C to make it safe?

Microsoft has done it with JavaScript, so why not C? TypeScript has JavaScript as a subset, but it is arguably a better language. Why not do the same for C and rescue the language and the many programs that have been written in it?

This is the aim of Checked C. It isn't the first attempt to correct the language, but it is being promoted by Microsoft Research in a new paper by Archibald Samuel Elliott, University of Washington; Andrew Ruef and Michael Hicks, University of Maryland; and David Tarditi, Microsoft Research. The language has a longer history than this might suggest and it was open sourced back in 2016.

The basic idea is to introduce new pointer types and the concept of checked and unchecked regions of the program. The two new pointer types are _Ptr<type> and _Array_ptr<type> - the difference is that the array pointer allows pointer arithmetic and the plain _Ptr doesn't. The compiler checks that the new pointers are valid when they are dereferenced.

Array pointers are bounds checked, but the compiler will remove the check if it can be deduced that the bounds cannot be exceeded.
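To make the idea of inserted and removed checks concrete, here is a minimal sketch in plain C rather than Checked C. The helper names checked_read, sum_checked and sum_elided are hypothetical, and the functions return false where a Checked C program would trap at run time. It only illustrates the kind of reasoning involved, not what the Checked C compiler actually generates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: roughly what a checked array access amounts to at
   run time. Returns false where Checked C would trap, so that the failure
   can be observed by the caller. */
static bool checked_read(const int *arr, size_t count, size_t i, int *out)
{
    if (arr == NULL || i >= count) {
        return false;
    }
    *out = arr[i];
    return true;
}

/* Naive version: every element access goes through the check. */
static bool sum_checked(const int *arr, size_t count, long *total)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        int v;
        if (!checked_read(arr, count, i, &v)) {
            return false;
        }
        sum += v;
    }
    *total = sum;
    return true;
}

/* The loop condition already guarantees i < count, so the per-element
   bounds check is redundant. Only the null check remains, hoisted out of
   the loop. */
static bool sum_elided(const int *arr, size_t count, long *total)
{
    if (arr == NULL && count > 0) {
        return false;
    }
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += arr[i];
    }
    *total = sum;
    return true;
}
```

Both sum functions give the same answer for the same input. The second simply does less work per element, which is the effect the compiler is after when it can prove that an index stays within bounds.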

In other improved versions of C, the bounds are stored along with the pointer. In Checked C you instead write a bounds expression indicating where the bounds are kept. For example:

void append(
    _Array_ptr<char> dst : count(dst_count),
    _Array_ptr<char> src : count(src_count),
    size_t dst_count, size_t src_count)
{
    _Dynamic_check(src_count <= dst_count);
    for (size_t i = 0; i < src_count; i++) {
        if (src[i] == '\0') {
            break;
        }
        dst[i] = src[i];
    }
}
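For readers without the Checked C compiler, the following plain C sketch contrasts the two designs. The struct fat_char_ptr and the function append_emulated are hypothetical names invented for illustration. The explicit test on src_count stands in for _Dynamic_check, and the function returns false where the real check would trap.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Fat pointer" style used by some other safe C dialects: the bounds
   travel inside the pointer, which changes its size and layout. */
struct fat_char_ptr {
    char  *ptr;
    size_t count;
};

/* Checked C style, emulated: the pointers stay plain char pointers and
   the bounds live in ordinary parameters named by the bounds expressions. */
static bool append_emulated(char *dst, const char *src,
                            size_t dst_count, size_t src_count)
{
    /* Stands in for _Dynamic_check(src_count <= dst_count). */
    if (src_count > dst_count) {
        return false;
    }
    /* Given the check above, i < src_count implies i < dst_count, so
       both src[i] and dst[i] are within bounds. */
    for (size_t i = 0; i < src_count; i++) {
        if (src[i] == '\0') {
            break;
        }
        dst[i] = src[i];
    }
    return true;
}
```

Because the pointer representation is unchanged, code written this way can be called from, and can call, legacy C without any conversion. This is the binary compatibility that the paper emphasizes.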

There are a number of different ways to specify array bounds. You will have to learn how best to use them, and you will have to add them to existing code.

There is also a new checked array type:

int buf _Checked[10];

and a type that handles null-terminated arrays, aka strings.

This approach is not unlike the idea of gradual type annotation used in, say, TypeScript. The "gradual" approach to improvement is also helped by the ability to mark sections of code as unchecked. In an unchecked portion of C you can do anything you like and there are no compiled-in checks. As the paper says, this may not be 100% safe, but at least it highlights the unsafe portions of the code.

There are many other checks performed by Checked C to ensure that pointers are not dangerous. In particular, variables have to be initialized. There are also restrictions on taking the address of variables and structs used in bounds.

The problem with any modification to C is that efficiency is paramount and additions tend to make things slower and bigger. At the moment, Checked C is available only as an extension to Clang/LLVM. The compiler was tested on two pointer benchmark libraries. The changes to make the code safe amounted to around 17.5% of the lines. Most of these changes were declarations, initializers and so on, not changes to the core logic. On average, just under 10% of the code was left unsafe - surprisingly, due to the use of variable-argument printf statements. The average run-time overhead was 8.6%, which is good compared to other attempts at "safe" C, but you still might have to use an unchecked section for really critical code.

The paper concludes:

We have presented Checked C, an extension to C to help ensure spatial safety. Checked C’s design is focused on interoperability with legacy C, usability, and high performance. Any part of a program may contain, and benefit from, checked pointers. Such pointers are binary-compatible with legacy, unchecked pointers but have explicitly annotated and enforced bounds. Code units annotated as checked regions provide guaranteed safety: The code within may not use unchecked pointers or unsafe casts that could result in spatial safety violations. Checked C’s bounds-safe interfaces provide checked types to unchecked code, which is useful for retrofitting third party and standard libraries. Together, these features permit incrementally adding safety to a legacy program, rather than making it an all-or-nothing proposition. Our implementation of Checked C as an LLVM extension enjoys good performance, with relatively low run-time and compilation overheads. It is freely available at https://github.com/Microsoft/checkedc and continues to be actively developed.

Checked C does a lot for bounds checking, but C still suffers from fundamental problems with undefined behavior and, until this is settled in some way, it cannot be safe in the sense of a "higher" level language.

More Information

Checked C: Making C Safe by Extension

To appear in IEEE Cybersecurity Development Conference 2018

Related Articles

Fundamental C - Pointers, Cast & Type Punning

C Undefined Behavior - Depressing and Terrifying (Updated)

Microsoft Open Sources Checked C

Last Updated ( Wednesday, 12 September 2018 )