Applying C - Kernel Mode, User Mode & Syscall
Monday, 01 July 2019

When writing C for Linux/POSIX operating systems, the syscall is how your program tells the kernel what it needs to happen. Find out how it works. This extract is from my book on C in an IoT context.

Now available as a paperback or ebook from Amazon.

Applying C For The IoT With Linux

  2. Kernel Mode, User Mode & Syscall
  3. Execution, Permissions & Systemd
    Extract Running Programs With Systemd
  4. Signals & Exceptions
    Extract  Signals
  5. Integer Arithmetic
  6. Fixed Point
    Extract Simple Fixed Point Arithmetic ***NEW
  7. Floating Point
  8. File Descriptors
  9. The Pseudo-File System
    Extract: The Pseudo File System 
  10. Graphics
    Extract: framebuffer 
  11. Sockets
  12. Threading
    Extract  Condition Variables
    Extract  Deadline Scheduling
  13. Cores Atomics & Memory Management
  14. Interrupts & Polling
  15. Assembler
    Extract: Assembler 

Also see the companion book: Fundamental C







Although finding out about the detailed workings of Linux or Unix is unnecessary, it is worth knowing about the general structure of the system. In particular, it is a good idea to know what kernel and user mode are all about and how your programs running in user mode can interact with the kernel using system calls.

In book but omitted from this extract:

  • The Kernel

  • Multi-tasking - Context Switch

Modes and Syscall

You might wonder how the kernel can protect itself from modification by user mode programs and how it can stop user mode programs from accessing hardware directly. The answer varies according to the architecture in use but in a modern processor the protection is provided by the hardware. For example the x86 supports four protection "rings" - levels of decreasing privilege. Linux only uses two of these levels - ring 0 for the Kernel and ring 3 for user mode. ARM processors support multiple modes, but again Linux only makes use of user mode and supervisor mode.

In general things are arranged so that a user mode program cannot call code, a function say, that lives in the kernel. This is for reasons of security and stability.

So how does a user mode program communicate its needs to the Kernel?

The answer is that it has to make a system call. Exactly how this is done varies according to the architecture, but it usually involves a software interrupt or something similar.

The basic idea is that at the level of assembly language the parameters to be passed to the Kernel have to be loaded into specific registers. One of the registers holds a number which determines what the system call actually is. To make the call you usually have to use a software interrupt instruction.

For 32-bit x86 it is software interrupt 0x80, i.e. int 0x80, while x86-64 has a dedicated syscall instruction; for ARM it is a SWI (SoftWare Interrupt) instruction, called SVC in current ARM documentation, but the number varies according to the exact type. When the instruction is issued the hardware changes mode and the user thread starts to execute kernel code. The first thing that happens is that the syscall number is looked up in a table which holds the addresses of the functions which implement each of the actions. The syscall numbers for a given architecture don't change much, if at all, to make sure that existing code continues to work. However, new syscalls are added, and indeed if you have the time you can add custom syscalls to the kernel.
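To get a feel for the table lookup, here is a toy user mode sketch of the idea. It is not kernel code, and every name in it (toy_handler, toy_table, toy_dispatch, TOY_BAD_CALL) is invented for illustration: a call number simply indexes an array of function pointers.

```c
#include <stddef.h>

/* Hypothetical handler type: every toy "syscall" takes three long arguments. */
typedef long (*toy_handler)(long a1, long a2, long a3);

static long toy_add(long a1, long a2, long a3) { (void)a3; return a1 + a2; }
static long toy_max(long a1, long a2, long a3) { (void)a3; return a1 > a2 ? a1 : a2; }

/* The table: the index is the call number, the entry is the function address. */
static const toy_handler toy_table[] = { toy_add, toy_max };

#define TOY_BAD_CALL (-1L)   /* stand-in result for an unknown call number */

/* Look the number up in the table and call whatever function it finds. */
long toy_dispatch(long number, long a1, long a2, long a3)
{
    size_t count = sizeof toy_table / sizeof toy_table[0];
    if (number < 0 || (size_t)number >= count)
        return TOY_BAD_CALL;
    return toy_table[number](a1, a2, a3);
}
```

Calling toy_dispatch(0, 2, 3, 0) runs toy_add and toy_dispatch(1, 2, 3, 0) runs toy_max. Because the caller only ever supplies a number, the table entries can be reimplemented freely as long as the numbering stays put, which is why the numbers have to remain stable.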

The actual syscall number, i.e. the number that determines what is to be done, is passed in a particular register. For example x86-64 uses rax. Parameters to a syscall are usually passed in particular registers - up to seven. These are best thought of as positional, i.e. arg1 to arg7 are passed in a specified set of registers - for example for x86-64 arg1 is passed in rdi. Return values are similarly passed back in particular registers. In most cases there is a single return value and an error number.

A syscall is defined by its number and a set of positional parameters. The numbering is specific to each architecture. For example, write is syscall 4 on 32-bit x86 and ARM, but syscall 1 on x86-64. It has parameters:

1 file descriptor

2 pointer to buffer

3 number of characters

The return value is the number of characters written with -1 being an error indicator.

Glibc - Wrappers

Obviously you cannot make a syscall using C alone. The reason is that there is no C standard way of invoking a software interrupt or similar. This is completely machine dependent and there are even variations between processors that in principle share the same architecture - the 32-bit and 64-bit versions of x86 and ARM for example. You could code the syscall using embedded assembler, see Chapter 15, but then you would have to change the code for each type of processor you wanted to support.
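To see why embedded assembler ties you to one processor, here is a minimal sketch of a hand-coded write for x86-64 using GCC-style inline assembly. The function name raw_write is invented for illustration. On x86-64 the transition is made with the syscall instruction, with the number in rax and the first three arguments in rdi, rsi and rdx. Any other target simply falls back to the library's syscall function:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), used only in the fallback */

/* raw_write is a hypothetical name, not a library function. */
long raw_write(int fd, const void *buf, unsigned long count)
{
#if defined(__x86_64__) && defined(__GNUC__)
    long ret;
    __asm__ volatile(
        "syscall"                       /* switch to kernel mode */
        : "=a"(ret)                     /* result comes back in rax */
        : "a"((long)SYS_write),         /* syscall number in rax */
          "D"((long)fd),                /* arg1 in rdi */
          "S"(buf),                     /* arg2 in rsi */
          "d"(count)                    /* arg3 in rdx */
        : "rcx", "r11", "memory");      /* the instruction overwrites rcx and r11 */
    return ret;   /* raw kernel result: a count, or a negative error code */
#else
    /* Not x86-64: let the library do it (returns -1 and sets errno on error). */
    return syscall(SYS_write, fd, buf, count);
#endif
}
```

A call such as raw_write(1, "hi\n", 3) writes three characters to stdout. Every register name in the assembler is specific to x86-64, so a port to ARM would mean rewriting the whole asm statement.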

A much better solution is to use a library of wrapper functions which perform the syscall for you and adjust to the type of processor in use. This is what you have been doing since you first started to create C programs.

The library that most programmers use is glibc and it contains definitions for most of the Linux syscalls and the standard library functions you have been using. There are many header files associated with glibc including stdio.h and stdlib.h, which are included into most C programs, but there are many more. It is standard practice to add only the header files for the features of glibc you plan to use. One complication is that glibc isn't stored at a standard location and the library is usually called libc6 - the source is called glibc.

The complication with glibc is that it is focused on implementing POSIX and C11 rather than Linux. As a result, while it does wrap most of the Linux syscalls it misses out some that are non-standard. Notice that glibc contains more than just syscalls. For example, printf is defined in glibc and, while this does eventually call the kernel to print things, it does a lot more than just wrap the write syscall.

There are functions declared in unistd.h for most Linux syscalls, but not all. For example, the write function which implements the write syscall, described in the last section, is declared as:

ssize_t write(int fd, const void *buf, size_t count);

The actual code that implements this loads the registers for each of the parameters in turn and then uses a software interrupt, or the equivalent instruction, to transition to kernel mode. From your program's point of view it is just a function call. For example:

const char string[] = "Hello Syscall World \n";
write(1, string, 21);

prints the message on stdout.
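In a real program you would normally check the return value, because write can fail or write fewer characters than you asked for. What follows is a minimal sketch of one common way to handle this. The helper names write_all and say_hello are invented for illustration:

```c
#include <errno.h>
#include <unistd.h>   /* write, STDOUT_FILENO, ssize_t */

/* Keep calling write until everything has gone out.
   Returns 0 on success, or -1 with errno set by write. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                    /* skip what was written */
        count -= (size_t)n;
    }
    return 0;
}

int say_hello(void)
{
    static const char msg[] = "Hello Syscall World \n";
    /* sizeof includes the terminating '\0', so subtract 1 */
    return write_all(STDOUT_FILENO, msg, sizeof msg - 1);
}
```

Using sizeof msg - 1 rather than a hand-counted 21 means the length stays correct if the message is edited.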

There is a lower level wrapper function which can be used to implement any syscall. This is defined as:

long syscall(long number, ...);

The first parameter gives the syscall number and the rest of the parameter list consists of however many parameters are required.

For example, on a system where write is syscall 4, such as 32-bit x86 or ARM, the call to write given earlier can be written:

const char string[] = "Hello Syscall World \n";
syscall(4, 1, string, 21);

To make this work you not only need unistd.h but also

#define _GNU_SOURCE

at the start of the program because syscall is not part of POSIX.

It is more usual not to hard code the syscall numbers but to use the constants defined in syscall.h, which give the correct number for the architecture you are compiling for:

#include <syscall.h>
syscall(SYS_write, 1, string, 21);
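Putting the pieces together, here is a minimal sketch of the same call with its define and includes in place. The function names hello_via_syscall and bad_fd_errno are invented for illustration. Like the ordinary wrappers, syscall returns -1 and sets errno if the kernel reports an error, which the second function demonstrates by writing to an invalid file descriptor:

```c
#define _GNU_SOURCE        /* must come before any #include */
#include <errno.h>
#include <sys/syscall.h>   /* SYS_write and the other SYS_ constants */
#include <unistd.h>        /* syscall, STDOUT_FILENO */

/* Returns the number of characters written, or -1 on error. */
long hello_via_syscall(void)
{
    static const char msg[] = "Hello Syscall World \n";
    return syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);
}

/* Returns the errno value produced by writing to descriptor -1,
   or 0 if the call unexpectedly succeeds. */
int bad_fd_errno(void)
{
    errno = 0;
    long r = syscall(SYS_write, -1, "x", 1);
    return r == -1 ? errno : 0;
}
```

Because SYS_write is resolved at compile time, the same source works whether write happens to be syscall 4 or some other number on the target.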

Of course, if there is a wrapper function defined in glibc then it is best to use it. The problem is that glibc often takes a while to catch up with anything new in Linux and there is a general resistance to include anything that is Linux-specific. In Chapter 12 it is explained that a scheduling option that is unique to Linux has no wrapper in glibc. In this case the simplest thing to do is to use syscall to create your own wrappers:

int sched_setattr(pid_t pid,
                  const struct sched_attr *attr,
                  unsigned int flags)
{
    return syscall(__NR_sched_setattr, pid, attr, flags);
}

int sched_getattr(pid_t pid,
                  struct sched_attr *attr,
                  unsigned int size,
                  unsigned int flags)
{
    return syscall(__NR_sched_getattr, pid, attr, size, flags);
}
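The same pattern works for any syscall that lacks a wrapper. As a smaller sketch that needs no extra structure definitions, consider gettid, which returns the calling thread's ID and had no glibc wrapper before version 2.30. The name my_gettid is invented here to avoid clashing with the gettid that newer versions of glibc declare:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_gettid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall */

/* Hypothetical wrapper: gettid takes no arguments, so only the
   syscall number is passed. */
pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a program's main thread the value returned is the same as that returned by getpid, which makes for an easy check that the wrapper is working.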

Last Updated ( Monday, 01 July 2019 )