Applying C - Cores

Written by Harry Fairhead

Monday, 03 July 2023

Affinity

The operating system tries to keep threads associated with particular cores, but sometimes you need to enforce this. There is no standard POSIX way of controlling which core a thread will use, but there is a Linux extension of the Pthreads library that does the job. The setaffinity function:

int pthread_setaffinity_np(pthread_t thread,
                           size_t cpusetsize,
                           const cpu_set_t *cpuset);

sets the specified thread to run on one of a set of possible CPUs as specified by the cpuset – the affinity mask.

The getaffinity function will return the affinity mask of the specified thread:

int pthread_getaffinity_np(pthread_t thread,
                           size_t cpusetsize,
                           cpu_set_t *cpuset);

Notice the thread is specified as a Pthread id. You can also use a Linux process id if you use the alternative get and set functions defined in sched.h:

int sched_setaffinity(pid_t pid, size_t cpusetsize,
                      const cpu_set_t *mask);
int sched_getaffinity(pid_t pid, size_t cpusetsize,
                      cpu_set_t *mask);

In practice, the Pthreads functions call the functions defined in sched.h.

The only other thing we need to know is how to set the affinity mask. This uses a single bit to control access to each of the physical and logical cores. You can’t simply set or reset these bits directly because cpu_set_t is an opaque type. You have to use the set of macros designed for the job. There are a large number of these, but the ones that you use most often are:

CPU_ZERO(&cpuset);      sets all bits to 0
CPU_SET(n, &cpuset);    sets the bit corresponding to core n
CPU_CLR(n, &cpuset);    resets the bit corresponding to core n
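As a minimal sketch of how these macros combine, the following builds a mask and then inspects it. The function name build_example_mask is only an illustration, and CPU_COUNT and CPU_ISSET are two further macros from the same glibc family that report how many bits are set and whether a particular bit is set:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: allow cores 1 and 2, then drop core 2 again.
   Returns the number of cores left in the mask. */
int build_example_mask(cpu_set_t *cpuset) {
    CPU_ZERO(cpuset);          /* start with every bit cleared */
    CPU_SET(1, cpuset);        /* allow core 1 */
    CPU_SET(2, cpuset);        /* allow core 2 */
    CPU_CLR(2, cpuset);        /* change of mind: disallow core 2 */
    return CPU_COUNT(cpuset);  /* number of bits currently set */
}
```

After the call, CPU_ISSET(1, &cpuset) is non-zero and CPU_ISSET(2, &cpuset) is zero. Note that _GNU_SOURCE has to be defined before the first #include for the macros to be visible.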

How do you find out which core corresponds to which bit in the mask?

As long as your system is set up correctly you should be able to get details by reading the /proc/cpuinfo file or you could use the lstopo tool.
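Bit n of the mask corresponds to the logical CPU that the kernel numbers n, which is the value shown in the processor field of /proc/cpuinfo. You can also ask from inside a program. The following is a sketch, with print_allowed_cores being a made-up name, that lists the bits set in the calling thread's mask and uses the glibc function sched_getcpu to report the core it is running on at that moment:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: print the cores the calling thread may run on.
   Returns how many there are, or -1 if the mask cannot be read. */
int print_allowed_cores(void) {
    cpu_set_t cpuset;
    /* a pid of 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(cpu_set_t), &cpuset) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    for (int n = 0; n < CPU_SETSIZE; n++) {
        if (CPU_ISSET(n, &cpuset)) {
            printf("bit %d is set\n", n);
        }
    }
    /* the core can change between this call and the next one */
    printf("currently on core %d\n", sched_getcpu());
    return CPU_COUNT(&cpuset);
}
```

Calling this before setting any affinity normally lists every online core, which is a quick way to check how many bits you have to work with.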

For example, suppose you want to run two threads on separate cores. First we need two functions to run:

volatile int j;
volatile int i;
void * threadA(void *p) {
    for (i = 0;; i++) {
    };
}
void * threadB(void *p) {
    for (j = 0;; j++) {
    };
}

These simply run a for loop with a global counter to let us know how many times the loop has been executed. The global counters have to be marked as volatile to stop the compiler optimizing the empty loops away.

To set the thread affinity we need to use the macros:

cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(1, &cpuset);

This sets the mask to core 1. Next we start the first thread and set its affinity:

pthread_t pthreadA;
pthread_create(&pthreadA, NULL, threadA, NULL);
pthread_setaffinity_np(pthreadA, sizeof (cpu_set_t), &cpuset);

The second thread is to run on core 2 so we need to change the mask and then start the thread:

CPU_ZERO(&cpuset);
CPU_SET(2, &cpuset);
pthread_t pthreadB;
pthread_create(&pthreadB, NULL, threadB, NULL);
pthread_setaffinity_np(pthreadB, sizeof (cpu_set_t),
                       &cpuset);
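Setting the affinity after pthread_create means that the thread can run briefly on some other core before the mask takes effect, and the calls above ignore their return values. One alternative, sketched here rather than taken from the original program, is to store the mask in a thread attributes object using the glibc function pthread_attr_setaffinity_np so that it applies from the moment the thread is created. The names start_on_core and record_core are hypothetical:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: create a thread that is restricted to one core
   from the start. Returns 0 or a Pthreads error number. */
int start_on_core(pthread_t *thread, int core,
                  void *(*fn)(void *), void *arg) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core, &cpuset);

    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0) return err;

    /* store the mask in the attributes so it applies at creation time */
    err = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);
    if (err == 0) {
        err = pthread_create(thread, &attr, fn, arg);
    }
    pthread_attr_destroy(&attr);
    return err;
}

/* Small thread function for trying the helper: it stores the core it
   is running on in the int that arg points to. */
void *record_core(void *arg) {
    *(int *)arg = sched_getcpu();
    return NULL;
}
```

A non-zero return value is worth checking because asking for a core that doesn't exist, or one the process isn't allowed to use, fails with an error rather than falling back to another core.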

Now we can let the main thread sleep for a few seconds and print the value of the counters to give an indication of how many loops each thread has performed.

The complete program is:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
volatile int j;
volatile int i;
void * threadA(void *p) {
    for (i = 0;; i++) {
    };
}
void * threadB(void *p) {
    for (j = 0;; j++) {
    };
}
int main(int argc, char** argv) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);
    pthread_t pthreadA;
    pthread_create(&pthreadA, NULL, threadA, NULL);
    pthread_setaffinity_np(pthreadA,
                           sizeof (cpu_set_t), &cpuset);
    CPU_ZERO(&cpuset);
    CPU_SET(2, &cpuset);
    pthread_t pthreadB;
    pthread_create(&pthreadB, NULL, threadB, NULL);
    pthread_setaffinity_np(pthreadB,
                           sizeof (cpu_set_t), &cpuset);
    sleep(5);
    printf("%d,%d", i, j);
    return (EXIT_SUCCESS);
}

If you run the program you will find that each thread executes roughly the same number of loops. Now if you set the second thread to run on the same core by changing:

 CPU_SET(2, &cpuset);

to:

 CPU_SET(1, &cpuset);

and run it again, you will discover that each thread now completes only about half as many loops as before. This is what you would expect as each of the two threads now only gets to run on the core for half of the total time.

If you run the same program without setting affinities you will discover that on a lightly loaded machine the threads are automatically allocated to different cores, and as the load goes up they eventually share a core.

In the case of a hyperthreaded processor, placing the two threads on two processing units in the same core has the same result as running them both on one core, as neither has any voluntary idle time and so they get to share the core equally. It is instructive to try this program out after assigning different scheduling policies and priorities to the threads.

There are Linux tools that will allow you to discover what core a process is running on and change its affinity. There is also the cpuset facility which can be used to dynamically change what cores are used. However, if your goal is to allocate a single core to a single important thread then the best and simplest way of doing this is to first prohibit Linux from using the core by adding:

isolcpus=core_number

to the kernel command line in the boot loader configuration. You can give a comma-separated list of cores to isolate.

For example, to isolate core 3 you would edit /etc/default/grub and change the line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

to:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=3"

You also have to use:

sudo update-grub

and reboot.

For the Raspberry Pi the kernel command line is stored in /boot/cmdline.txt. Simply add isolcpus=3 to the end of the list and reboot. When the machine starts up, core 3 will not be used by the system. You can, however, still use thread affinity to run a user thread on core 3. There are other ways, such as cpusets, to disable a core dynamically, but these suffer from problems such as not moving any thread that is already running.

For fixed tasks the best way to do the job is isolcpus. You can find out what cores are isolated using:

cat /sys/devices/system/cpu/isolated
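The file holds a CPU list such as 3 or 1,3-5, and it is empty if no cores are isolated. If you want a program to check that the core it is about to use really is isolated, something like the following sketch could be used. Both function names are hypothetical and the parser accepts only plain numbers and ranges:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a CPU list such as "3" or "1,3-5" into a
   cpu_set_t. Returns 0 on success, -1 if the text is malformed. */
int parse_cpu_list(const char *text, cpu_set_t *cpuset) {
    CPU_ZERO(cpuset);
    while (*text != '\0' && *text != '\n') {
        char *end;
        long lo = strtol(text, &end, 10);
        if (end == text) return -1;        /* no digits found */
        long hi = lo;
        if (*end == '-') {                 /* a range such as 3-5 */
            const char *p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p) return -1;
        }
        if (lo < 0 || hi < lo || hi >= CPU_SETSIZE) return -1;
        for (long c = lo; c <= hi; c++) CPU_SET((int)c, cpuset);
        if (*end == ',') end++;            /* step over the separator */
        text = end;
    }
    return 0;
}

/* Hypothetical helper: read the isolated cores into cpuset. Returns 0
   on success, -1 if the file cannot be opened or parsed. */
int read_isolated(cpu_set_t *cpuset) {
    char line[256];
    FILE *f = fopen("/sys/devices/system/cpu/isolated", "r");
    if (f == NULL) return -1;
    if (fgets(line, sizeof line, f) == NULL) line[0] = '\0';
    fclose(f);
    return parse_cpu_list(line, cpuset);
}
```

After a successful call to read_isolated, CPU_ISSET(3, &cpuset) tells you whether core 3 was isolated at boot.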

The system will still occasionally interrupt a thread running on an isolated core, but the interference is much less than encountered in normal scheduling.

You can discover which cores are being used for interrupt handlers using the command:

cat /proc/interrupts

This gives you a list of interrupt numbers and the cores that have handled them. Some interrupts have names rather than numbers and these are the ones that you can’t tamper with. Isolated cores only handle interrupts that are essential – rescheduling interrupts for example.

It is sometimes possible to control which cores are used for particular interrupts, as long as they have an interrupt number and as long as they support IO-APIC. Many don’t, and there are none on the Raspberry Pi for example. To discover which cores can handle a particular interrupt, use:

cat /proc/irq/n/smp_affinity

where n is the interrupt number. This returns a hexadecimal bit mask with the lowest-order bit corresponding to core 0. You can set the bit mask to determine which processors will handle the interrupt using:

echo m > /proc/irq/n/smp_affinity

where n is the interrupt number and m is the new mask.
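The mask is written in hexadecimal, so cores 0 and 2 together give binary 101, which is 5. If you would rather compute the value than work it out by hand, a sketch along the following lines would do. The function names are made up for illustration and it assumes that only cores 0 to 31 are of interest:

```c
#include <stdio.h>

/* Hypothetical helper: build an interrupt affinity mask from a list of
   core numbers. Bit 0 is core 0, bit 1 is core 1 and so on.
   Core numbers outside 0 to 31 are ignored. */
unsigned long irq_mask(const int *cores, int count) {
    unsigned long mask = 0;
    for (int i = 0; i < count; i++) {
        if (cores[i] >= 0 && cores[i] < 32) {
            mask |= 1UL << cores[i];
        }
    }
    return mask;
}

/* Format the mask as the hexadecimal text written to smp_affinity.
   Returns the number of characters produced, as snprintf does. */
int format_irq_mask(unsigned long mask, char *buf, size_t len) {
    return snprintf(buf, len, "%lx", mask);
}
```

For example, passing the cores 0, 1, 2 and 3 produces the text f, which is the value you would echo to allow the interrupt on any of the first four cores.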

For example, to have all timer interrupts, irq 17, handled by Core 0 you would use:

echo "1" > /proc/irq/17/smp_affinity

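Note that if the interrupt is not IO-APIC compatible you will get a read/write error. You also have to run the entire command as root, for example from a shell started with sudo -i, because the redirection is performed by the shell rather than by echo.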

In chapter but not in this extract

Memory Barrier and Fences
Compiler Reordering and Optimization
C11 Atomics
Atomic Structs
C11 Memory Models and Barriers

Summary

Even small machines now have multi-core processors and this introduces true parallelism. You can take control of how threads are allocated to processing units but it isn’t standard.
Another problem for the multi-threaded program is the reordering of instructions by the processor or by the compiler. To control this we can use memory barriers and fences.
C11 introduces atomics which can be used to create safe programs that don’t use explicit locks. If the processor has the facilities to support atomics this can be much faster.
C11 also introduced a memory model which provides a standard way to control instruction reordering. In practice this is not much used.

Now available as a paperback or ebook from Amazon.

Applying C For The IoT With Linux

C, IoT, POSIX & LINUX
Kernel Mode, User Mode & Syscall
Execution, Permissions & Systemd
Extract: Running Programs With Systemd
Signals & Exceptions
Extract: Signals
Integer Arithmetic
Extract: Basic Arithmetic As Bit Operations
Extract: BCD Arithmetic
Fixed Point
Extract: Simple Fixed Point Arithmetic
Floating Point
File Descriptors
Extract: Simple File Descriptors
Extract: Pipes
The Pseudo-File System
Extract: The Pseudo File System
Extract: Memory Mapped Files
Graphics
Extract: framebuffer
Sockets
Extract: Sockets The Client
Extract: Socket Server
Threading
Extract: Pthreads
Extract: Locking
Extract: Condition Variables
Extract: Deadline Scheduling
Cores Atomics & Memory Management
Extract: Applying C - Cores
Interrupts & Polling
Extract: Interrupts & Polling
Assembler
Extract: Assembler

Also see the companion book: Fundamental C


Last Updated ( Wednesday, 05 July 2023 )
