Applying C - Cores
Written by Harry Fairhead   
Monday, 03 July 2023

Affinity

The operating system tries to keep threads associated with particular cores, but sometimes you need to enforce this. There is no standard POSIX way of determining which core a thread will use, but there is a Linux extension of the Pthreads library that does the job. The setaffinity function:

int pthread_setaffinity_np(pthread_t thread, 
               size_t cpusetsize, const cpu_set_t *cpuset);

sets the specified thread to run on one of a set of possible CPUs as specified by the cpuset – the affinity mask.

The getaffinity function will return the affinity mask of the specified thread:

int pthread_getaffinity_np(pthread_t thread, 
               size_t cpusetsize, cpu_set_t *cpuset);

Notice the thread is specified as a Pthread id. You can also use a Linux process id if you use the alternative get and set functions defined in sched.h:

int sched_setaffinity(pid_t pid, size_t cpusetsize,
               const cpu_set_t *mask);
int sched_getaffinity(pid_t pid, size_t cpusetsize,
               cpu_set_t *mask);

In practice, the Pthreads functions call the functions defined in sched.h.

The only thing we need to know is how to set the affinity mask. This uses a single bit to control access to each of the physical and logical cores. You can’t simply set or reset these bits. You have to use the set of macros designed for the job. There are a large number of these, but the ones that you use most often are:

CPU_ZERO(&cpuset);     sets all bits to 0
CPU_SET(n, &cpuset);   sets the bit corresponding to core n
CPU_CLR(n, &cpuset);   resets the bit corresponding to core n

How do you find out which core corresponds to which bit in the mask?

As long as your system is set up correctly, you should be able to get the details by reading the /proc/cpuinfo file, or you can use the lstopo tool.

For example, suppose you want to run two threads on separate cores. First we need two functions to run:

volatile int i;
volatile int j;
void * threadA(void *p) {
    for (i = 0;; i++) {
    }
}
void * threadB(void *p) {
    for (j = 0;; j++) {
    }
}

These simply run a for loop with a global counter to let us know how many times the loop has been executed. The global counters have to be marked as volatile to stop the compiler optimizing the empty loops away.

To set the thread affinity we need to use the macros:

cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(1, &cpuset);

This sets the mask to core 1. Next we start the first thread and set its affinity:

pthread_t pthreadA;
pthread_create(&pthreadA, NULL, threadA, NULL);
pthread_setaffinity_np(pthreadA, sizeof (cpu_set_t), &cpuset);

The second thread is to run on core 2 so we need to change the mask and then start the thread:

CPU_ZERO(&cpuset);
CPU_SET(2, &cpuset);
pthread_t pthreadB;
pthread_create(&pthreadB, NULL, threadB, NULL);
pthread_setaffinity_np(pthreadB, sizeof (cpu_set_t),
&cpuset);

Now we can let the main thread sleep for a few seconds and print the value of the counters to give an indication of how many loops each thread has performed.

The complete program is:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
volatile int j;
volatile int i;
void * threadA(void *p) {
    for (i = 0;; i++) {
    };
}
void * threadB(void *p) {
    for (j = 0;; j++) {
    };
}
int main(int argc, char** argv) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);
    pthread_t pthreadA;
    pthread_create(&pthreadA, NULL, threadA, NULL);
    pthread_setaffinity_np(pthreadA,
                           sizeof(cpu_set_t), &cpuset);
    CPU_ZERO(&cpuset);
    CPU_SET(2, &cpuset);
    pthread_t pthreadB;
    pthread_create(&pthreadB, NULL, threadB, NULL);
    pthread_setaffinity_np(pthreadB,
                           sizeof(cpu_set_t), &cpuset);
    sleep(5);
    printf("%d,%d\n", i, j);
    return (EXIT_SUCCESS);
}

If you run the program you will find that each thread executes roughly the same number of loops. Now if you set the second thread to run on the same core by changing:

 CPU_SET(2, &cpuset);

to:

 CPU_SET(1, &cpuset);

and run it again, you will discover that each thread now loops for about half the previous total. This is what you would expect as each of the two threads now only gets to run on the core for half of the total time.

If you run the same program without setting affinities, you will discover that on a lightly loaded machine the threads are automatically allocated to different cores, and as the load goes up they eventually share a core.

In the case of a hyperthreaded processor, placing the two threads on two processing units in the same core has the same result as running them both on one core, as neither has any voluntary idle time and so they get to share the core equally. It is instructive to try this program out after assigning different scheduling policies and priorities to the threads.

There are Linux tools that let you discover which core a process is running on and change its affinity. There is also the cpuset facility, which can be used to change dynamically which cores are used. However, if your goal is to dedicate a single core to a single important thread, the best and simplest way of doing this is to first prohibit Linux from using the core by adding:

isolcpus=core_number

to the boot loader. You can give a comma-separated list of cores to exclude.

For example, to disable core 3 you would edit /etc/default/grub and change the line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

to:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=3" 

You also have to use:

sudo update-grub 

and reboot.

For the Raspberry Pi the Linux configuration is stored in /boot/cmdline.txt. Simply add isolcpus=3 to the end of the list and reboot. When the machine starts up, core 3 will not be used by the system. You can, however, still use thread affinity to run a user thread on core 3. There are other ways, such as cpuset, to disable a core dynamically, but these suffer from problems such as not moving any thread that is already running.

For fixed tasks the best way to do the job is isolcpus. You can find out what cores are isolated using:

cat /sys/devices/system/cpu/isolated

The system will still occasionally interrupt a thread running on an isolated core, but the interference is much less than encountered in normal scheduling.

You can discover which cores are being used for interrupt handlers using the command:

cat /proc/interrupts 

This gives you a list of interrupt numbers and the cores that have handled them. Some interrupts have names rather than numbers and these are the ones that you can’t tamper with. Isolated cores only handle interrupts that are essential – rescheduling interrupts for example.

It is sometimes possible to control which cores are used for particular interrupts, as long as they have an interrupt number and as long as the hardware supports IO-APIC, which many systems don't; there are none on the Raspberry Pi, for example. To discover which cores can handle a particular interrupt, use:

cat /proc/irq/n/smp_affinity 

where n is the interrupt number. This returns a bit mask with the lowest-order bit corresponding to core 0. You can set the bit mask to determine which cores will handle the interrupt using:

echo m > /proc/irq/n/smp_affinity 

where n is the interrupt number and m is the new mask.

For example, to have all timer interrupts, irq 17, handled by core 0 you would use:

echo 1 > /proc/irq/17/smp_affinity 

Note that if the interrupt is not IO-APIC compatible you will get a read/write error. You also have to give the entire command as root, e.g. use sudo -i.

In the chapter but not in this extract:

  • Memory Barrier and Fences
  • Compiler Reordering and Optimization
  • C11 Atomics
  • Atomic Structs
  • C11 Memory Models and Barriers

Summary

  • Even small machines now have multi-core processors and this introduces true parallelism. You can take control of how threads are allocated to processing units but it isn’t standard.

  • Another problem for the multi-threaded program is the reordering of instructions by the processor or by the compiler. To control this we can use memory barriers and fences.

  • C11 introduces atomics which can be used to create safe programs that don’t use explicit locks. If the processor has the facilities to support atomics this can be much faster.

  • C11 also introduced a memory model which provides a standard way to control instruction reordering. In practice this is not much used.

Now available as a paperback or ebook from Amazon.

Applying C For The IoT With Linux

  1. C, IoT, POSIX & LINUX
  2. Kernel Mode, User Mode & Syscall
  3. Execution, Permissions & Systemd
    Extract Running Programs With Systemd
  4. Signals & Exceptions
    Extract  Signals
  5. Integer Arithmetic
    Extract: Basic Arithmetic As Bit Operations
    Extract: BCD Arithmetic  ***NEW
  6. Fixed Point
    Extract: Simple Fixed Point Arithmetic
  7. Floating Point 
  8. File Descriptors
    Extract: Simple File Descriptors 
    Extract: Pipes 
  9. The Pseudo-File System
    Extract: The Pseudo File System
    Extract: Memory Mapped Files 
  10. Graphics
    Extract: framebuffer
  11. Sockets
    Extract: Sockets The Client
    Extract: Socket Server
  12. Threading
    Extract:  Pthreads
    Extract:  Condition Variables
    Extract:  Deadline Scheduling
  13. Cores Atomics & Memory Management
    Extract: Applying C - Cores 
  14. Interrupts & Polling
    Extract: Interrupts & Polling 
  15. Assembler
    Extract: Assembler

Also see the companion book: Fundamental C

Last Updated ( Wednesday, 05 July 2023 )