Applying C - Assembler
Written by Harry Fairhead   
Monday, 11 November 2019

Rotate a Variable

Suppose you want to implement a function that rotates a value a specified number of bit positions. You could write a loop using a ror 1 instruction, but the x64 has a variable rotate using the cl register. That is:

ror %cl,%eax

right-rotates eax the number of times specified by the value in cl.
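To make the effect of a rotate concrete, the following is a minimal sketch in plain C, not taken from the book. The function names ror_loop, ror_once and ror_check are hypothetical, and the code assumes a 32-bit value with the count reduced modulo 32, which is how the x86 treats the count in cl for a 32-bit operand. The first function mirrors the loop of single-bit rotates, the second does the same job in one step:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right one bit at a time, mirroring a loop around a single-bit ror. */
uint32_t ror_loop(uint32_t value, unsigned n) {
    n &= 31u;                        /* a 32-bit rotate repeats every 32 steps */
    while (n-- > 0) {
        uint32_t low = value & 1u;   /* the bit about to fall off the right */
        value = (value >> 1) | (low << 31);
    }
    return value;
}

/* The same result in one step, which is what a variable-count rotate provides. */
uint32_t ror_once(uint32_t value, unsigned n) {
    n &= 31u;
    if (n == 0) return value;        /* avoid a shift by 32, which is undefined in C */
    return (value >> n) | (value << (32u - n));
}

/* Hypothetical self-check: both versions should agree. */
void ror_check(void) {
    assert(ror_loop(0x00000001u, 1) == 0x80000000u);
    assert(ror_once(0x12345678u, 8) == 0x78123456u);
    assert(ror_loop(0x12345678u, 8) == ror_once(0x12345678u, 8));
}
```

Calling ror_check() from a test program exercises both functions. Bits that leave the right-hand end reappear at the left, which is what distinguishes a rotate from a shift.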

The number of rotates has to be specified in the cl register - no other register will do - but there are a number of ways of doing this.

The first is to not rely on the compiler to allocate registers but to explicitly use cl. For example:

int ror(unsigned int value, unsigned char n) {
    __asm__ (
            "mov %[n],%%cl\n\t"
            "rorl %%cl,%[value]\n\t"
            : [value] "+r" (value)
            : [n] "m" (n)
            );
    return value;
}

The first instruction moves the C variable n into the cl register. The second performs the rotate using it. Notice the use of %% to make sure that we have %cl in the instructions. Also notice the way n is constrained to be a memory reference - what is the point in moving n into a register and then moving it to another register? This function produces the following assembler:

 mov    -0x4(%rbp),%eax
 mov    -0x8(%rbp),%cl
 ror    %cl,%eax
 mov    %eax,-0x4(%rbp)
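The function above changes cl without telling the compiler, a point the book takes up in its section on the clobber list, which is not part of this extract. As a hedged sketch of how that might look, the version below adds a clobber list naming ecx and the flags. The names ror_cl, ror_ref and ror_cl_check are hypothetical, unsigned int is assumed to be 32 bits, and the preprocessor guard falls back to plain C on anything other than GCC-style compilers targeting x86:

```c
#include <assert.h>

/* Portable reference, used for the check and as a fallback off x86.
   Assumes unsigned int is 32 bits wide. */
static unsigned int ror_ref(unsigned int value, unsigned char n) {
    n &= 31;
    return n ? (value >> n) | (value << (32 - n)) : value;
}

unsigned int ror_cl(unsigned int value, unsigned char n) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ (
        "movb %[n],%%cl\n\t"         /* load the rotate count into cl */
        "rorl %%cl,%[value]\n\t"     /* rotate value right by cl bits */
        : [value] "+r" (value)       /* read and written, kept in a register */
        : [n] "m" (n)                /* read directly from memory */
        : "ecx", "cc"                /* tell the compiler ecx and the flags change */
    );
    return value;
#else
    return ror_ref(value, n);        /* no x86 rotate available here */
#endif
}

/* Hypothetical self-check against the portable reference. */
void ror_cl_check(void) {
    assert(ror_cl(0x00000001u, 1) == 0x80000000u);
    assert(ror_cl(0x12345678u, 8) == ror_ref(0x12345678u, 8));
}
```

With ecx in the clobber list the compiler will not choose it to hold value and will not expect anything it keeps there to survive the asm block.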

Included in the book but not in this extract:

  • The Clobber List
  • Processor-Specific Constraints
  • Register Variables
  • ARM Rotate and Portable Code
  • Goto Labels
  • Using the Condition Code Register


Assembler or C

If programming low-level code in C feels like fighting the compiler, then moving to assembler is even more so. The compiler tries to organize things so that you get efficient code, but often at the expense of the code doing what you actually want it to. Ever since the first compiler removed a loop that didn't appear to be doing anything, on the grounds that timing is of no importance to a program, optimizations have increasingly changed what programs actually do and, of course, it is always the programmer’s fault for not writing standard code.

When you take on assembler you have the additional problem of the different dialects of assembly language and even different versions of the assembly language - x86 versus x64, say. And you still have the problem of fighting the compiler to get it to integrate your code with the assembler it produces from your C code.

Then there is the question of whether it is worth it.

Increased performance used to be a good argument for hand-coded assembler. Compilers have now developed to the point where, although you might not always understand why some particular code has been generated, it nearly always runs faster than what you would write by hand. Here human programmers have lost the war with the compiler.

So why write assembler at all?

A good question, and the honest answer is that you should avoid doing it until you have no other choice. You need to be dragged, kicking and screaming, to an assembler project and it really should be the last resort. Today about the only thing that justifies the complexity of assembler is when C doesn't support the hardware adequately. In such cases you can just write a few lines of assembler to make up the deficiency.

If you really do have to support something large in assembler then it is often better to write a complete standalone function which can be called as if it were a C function. If you look up how to do this, you will find lots of explanations of different calling conventions and how to implement them. The simplest solution, however, is to write a C function with the required name and parameters and a dummy body - it may also need some instructions to stop the compiler optimizing the body away. Then compile the function with -S, or the equivalent for another compiler, to create an assembler .s file. Use this file as your template and fill in the body of the function using as much assembler as you like. The compiler will have generated the calling convention boilerplate code for you - just use it.
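As an illustration of the template approach, here is a minimal sketch of such a dummy function. The function name rotate_stub and the idea of saving it as rotate_stub.c are assumptions made for this example, and the volatile local is just one possible way of keeping something in the body for the compiler to emit:

```c
/* Hypothetical dummy function: only its name and parameters matter.
   The body is a placeholder to be replaced in the generated .s file. */
unsigned int rotate_stub(unsigned int value, unsigned char n) {
    /* volatile forces a real store and load, so the body is not optimized to nothing */
    volatile unsigned int result = value;
    (void)n;                         /* unused for now, but part of the calling convention */
    return result;
}
```

Assuming the file is saved as rotate_stub.c, compiling it with gcc -S rotate_stub.c should produce rotate_stub.s, containing the function's entry and exit code around the placeholder body. The parameter types fix which registers or stack slots the arguments arrive in, so it is worth getting them right before generating the template.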


  • The GCC compiler outputs assembly language code, which is then processed by an assembler to produce executable code.

  • The asm command allows you to insert your own assembly language instructions into the compiler's output.

  • The rules for writing assembly language depend on the processor you are using, the exact dialect of assembler and the assembler itself. These are all very platform-dependent.

  • Basic asm allows you to insert assembler at the top-most level and is very limited.

  • Extended asm can be used within functions and allows you to get the compiler to make connections between C variables and registers.

  • Used in the simplest way, extended asm demands that you work out your own ways of connecting to C variables. Used in extended form you can specify an input and an output list which can be used to specify which C variables should be in registers or in memory.

  • Exactly how C variables are treated depends on the constraints you supply.

  • If you modify any registers or memory locations not included in the input or output list then you have to include them in the clobber list so that the compiler knows what you are doing.

  • You can also specify a list of goto labels, which correspond to labels in your C program which the assembler can transfer control directly to.

  • Avoid using assembler if at all possible. It is usually not the case that your hand-crafted assembler will be faster than the compiler-created equivalent code.

  • The only situation in which assembler is really needed is to gain access to hardware features that C ignores, such as the condition code register.


Now available as a paperback or ebook from Amazon.

Applying C For The IoT With Linux


Last Updated ( Monday, 11 November 2019 )