Applying C - Assembler
Written by Harry Fairhead   
Monday, 11 November 2019

Adding Two Values

As an example, let's implement a function that adds two numbers together:

int sum(int myA, int myB){
    int myC;
    __asm__ (
        "addl %[myA],%[myB]\n\t"
        "movl %[myB],%[myC] \n\t"
        : [myC] "=r" (myC)
        : [myA] "r" (myA),
          [myB] "r" (myB)
    );
    return myC;
}

You can see that the C variable myC is an output operand, which means that its current value is not loaded into a register before the assembler runs. The two input operands are loaded into registers, and this makes it possible to write:

"addl %[myA],%[myB]\n\t"

and the compiler substitutes the registers it selects for myA and myB. The final instruction:

"movl %[myB],%[myC] \n\t"

copies the value in the register holding myB into the register assigned to myC. The compiler then generates the extra instruction needed to transfer that register into the C variable myC.

You might be surprised at how short the assembler is. Extended asm does all of the work of connecting the C variables to registers for you. You don't have to push and pop any registers, as the compiler makes sure that the ones it selects aren't being used for anything else.

The code generated on an x86-64 Linux system is:

 mov    -0x14(%rbp),%eax
 mov    -0x18(%rbp),%edx
 add    %eax,%edx
 mov    %edx,%eax
 mov    %eax,-0x4(%rbp)

You can see that the compiler has picked the eax and edx registers to hold the input operands. It also reuses the eax register to stand in for the myC variable. At the end of the asm block the compiler has also generated an instruction to transfer the eax register to the C variable myC.

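This is a little inefficient as the result is moved to a different register and then stored back in memory. The asm also writes to myB, which is declared as an input-only operand, and GCC assumes that input operands are unchanged when the asm finishes. It is simpler, and correct, to add into a single register and store that register to memory, and this can be done using an output operand that is also set to be an input operand: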

int sum(int myA, int myB){
    __asm__ (
        "addl %[myA],%[myB]\n\t"
        : [myB] "+r" (myB)
        : [myA] "r" (myA)
    );
    return myB;
}

Notice that myB is now both an output and an input operand due to the use of the +. This means that the compiler generates an instruction that loads a register from myB before the asm and, at the end, one that stores that register back into myB. The generated code is:

 mov    -0x4(%rbp),%edx
 mov    -0x8(%rbp),%eax
 add    %edx,%eax
 mov    %eax,-0x8(%rbp)

which you can see does exactly what you would expect and uses only two registers.
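If you do want a separate result variable, as in the first version, the asm should avoid writing to an operand that is declared as input only. The following is a minimal sketch of one way to do this and not the code used elsewhere in this article. The name sum_separate is made up for illustration, the early-clobber modifier & is used so that the compiler cannot give myC the same register as an input, and the plain C fallback for non-x86 targets is an assumption added so that the fragment compiles anywhere:

```c
/* Hypothetical variant: the result goes to a separate output operand
   and neither input operand is modified. */
int sum_separate(int myA, int myB) {
    int myC;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "movl %[myA],%[myC]\n\t"   /* myC = myA */
        "addl %[myB],%[myC]\n\t"   /* myC = myC + myB */
        : [myC] "=&r" (myC)        /* & = early clobber: myC is written
                                      before myB has been read */
        : [myA] "r" (myA),
          [myB] "r" (myB)
        : "cc"                     /* addl changes the flags register */
    );
#else
    myC = myA + myB;               /* assumed fallback for other targets */
#endif
    return myC;
}
```

Without the &, the compiler would be free to put myC and myB in the same register, in which case the movl would destroy myB before the addl used it.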

What if we ask the compiler to use a memory reference for myB:

int sum(int myA, int myB){
    __asm__ (
        "addl %[myA],%[myB]\n\t"
        : [myB] "+m" (myB)
        : [myA] "r" (myA)
    );
    return myB;
}

The generated code is now:

mov    -0x4(%rbp),%eax
add    %eax,-0x8(%rbp)

which as you can see now directly addresses myB rather than using a register.

If you also change the constraint on myA to m then you will simply get an error. The compiler substitutes two memory references and the assembler rejects the result, as there is no add instruction that works with two memory operands.
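The m constraint is not restricted to local variables. Any lvalue can be used, including the target of a pointer. As a rough sketch, with the function name add_to and its parameters invented for this example and a plain C fallback assumed for non-x86 targets, an in-place add to memory might look like this:

```c
/* Hypothetical helper: add value directly to the int that total points to. */
void add_to(int *total, int value) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "addl %[value],%[total]\n\t"   /* memory destination, register source */
        : [total] "+m" (*total)        /* read and written in memory */
        : [value] "r" (value)
        : "cc"                         /* addl changes the flags register */
    );
#else
    *total += value;                   /* assumed fallback for other targets */
#endif
}
```

Calling add_to(&t, 5) with t equal to 10 leaves 15 in t. Only one of the two operands is a memory reference, so the instruction is one that the assembler accepts.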

 

If you use the rm constraint on both variables and let the compiler pick how to implement the instruction the result is:

 mov    -0x8(%rbp),%eax
 add    -0x4(%rbp),%eax
 mov    %eax,-0x8(%rbp)

which of course might change according to the version of the compiler in use.

You can see that how you specify constraints really does alter the nature of the assembly language code the compiler generates.

The same example implemented in ARM assembly language is:

int sum(int myA, int myB) {
    __asm__ (
            "add %[myB],%[myA],%[myB] \n\t"
            : [myB] "+r" (myB)
            : [myA] "r" (myA)
            );
    return myB;
}

Notice that this is considerably simpler than implementing all of the variable-to-register and register-to-variable transfers yourself. The generated code is also simpler because the compiler selects registers that it doesn't have to save and restore and it knows the offsets of the variables in memory:

 ldr	r2, [r11, #-8]
 ldr	r3, [r11, #-12]
 add	r3, r2, r3
 str	r3, [r11, #-12]
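As the x86 and ARM versions differ only in the instruction string, one way to keep both in a single source file is to select between them with the compiler's predefined macros. This is only a sketch under some assumptions: the name sum_portable is hypothetical, the macros __x86_64__, __i386__ and __arm__ are the ones GCC and Clang define for these targets, and any other target simply falls back to plain C:

```c
/* Hypothetical wrapper: pick the asm that matches the target. */
int sum_portable(int myA, int myB) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "addl %[myA],%[myB]\n\t"          /* two-operand x86 add */
        : [myB] "+r" (myB)
        : [myA] "r" (myA)
        : "cc"                            /* addl changes the flags register */
    );
#elif defined(__arm__)
    __asm__ (
        "add %[myB],%[myA],%[myB]\n\t"    /* three-operand ARM add */
        : [myB] "+r" (myB)
        : [myA] "r" (myA)
    );
#else
    myB = myA + myB;                      /* assumed fallback for other targets */
#endif
    return myB;
}
```

The constraints are identical in both branches, which is the point of extended asm: only the instruction text is architecture-specific and the compiler handles the register allocation in each case.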


Last Updated ( Monday, 11 November 2019 )