[C-prog-lang-l] signed int to unsigned char conversion on x86/Clang

Vladimír Kotal vlada at kotalovi.cz
Tue Mar 15 13:00:18 CET 2022


Hi all,

this complements my e-mail from yesterday on integer promotion. So again, if you are curious how integer conversion works in practice on a given architecture/implementation, in this case Clang on macOS/x64, see this:


The rule:
> if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
> 
*Question*: how does this work in practice? Surely no compiler will generate a loop to actually bring the value within the limits.

On macOS with Clang:

`(lldb) list
   7   	{
   8   		if (argc != 2)
   9   			errx(1, "usage: prog <arg>");
   10  	
   11  		int arg = atoi(argv[1]);
   12  		printf("got arg = %d\n", arg);
   13  	
   14  		unsigned char c = arg;
   15  		printf("c = %hhu\n", c);
   16  	}
(lldb) b 14
Breakpoint 1: where = a.out`main + 88 at int-to-unsigned-char.c:14, address = 0x0000000100000f28
(lldb) r 384
Process 87082 launched: './a.out' (x86_64)
got arg = 384
Process 87082 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f28 a.out`main(argc=2, argv=0x00007ffeefbff9d0) at int-to-unsigned-char.c:14
   11  		int arg = atoi(argv[1]);
   12  		printf("got arg = %d\n", arg);
   13  	
-> 14  		unsigned char c = arg;
   15  		printf("c = %hhu\n", c);
   16  	}
Target 0: (a.out) stopped.
(lldb) disassemble 
a.out`main:
    0x100000ed0 <+0>:   pushq  %rbp
    0x100000ed1 <+1>:   movq   %rsp, %rbp
    0x100000ed4 <+4>:   subq   $0x20, %rsp
    0x100000ed8 <+8>:   movl   %edi, -0x4(%rbp)
    0x100000edb <+11>:  movq   %rsi, -0x10(%rbp)
    0x100000edf <+15>:  cmpl   $0x2, -0x4(%rbp)
    0x100000ee3 <+19>:  je     0x100000f00               ; <+48> at int-to-unsigned-char.c:11
    0x100000ee9 <+25>:  leaq   0x9e(%rip), %rsi          ; "usage: prog <arg>"
    0x100000ef0 <+32>:  xorl   %eax, %eax
    0x100000ef2 <+34>:  movb   %al, %cl
    0x100000ef4 <+36>:  movl   $0x1, %edi
    0x100000ef9 <+41>:  movb   %cl, %al
    0x100000efb <+43>:  callq  0x100000f52               ; symbol stub for: errx
    0x100000f00 <+48>:  movq   -0x10(%rbp), %rax
    0x100000f04 <+52>:  movq   0x8(%rax), %rdi
    0x100000f08 <+56>:  callq  0x100000f4c               ; symbol stub for: atoi
    0x100000f0d <+61>:  leaq   0x8c(%rip), %rdi          ; "got arg = %d\n"
    0x100000f14 <+68>:  movl   %eax, -0x14(%rbp)
    0x100000f17 <+71>:  movl   -0x14(%rbp), %esi
    0x100000f1a <+74>:  movb   $0x0, %al
    0x100000f1c <+76>:  callq  0x100000f58               ; symbol stub for: printf
    0x100000f21 <+81>:  leaq   0x86(%rip), %rdi          ; "c = %hhu\n"
->  0x100000f28 <+88>:  movl   -0x14(%rbp), %esi
    0x100000f2b <+91>:  movb   %sil, %cl
    0x100000f2e <+94>:  movb   %cl, -0x15(%rbp)
    0x100000f31 <+97>:  movzbl -0x15(%rbp), %esi
    0x100000f35 <+101>: movl   %eax, -0x1c(%rbp)
    0x100000f38 <+104>: movb   $0x0, %al
    0x100000f3a <+106>: callq  0x100000f58               ; symbol stub for: printf
    0x100000f3f <+111>: xorl   %esi, %esi
    0x100000f41 <+113>: movl   %eax, -0x20(%rbp)
    0x100000f44 <+116>: movl   %esi, %eax
    0x100000f46 <+118>: addq   $0x20, %rsp
    0x100000f4a <+122>: popq   %rbp
    0x100000f4b <+123>: retq   
(lldb) stepi
Process 87082 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100000f2b a.out`main(argc=2, argv=0x00007ffeefbff9d0) at int-to-unsigned-char.c:14
   11  		int arg = atoi(argv[1]);
   12  		printf("got arg = %d\n", arg);
   13  	
-> 14  		unsigned char c = arg;
   15  		printf("c = %hhu\n", c);
   16  	}
Target 0: (a.out) stopped.
(lldb) register read esi
     esi = 0x00000180
(lldb) stepi
Process 87082 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100000f2e a.out`main(argc=2, argv=0x00007ffeefbff9d0) at int-to-unsigned-char.c:14
   11  		int arg = atoi(argv[1]);
   12  		printf("got arg = %d\n", arg);
   13  	
-> 14  		unsigned char c = arg;
   15  		printf("c = %hhu\n", c);
   16  	}
Target 0: (a.out) stopped.
(lldb) register read cl
      cl = 0x80
`
This is a 64-bit program, so the register names follow https://wiki.osdev.org/CPU_Registers_x86-64 , specifically:

| register name | meaning |
|---|---|
| `sil` | 8-bit `rsi` |
| `cl` | 8-bit `rcx` |
| `al` | 8-bit `rax` |
| `esi` | 32-bit `rsi` |

Looking at the disassembly, `-0x14(%rbp)` is where `arg` is stored (see how `atoi` is called and where its result in `eax` is saved afterwards). When the integer conversion happens, the value is first loaded into `esi` unchanged:

`movl   -0x14(%rbp), %esi
`
Then the lower 8 bits of `esi` (accessed as `sil`) are copied into `cl` (the 8-bit view of `rcx`):

`movb   %sil, %cl
`
Then this value is stored into the `c` variable:

`movb   %cl, -0x15(%rbp)
`
and, prior to calling `printf` to print it out, it is promoted to `int` via:

`movzbl -0x15(%rbp), %esi
`
The `movzbl` instruction means "move byte to long (32 bits), zero-extended": the byte ends up in the low 8 bits of `esi` and the upper 24 bits are cleared.

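So, basically, whatever value is stored in the `int` returned from `atoi()` is truncated to its lower 8 bits. This has the same effect as the rule of repeated addition/subtraction: it is simply arithmetic modulo 2^N, where N is the width of the new type in bits (8 for `unsigned char` here). For the example above, 384 mod 256 = 128 = 0x80.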
