[C-prog-lang-l] signed int to unsigned char conversion on x86/Clang
Vladimír Kotal
vlada at kotalovi.cz
Tue Mar 15 13:00:18 CET 2022
Hi all,
this complements my e-mail from yesterday on integer promotion. So again, if you are curious how integer conversion works in practice on a given architecture/implementation, in this case Clang on macOS/x64, see this:
The rule:
> if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
>
*Question*: how does this work in practice? Surely no compiler will generate a loop to actually bring the value within the limits.
On macOS with Clang:
`(lldb) list
7 {
8 if (argc != 2)
9 errx(1, "usage: prog <arg>");
10
11 int arg = atoi(argv[1]);
12 printf("got arg = %d\n", arg);
13
14 unsigned char c = arg;
15 printf("c = %hhu\n", c);
16 }
(lldb) b 14
Breakpoint 1: where = a.out`main + 88 at int-to-unsigned-char.c:14, address = 0x0000000100000f28
(lldb) r 384
Process 87082 launched: './a.out' (x86_64)
got arg = 384
Process 87082 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000f28 a.out`main(argc=2, argv=0x00007ffeefbff9d0) at int-to-unsigned-char.c:14
11 int arg = atoi(argv[1]);
12 printf("got arg = %d\n", arg);
13
-> 14 unsigned char c = arg;
15 printf("c = %hhu\n", c);
16 }
Target 0: (a.out) stopped.
(lldb) disassemble
a.out`main:
0x100000ed0 <+0>: pushq %rbp
0x100000ed1 <+1>: movq %rsp, %rbp
0x100000ed4 <+4>: subq $0x20, %rsp
0x100000ed8 <+8>: movl %edi, -0x4(%rbp)
0x100000edb <+11>: movq %rsi, -0x10(%rbp)
0x100000edf <+15>: cmpl $0x2, -0x4(%rbp)
0x100000ee3 <+19>: je 0x100000f00 ; <+48> at int-to-unsigned-char.c:11
0x100000ee9 <+25>: leaq 0x9e(%rip), %rsi ; "usage: prog <arg>"
0x100000ef0 <+32>: xorl %eax, %eax
0x100000ef2 <+34>: movb %al, %cl
0x100000ef4 <+36>: movl $0x1, %edi
0x100000ef9 <+41>: movb %cl, %al
0x100000efb <+43>: callq 0x100000f52 ; symbol stub for: errx
0x100000f00 <+48>: movq -0x10(%rbp), %rax
0x100000f04 <+52>: movq 0x8(%rax), %rdi
0x100000f08 <+56>: callq 0x100000f4c ; symbol stub for: atoi
0x100000f0d <+61>: leaq 0x8c(%rip), %rdi ; "got arg = %d\n"
0x100000f14 <+68>: movl %eax, -0x14(%rbp)
0x100000f17 <+71>: movl -0x14(%rbp), %esi
0x100000f1a <+74>: movb $0x0, %al
0x100000f1c <+76>: callq 0x100000f58 ; symbol stub for: printf
0x100000f21 <+81>: leaq 0x86(%rip), %rdi ; "c = %hhu\n"
-> 0x100000f28 <+88>: movl -0x14(%rbp), %esi
0x100000f2b <+91>: movb %sil, %cl
0x100000f2e <+94>: movb %cl, -0x15(%rbp)
0x100000f31 <+97>: movzbl -0x15(%rbp), %esi
0x100000f35 <+101>: movl %eax, -0x1c(%rbp)
0x100000f38 <+104>: movb $0x0, %al
0x100000f3a <+106>: callq 0x100000f58 ; symbol stub for: printf
0x100000f3f <+111>: xorl %esi, %esi
0x100000f41 <+113>: movl %eax, -0x20(%rbp)
0x100000f44 <+116>: movl %esi, %eax
0x100000f46 <+118>: addq $0x20, %rsp
0x100000f4a <+122>: popq %rbp
0x100000f4b <+123>: retq
(lldb) stepi
Process 87082 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100000f2b a.out`main(argc=2, argv=0x00007ffeefbff9d0) at int-to-unsigned-char.c:14
11 int arg = atoi(argv[1]);
12 printf("got arg = %d\n", arg);
13
-> 14 unsigned char c = arg;
15 printf("c = %hhu\n", c);
16 }
Target 0: (a.out) stopped.
(lldb) register read esi
esi = 0x00000180
(lldb) stepi
Process 87082 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100000f2e a.out`main(argc=2, argv=0x00007ffeefbff9d0) at int-to-unsigned-char.c:14
11 int arg = atoi(argv[1]);
12 printf("got arg = %d\n", arg);
13
-> 14 unsigned char c = arg;
15 printf("c = %hhu\n", c);
16 }
Target 0: (a.out) stopped.
(lldb) register read cl
cl = 0x80
`
This is a 64-bit program, so using https://wiki.osdev.org/CPU_Registers_x86-64 , specifically:

| *register name* | *meaning* |
|---|---|
| `sil` | low 8 bits of `rsi` |
| `cl` | low 8 bits of `rcx` |
| `al` | low 8 bits of `rax` |
| `esi` | low 32 bits of `rsi` |
Looking at the disassembly, `-0x14(%rbp)` is where `arg` is stored (this is where the return value of `atoi` in `eax` was saved right after the call). When the integer conversion happens, the value is first loaded into `esi` (no change):
`movl -0x14(%rbp), %esi
`
Then the lower 8 bits of `esi`, accessible as `sil`, are copied into `cl` (the low 8 bits of `rcx`):
`movb %sil, %cl
`
Then this value is stored into the `c` variable:
`movb %cl, -0x15(%rbp)
`
and prior to calling `printf` to print it out it is promoted to `int` via:
`movzbl -0x15(%rbp), %esi
`
The `movzbl` instruction means "move byte to long (32 bits), zero-fill": the byte lands in the low 8 bits of `esi` and the upper 24 bits are set to zero.
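So, basically anything stored into the `int` returned from `atoi()` is truncated to the lower 8 bits. This has the same effect as the rule for repeated addition/subtraction: it is a reduction modulo 2^N, where N is the width of the new type in bits. Here N = 8, so 384 (`0x180`) becomes 384 mod 256 = 128 (`0x80`), which is exactly what ended up in `cl`.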