问题描述
我有一些模板密集的 C++ 代码,我想确保编译器尽可能地优化,因为它在编译时拥有大量信息.为了评估它的性能,我决定看一看它生成的目标文件的反汇编.下面是我从 objdump -dC
得到的片段:
I have some template-heavy C++ code that I want to ensure the compiler optimizes as much as possible due to the large amount of information it has at compile time. To evaluate its performance, I decided to take a look at the disassembly of the object file that it generates. Below is a snippet of what I got from objdump -dC
:
0000000000000000 <bar<foo, 0u>::get(bool)>:
0: 41 57 push %r15
2: 49 89 f7 mov %rsi,%r15
5: 41 56 push %r14
7: 41 55 push %r13
9: 41 54 push %r12
b: 55 push %rbp
c: 53 push %rbx
d: 48 81 ec 68 02 00 00 sub $0x268,%rsp
14: 48 89 7c 24 10 mov %rdi,0x10(%rsp)
19: 48 89 f7 mov %rsi,%rdi
1c: 89 54 24 1c mov %edx,0x1c(%rsp)
20: e8 00 00 00 00 callq 25 <bar<foo, 0u>::get(bool)+0x25>
25: 84 c0 test %al,%al
27: 0f 85 eb 00 00 00 jne 118 <bar<foo, 0u>::get(bool)+0x118>
2d: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
34: 00 00
36: 4c 89 ff mov %r15,%rdi
39: 4d 8d b7 30 01 00 00 lea 0x130(%r15),%r14
40: e8 00 00 00 00 callq 45 <bar<foo, 0u>::get(bool)+0x45>
45: 84 c0 test %al,%al
47: 88 44 24 1b mov %al,0x1b(%rsp)
4b: 0f 85 ef 00 00 00 jne 140 <bar<foo, 0u>::get(bool)+0x140>
51: 80 7c 24 1c 00 cmpb $0x0,0x1c(%rsp)
56: 0f 85 24 03 00 00 jne 380 <bar<foo, 0u>::get(bool)+0x380>
5c: 48 8b 44 24 10 mov 0x10(%rsp),%rax
61: c6 00 00 movb $0x0,(%rax)
64: 80 7c 24 1b 00 cmpb $0x0,0x1b(%rsp)
69: 75 25 jne 90 <bar<foo, 0u>::get(bool)+0x90>
6b: 48 8b 74 24 10 mov 0x10(%rsp),%rsi
70: 4c 89 ff mov %r15,%rdi
73: e8 00 00 00 00 callq 78 <bar<foo, 0u>::get(bool)+0x78>
78: 48 8b 44 24 10 mov 0x10(%rsp),%rax
7d: 48 81 c4 68 02 00 00 add $0x268,%rsp
84: 5b pop %rbx
85: 5d pop %rbp
86: 41 5c pop %r12
88: 41 5d pop %r13
8a: 41 5e pop %r14
8c: 41 5f pop %r15
8e: c3 retq
8f: 90 nop
90: 4c 89 f7 mov %r14,%rdi
93: e8 00 00 00 00 callq 98 <bar<foo, 0u>::get(bool)+0x98>
98: 83 f8 04 cmp $0x4,%eax
9b: 74 f3 je 90 <bar<foo, 0u>::get(bool)+0x90>
9d: 85 c0 test %eax,%eax
9f: 0f 85 e4 08 00 00 jne 989 <bar<foo, 0u>::get(bool)+0x989>
a5: 49 83 87 b0 01 00 00 addq $0x1,0x1b0(%r15)
ac: 01
ad: 49 8d 9f 58 01 00 00 lea 0x158(%r15),%rbx
b4: 48 89 df mov %rbx,%rdi
b7: e8 00 00 00 00 callq bc <bar<foo, 0u>::get(bool)+0xbc>
bc: 49 8d bf 80 01 00 00 lea 0x180(%r15),%rdi
c3: e8 00 00 00 00 callq c8 <bar<foo, 0u>::get(bool)+0xc8>
c8: 48 89 df mov %rbx,%rdi
cb: e8 00 00 00 00 callq d0 <bar<foo, 0u>::get(bool)+0xd0>
d0: 4c 89 f7 mov %r14,%rdi
d3: e8 00 00 00 00 callq d8 <bar<foo, 0u>::get(bool)+0xd8>
d8: 83 f8 04 cmp $0x4,%eax
这个特定函数的反汇编还在继续,但我注意到的一件事是相对大量的 call
指令,如下所示:
The disassembly of this particular function continues on, but one thing I noticed is the relatively large number of call
instructions like this one:
20: e8 00 00 00 00 callq 25 <bar<foo, 0u>::get(bool)+0x25>
这些指令,总是带有操作码 e8 00 00 00 00
,在生成的代码中频繁出现,据我所知,只不过是无操作;他们似乎都只是通过下一个指令.这就引出了一个问题,那么,是否有充分的理由生成所有这些指令?
These instructions, always with the opcode e8 00 00 00 00
, occur frequently throughout the generated code, and from what I can tell, are nothing more than no-ops; they all seem to just fall through to the next instruction. This begs the question, then, is there a good reason why all these instructions are generated?
我担心生成代码的指令缓存占用空间,因此在整个函数中多次浪费 5 个字节似乎适得其反.nop
似乎有点重量级,除非编译器试图保留某种内存对齐或其他东西.如果是这种情况,我不会感到惊讶.
I'm concerned about the instruction cache footprint of the generated code, so wasting 5 bytes many times throughout a function seems counterproductive. It seems a bit heavyweight for a nop
, unless the compiler is trying to preserve some kind of memory alignment or something. I wouldn't be surprised if this were the case.
我使用 -O3 -fomit-frame-pointer
使用 g++ 4.8.5 编译了我的代码.就其价值而言,我看到使用 clang 3.7 生成类似的代码.
I compiled my code using g++ 4.8.5 using -O3 -fomit-frame-pointer
. For what it's worth, I saw similar code generation using clang 3.7.
推荐答案
e8 00 00 00 00
中的 00 00 00 00
(相对)目标地址旨在由链接器填写.这并不意味着呼叫失败.这只是意味着您正在反汇编尚未链接的目标文件.
The 00 00 00 00
(relative) target address in e8 00 00 00 00
is intended to be filled in by the linker. It doesn't mean that the call falls through. It just means you are disassembling an object file that has not been linked yet.
此外,对下一条指令的调用,如果那是链接阶段之后的最终结果,则不会是空操作,因为它改变了堆栈(某种暗示,这不是您的案件).
Also, a call to the next instruction, if that was the end result after the link phase, would not be a no-op, because it changes the stack (a certain hint that this is not what is going on in your case).
这篇关于我的 x86 目标文件中这些看似无用的 callq 指令是做什么用的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!