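I have some template-heavy C++ code, and I want to make sure the compiler optimizes it as much as possible, since it has a large amount of information available at compile time. To evaluate its performance, I decided to look at the disassembly of the object file it generates. Below is a snippet of what I got from objdump -dC: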
    0000000000000000 <bar<foo,0u>::get(bool)>:
       0:   41 57                   push   %r15
       2:   49 89 f7                mov    %rsi,%r15
       5:   41 56                   push   %r14
       7:   41 55                   push   %r13
       9:   41 54                   push   %r12
       b:   55                      push   %rbp
       c:   53                      push   %rbx
       d:   48 81 ec 68 02 00 00    sub    $0x268,%rsp
      14:   48 89 7c 24 10          mov    %rdi,0x10(%rsp)
      19:   48 89 f7                mov    %rsi,%rdi
      1c:   89 54 24 1c             mov    %edx,0x1c(%rsp)
      20:   e8 00 00 00 00          callq  25 <bar<foo,0u>::get(bool)+0x25>
      25:   84 c0                   test   %al,%al
      27:   0f 85 eb 00 00 00       jne    118 <bar<foo,0u>::get(bool)+0x118>
      2d:   48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
      34:   00 00
      36:   4c 89 ff                mov    %r15,%rdi
      39:   4d 8d b7 30 01 00 00    lea    0x130(%r15),%r14
      40:   e8 00 00 00 00          callq  45 <bar<foo,0u>::get(bool)+0x45>
      45:   84 c0                   test   %al,%al
      47:   88 44 24 1b             mov    %al,0x1b(%rsp)
      4b:   0f 85 ef 00 00 00       jne    140 <bar<foo,0u>::get(bool)+0x140>
      51:   80 7c 24 1c 00          cmpb   $0x0,0x1c(%rsp)
      56:   0f 85 24 03 00 00       jne    380 <bar<foo,0u>::get(bool)+0x380>
      5c:   48 8b 44 24 10          mov    0x10(%rsp),%rax
      61:   c6 00 00                movb   $0x0,(%rax)
      64:   80 7c 24 1b 00          cmpb   $0x0,0x1b(%rsp)
      69:   75 25                   jne    90 <bar<foo,0u>::get(bool)+0x90>
      6b:   48 8b 74 24 10          mov    0x10(%rsp),%rsi
      70:   4c 89 ff                mov    %r15,%rdi
      73:   e8 00 00 00 00          callq  78 <bar<foo,0u>::get(bool)+0x78>
      78:   48 8b 44 24 10          mov    0x10(%rsp),%rax
      7d:   48 81 c4 68 02 00 00    add    $0x268,%rsp
      84:   5b                      pop    %rbx
      85:   5d                      pop    %rbp
      86:   41 5c                   pop    %r12
      88:   41 5d                   pop    %r13
      8a:   41 5e                   pop    %r14
      8c:   41 5f                   pop    %r15
      8e:   c3                      retq
      8f:   90                      nop
      90:   4c 89 f7                mov    %r14,%rdi
      93:   e8 00 00 00 00          callq  98 <bar<foo,0u>::get(bool)+0x98>
      98:   83 f8 04                cmp    $0x4,%eax
      9b:   74 f3                   je     90 <bar<foo,0u>::get(bool)+0x90>
      9d:   85 c0                   test   %eax,%eax
      9f:   0f 85 e4 08 00 00       jne    989 <bar<foo,0u>::get(bool)+0x989>
      a5:   49 83 87 b0 01 00 00    addq   $0x1,0x1b0(%r15)
      ac:   01
      ad:   49 8d 9f 58 01 00 00    lea    0x158(%r15),%rbx
      b4:   48 89 df                mov    %rbx,%rdi
      b7:   e8 00 00 00 00          callq  bc <bar<foo,0u>::get(bool)+0xbc>
      bc:   49 8d bf 80 01 00 00    lea    0x180(%r15),%rdi
      c3:   e8 00 00 00 00          callq  c8 <bar<foo,0u>::get(bool)+0xc8>
      c8:   48 89 df                mov    %rbx,%rdi
      cb:   e8 00 00 00 00          callq  d0 <bar<foo,0u>::get(bool)+0xd0>
      d0:   4c 89 f7                mov    %r14,%rdi
      d3:   e8 00 00 00 00          callq  d8 <bar<foo,0u>::get(bool)+0xd8>
      d8:   83 f8 04                cmp    $0x4,%eax
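The disassembly of this particular function continues, but one thing I noticed is the relatively large number of call instructions like this one: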
      20:   e8 00 00 00 00          callq  25 <bar<foo,0u>::get(bool)+0x25>
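These instructions, always encoded as e8 00 00 00 00, appear frequently throughout the generated code, and as far as I can tell they are nothing more than no-ops: each one seems to just fall through to the next instruction. That raises the question: why are these instructions generated at all?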
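I'm concerned about the instruction cache footprint of the generated code, so wasting 5 bytes many times over within a function seems counterproductive. It also seems rather heavyweight for a nop, unless the compiler is trying to preserve some kind of memory alignment or something similar. I wouldn't be surprised if that were the case.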
I compiled my code with g++ 4.8.5 using -O3 -fomit-frame-pointer. For what it's worth, I see similar code generation with clang 3.7.