What is the advantage of GCC's __builtin_expect in if else statements?
Disclaimer: This page reproduces a popular StackOverflow question and its answers and is provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/7346929/
Question by kingsmasher1
I came across a #define in which they use __builtin_expect.
The documentation says:
Built-in Function: long __builtin_expect (long exp, long c)

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

    if (__builtin_expect (x, 0))
      foo ();

would indicate that we do not expect to call foo, since we expect x to be zero.
So why not directly use:
    if (x)
        foo ();
instead of the complicated syntax with __builtin_expect?
Accepted answer by Blagovest Buyukliev
Imagine the assembly code that would be generated from:
    if (__builtin_expect(x, 0)) {
        foo();
        ...
    } else {
        bar();
        ...
    }
I guess it should be something like:
        cmpq $0, x
        jne  _foo
    _bar:
        call bar
        ...
        jmp  after_if
    _foo:
        call foo
        ...
    after_if:
You can see that the instructions are arranged so that the bar case precedes the foo case (the opposite of the order in the C code). This can make better use of the CPU pipeline, since a taken jump discards instructions that have already been fetched.
Before the jump is resolved, the instructions that follow it (the bar case) are already being pushed into the pipeline. Since the foo case is unlikely, the jump is also unlikely to be taken, so flushing the pipeline is unlikely.
Answer by Kerrek SB
Well, as the description says, the first version adds a predictive element to the construct, telling the compiler that the x == 0 branch is the more likely one, that is, the branch your program will take more often.
With that in mind, the compiler can optimize the conditional so that it requires the least amount of work when the expected condition holds, at the expense of maybe having to do more work in case of the unexpected condition.
Take a look at how conditionals are implemented during the compilation phase, and also in the resulting assembly, to see how one branch may be less work than the other.
However, I would only expect this optimization to have a noticeable effect if the conditional in question is part of a tight inner loop that gets called a lot, since the difference in the resulting code is relatively small. And if you optimize it the wrong way round, you may well decrease your performance.
Answer by Michael Kohne
The idea of __builtin_expect is to tell the compiler that you'll usually find that the expression evaluates to c, so that the compiler can optimize for that case.
I'd guess that someone thought they were being clever and that they were speeding things up by doing this.
Unfortunately, unless the situation is very well understood (it's likely that they have done no such thing), it may well have made things worse. The documentation even says:
In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
In general, you shouldn't be using __builtin_expect unless:
- You have a very real performance issue
- You've already optimized the algorithms in the system appropriately
- You've got performance data to back up your assertion that a particular case is the most likely
Answer by nobar
I don't see any of the answers addressing the question that I think you were asking, paraphrased:
Is there a more portable way of hinting branch prediction to the compiler?
The title of your question made me think of doing it this way:
    if ( !x ) {} else foo();
If the compiler assumes that 'true' is more likely, it could optimize for not calling foo().
The problem here is just that you don't, in general, know what the compiler will assume, so any code that uses this kind of technique would need to be carefully measured (and possibly monitored over time if the context changes).
Answer by Victor Choy
I tested it on a Mac following @Blagovest Buyukliev and @Ciro. The assembly is easy to follow, and I have added comments.
The commands are:
    gcc -c -O3 -std=gnu11 testOpt.c; otool -tVI testOpt.o
With -O3, the output is the same whether or not __builtin_expect(i, 0) is present:
    testOpt.o:
    (__TEXT,__text) section
    _main:
    0000000000000000  pushq  %rbp
    0000000000000001  movq   %rsp, %rbp        // set up the stack frame
    0000000000000004  xorl   %edi, %edi        // argument to time() is 0 (NULL)
    0000000000000006  callq  _time             // call time(NULL)
    000000000000000b  testq  %rax, %rax        // test the result of time(NULL)
    000000000000000e  je     0x14              // if it is 0, jump to 0x14 (the puts call)
    0000000000000010  xorl   %eax, %eax        // return 0: the return path comes first
    0000000000000012  popq   %rbp
    0000000000000013  retq
    0000000000000014  leaq   0x9(%rip), %rdi   ## literal pool for: "a"   // the puts path comes afterwards
    000000000000001b  callq  _puts
    0000000000000020  xorl   %eax, %eax
    0000000000000022  popq   %rbp
    0000000000000023  retq
With -O2, the output differs with and without __builtin_expect(i, 0).
First, without it:
    testOpt.o:
    (__TEXT,__text) section
    _main:
    0000000000000000  pushq  %rbp
    0000000000000001  movq   %rsp, %rbp
    0000000000000004  xorl   %edi, %edi
    0000000000000006  callq  _time
    000000000000000b  testq  %rax, %rax
    000000000000000e  jne    0x1c              // if nonzero, jump to 0x1c and return
    0000000000000010  leaq   0x9(%rip), %rdi   ## literal pool for: "a"   // the puts path comes first, right after jne
    0000000000000017  callq  _puts
    000000000000001c  xorl   %eax, %eax        // the return path comes afterwards
    000000000000001e  popq   %rbp
    000000000000001f  retq
Now with __builtin_expect(i, 0):
    testOpt.o:
    (__TEXT,__text) section
    _main:
    0000000000000000  pushq  %rbp
    0000000000000001  movq   %rsp, %rbp
    0000000000000004  xorl   %edi, %edi
    0000000000000006  callq  _time
    000000000000000b  testq  %rax, %rax
    000000000000000e  je     0x14              // if zero, jump to 0x14 (puts); otherwise fall through to return
    0000000000000010  xorl   %eax, %eax        // the return path comes first
    0000000000000012  popq   %rbp
    0000000000000013  retq
    0000000000000014  leaq   0x7(%rip), %rdi   ## literal pool for: "a"
    000000000000001b  callq  _puts
    0000000000000020  jmp    0x10              // jump back to the shared return sequence
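To summarize: with -O3 the return path is placed first whether or not the hint is present, while with -O2 only the __builtin_expect version moves the puts call off the fall-through path, so the hint makes a visible difference in the -O2 case.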