What is the advantage of GCC's __builtin_expect in if else statements?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7346929/

Time: 2020-08-05 06:04:22  Source: igfitidea

Tags: c, linux, gcc, built-in

Asked by kingsmasher1

I came across a #define in which they use __builtin_expect.

The documentation says:

Built-in Function: long __builtin_expect (long exp, long c)

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

      if (__builtin_expect (x, 0))
        foo ();

would indicate that we do not expect to call foo, since we expect x to be zero.

So why not directly use:

if (x)
    foo ();

instead of the complicated syntax with __builtin_expect?

Accepted answer by Blagovest Buyukliev

Imagine the assembly code that would be generated from:

if (__builtin_expect(x, 0)) {
    foo();
    ...
} else {
    bar();
    ...
}

I guess it should be something like:

  cmp   $x, 0
  jne   _foo
_bar:
  call  bar
  ...
  jmp   after_if
_foo:
  call  foo
  ...
after_if:

You can see that the instructions are arranged in such an order that the bar case precedes the foo case (as opposed to the C code). This can utilise the CPU pipeline better, since a jump thrashes the already fetched instructions.

Before the jump is executed, the instructions below it (the bar case) are pushed to the pipeline. Since the foo case is unlikely, jumping too is unlikely, hence thrashing the pipeline is unlikely.
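As a rough illustration of the layout argument above, here is a minimal sketch of how such a hint is often wrapped in a #define. The macro names likely and unlikely and the function sum_bytes are hypothetical, chosen only for this example; they are not from the question or the answer.

```c
#include <stddef.h>

/* Hypothetical convenience macros. The !! turns any non-zero value
 * into 1 so that it can be compared against the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums the bytes of a buffer. A NULL buffer is assumed to be rare,
 * so the error return is marked as the unlikely branch. */
long sum_bytes(const unsigned char *buf, size_t len)
{
    if (unlikely(buf == NULL)) {
        return -1;              /* cold path: expected to run rarely */
    }

    long total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];        /* hot path: expected fall-through */
    }
    return total;
}
```

The hint does not change what the function returns, only how the compiler may arrange the branches. Compiling with gcc -O2 -S and reading the generated assembly is one way to check whether the layout actually changed for a given compiler version.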

Answer by Kerrek SB

Well, as it says in the description, the first version adds a predictive element to the construction, telling the compiler that the x == 0 branch is the more likely one - that is, it's the branch that will be taken more often by your program.

With that in mind, the compiler can optimize the conditional so that it requires the least amount of work when the expected condition holds, at the expense of maybe having to do more work in case of the unexpected condition.

Take a look at how conditionals are implemented during the compilation phase, and also in the resulting assembly, to see how one branch may be less work than the other.

However, I would only expect this optimization to have noticeable effect if the conditional in question is part of a tight inner loop that gets called a lot, since the difference in the resulting code is relatively small. And if you optimize it the wrong way round, you may well decrease your performance.

Answer by Michael Kohne

The idea of __builtin_expect is to tell the compiler that you'll usually find that the expression evaluates to c, so that the compiler can optimize for that case.

I'd guess that someone thought they were being clever and that they were speeding things up by doing this.

Unfortunately, unless the situation is very well understood (it's likely that they have done no such thing), it may well have made things worse. The documentation even says:

In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

In general, you shouldn't be using __builtin_expect unless:

  • You have a very real performance issue
  • You've already optimized the algorithms in the system appropriately
  • You've got performance data to back up your assertion that a particular case is the most likely

Answer by nobar

I don't see any of the answers addressing the question that I think you were asking, paraphrased:

Is there a more portable way of hinting branch prediction to the compiler?

The title of your question made me think of doing it this way:

if ( !x ) {} else foo();

If the compiler assumes that 'true' is more likely, it could optimize for not calling foo().

The problem here is just that you don't, in general, know what the compiler will assume -- so any code that uses this kind of technique would need to be carefully measured (and possibly monitored over time if the context changes).
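A different route to portability, sketched here as an assumption rather than something this answer proposes, is to hide the builtin behind a macro that falls back to the plain condition on compilers that lack it. The names EXPECT_TRUE, EXPECT_FALSE and clamp_percent are hypothetical.

```c
/* Use the builtin where it is known to exist; otherwise fall back
 * to the bare condition so the code still compiles and behaves the same. */
#if defined(__GNUC__) || defined(__clang__)
#  define EXPECT_TRUE(x)  __builtin_expect(!!(x), 1)
#  define EXPECT_FALSE(x) __builtin_expect(!!(x), 0)
#else
#  define EXPECT_TRUE(x)  (!!(x))
#  define EXPECT_FALSE(x) (!!(x))
#endif

/* Clamps a value to the range 0..100, assuming that out-of-range
 * inputs are rare. */
int clamp_percent(int v)
{
    if (EXPECT_FALSE(v < 0)) {
        return 0;
    }
    if (EXPECT_FALSE(v > 100)) {
        return 100;
    }
    return v;
}
```

With the fallback branch the hint simply disappears, so the measuring caveat above still applies: whether either form helps has to be checked per compiler.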

Answer by Victor Choy

I tested it on a Mac, following @Blagovest Buyukliev and @Ciro. The assembly looks clear, and I have added comments.

Commands are gcc -c -O3 -std=gnu11 testOpt.c; otool -tVI testOpt.o
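The source of testOpt.c is not shown in this answer. Judging from the listings (a call to time, a test of its result, and a puts of "a" when that result is zero), it was presumably close to the following sketch. This is an assumed reconstruction, not the author's file: the function name run_test is hypothetical, and in the original the body presumably lived in main.

```c
#include <stdio.h>
#include <time.h>

/* Assumed reconstruction of the logic in testOpt.c; run_test is a
 * hypothetical name (the listings show this logic inside main). */
int run_test(void)
{
    /* Calling time(NULL) keeps the compiler from folding the branch
     * away. i is 1 only if time(NULL) returned 0, which is not expected. */
    long i = !time(NULL);

    if (__builtin_expect(i, 0)) {
        puts("a");      /* the unlikely path */
    }

    return 0;
}
```

Building a file like this twice with the commands above, once with the __builtin_expect wrapper and once with a plain if (i), should allow the comparison to be repeated, although the exact output depends on the compiler version.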

With -O3, the output looks the same whether or not __builtin_expect(i, 0) is present.

testOpt.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq   %rbp     
0000000000000001    movq    %rsp, %rbp    // set up the stack frame
0000000000000004    xorl    %edi, %edi       // argument for time: 0 (NULL)
0000000000000006    callq   _time      // call time(NULL)
000000000000000b    testq   %rax, %rax   // test the result of time(NULL)
000000000000000e    je  0x14           // if the result is 0, jump to 0x14 (the puts part)
0000000000000010    xorl    %eax, %eax   // return 0; the return path comes first
0000000000000012    popq    %rbp    // return 0
0000000000000013    retq                     // return 0
0000000000000014    leaq    0x9(%rip), %rdi  ## literal pool for: "a"  // the puts part comes afterwards
000000000000001b    callq   _puts
0000000000000020    xorl    %eax, %eax
0000000000000022    popq    %rbp
0000000000000023    retq

When compiled with -O2, the output differs with and without __builtin_expect(i, 0).

First, without:

testOpt.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq   %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    xorl    %edi, %edi
0000000000000006    callq   _time
000000000000000b    testq   %rax, %rax
000000000000000e    jne 0x1c       // if the result is not zero, jump to 0x1c and return
0000000000000010    leaq    0x9(%rip), %rdi ## literal pool for: "a"   // the puts part comes first, right after the jne 0x1c
0000000000000017    callq   _puts
000000000000001c    xorl    %eax, %eax     // the return part comes afterwards
000000000000001e    popq    %rbp
000000000000001f    retq

Now with __builtin_expect(i, 0):

testOpt.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq   %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    xorl    %edi, %edi
0000000000000006    callq   _time
000000000000000b    testq   %rax, %rax
000000000000000e    je  0x14   // if the result is zero, jump to 0x14 (puts); otherwise return
0000000000000010    xorl    %eax, %eax   // the return path comes first
0000000000000012    popq    %rbp
0000000000000013    retq
0000000000000014    leaq    0x7(%rip), %rdi ## literal pool for: "a"
000000000000001b    callq   _puts
0000000000000020    jmp 0x10

To summarize, __builtin_expect has a visible effect only in the last case (-O2): with the hint, the return path is laid out first and the puts call is moved after it.