Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/30130930/

Time: 2020-08-28 13:42:52  Source: igfitidea

Is there a compiler hint for GCC to force branch prediction to always go a certain way?

c++ gcc intel pragma branch-prediction

Asked by WilliamKF

For Intel architectures, is there a way to instruct the GCC compiler to generate code that always forces branch prediction a particular way in my code? Does the Intel hardware even support this? What about other compilers or hardware?

I would use this in C++ code where I know which case I want to run fast, and I do not care about the slowdown when the other branch has to be taken, even when that branch was taken recently.

for (;;) {
  if (normal) { // How to tell compiler to always branch predict true value?
    doSomethingNormal();
  } else {
    exceptionalCase();
  }
}


As a follow-up question for Evdzhan Mustafa: can the hint apply only the first time the processor encounters the instruction, with all subsequent branch prediction functioning normally?

Accepted answer by pseyfert

The likely and unlikely attributes are standardized as of C++20 and are already supported in g++ 9, so you can write

if (a>b) {
  /* code you expect to run often */
  [[likely]] /* last statement */
}
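
For comparison, the attribute can also be attached directly after the condition, which is the placement most often seen in C++20 code. The following is only a minimal sketch, assuming a C++20 compiler; the function clampToZero is hypothetical and not part of the original answer.

```cpp
// Sketch only: clampToZero is a hypothetical function used for illustration.
// Build with a C++20 compiler, e.g. g++ -std=c++20 (GCC 9 accepts -std=c++2a).
int clampToZero(int value) {
  if (value >= 0) [[likely]] {
    // Path we expect to run most of the time.
    return value;
  } else [[unlikely]] {
    // Rare path: negative input.
    return 0;
  }
}
```

The attributes only express an expectation. The function returns the same results with or without them.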

For example, in the following code the else block gets inlined thanks to the [[unlikely]] in the if block:

int oftendone( int a, int b );
int rarelydone( int a, int b );
int finaltrafo( int );

int divides( int number, int prime ) {
  int almostreturnvalue;
  if ( ( number % prime ) == 0 ) {
    auto k                         = rarelydone( number, prime );
    auto l                         = rarelydone( number, k );
    [[unlikely]] almostreturnvalue = rarelydone( k, l );
  } else {
    auto a            = oftendone( number, prime );
    almostreturnvalue = oftendone( a, a );
  }
  return finaltrafo( almostreturnvalue );
}

The effect can be seen on Godbolt by comparing the compiler output with and without the attribute.

Answer by Hyman

GCC supports the function __builtin_expect(long exp, long c) to provide this kind of feature. See the GCC documentation for details.

Here exp is the condition used and c is the expected value. For example, in your case you would want

if (__builtin_expect(normal, 1))

Because of the awkward syntax, this is usually wrapped in two custom macros like

#define likely(x)    __builtin_expect (!!(x), 1)
#define unlikely(x)  __builtin_expect (!!(x), 0)

just to ease the task.
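
As a rough sketch of how such macros might be used in practice, the version below also falls back to a plain boolean test on compilers that do not provide __builtin_expect. The fallback branch and the function countNonZero are assumptions added for illustration, not part of the original answer.

```cpp
// Sketch: use __builtin_expect where it exists, otherwise a plain boolean test.
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (!!(x))
#  define unlikely(x) (!!(x))
#endif

// Hypothetical example: count the non-zero entries, expecting zeros to be rare.
int countNonZero(const int* data, int n) {
  int count = 0;
  for (int i = 0; i < n; ++i) {
    if (likely(data[i] != 0)) {
      ++count;
    }
  }
  return count;
}
```

The macros change only the hint given to the compiler, so the result is the same whichever branch of the #if is selected.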

Mind that:

  1. this is non-standard
  2. the compiler and the CPU's branch predictor are likely more skilled than you at deciding such things, so this could be a premature micro-optimization

Answer by Shafik Yaghmour

gcc has long __builtin_expect (long exp, long c); its documentation says:

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))
   foo ();

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))
   foo (*ptr);

when testing pointer or floating-point values.

As the documentation notes, you should prefer actual profile feedback. One article shows a practical example of this and how, in its case at least, it ends up being an improvement over using __builtin_expect. Also see How to use profile guided optimizations in g++?.
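
A profile-guided build might look roughly like the sketch below. The file name hot.cpp, the binary name and the function sumPositive are hypothetical, and a main() that calls the function on representative input is assumed but not shown. The flags -fprofile-generate and -fprofile-use are the GCC options for collecting and applying a profile.

```cpp
// Sketch of a profile-guided build; file and binary names are hypothetical.
//
//   g++ -O2 -fprofile-generate hot.cpp -o hot   (instrumented build)
//   ./hot                                       (run on representative input)
//   g++ -O2 -fprofile-use hot.cpp -o hot        (rebuild using the profile)
//
// The branch below carries no source-level hint: with a profile available,
// the compiler can see which side is taken more often.
#include <cstddef>

long sumPositive(const int* data, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (data[i] > 0) {
      sum += data[i];
    }
  }
  return sum;
}
```

Whether this beats a hand-written hint depends on how representative the training run is, so it is worth measuring both.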

We can also find a Linux Kernel Newbies article on the kernel macros likely() and unlikely(), which use this feature:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

Note the !! used in the macro; the explanation for it can be found in Why use !!(condition) instead of (condition)?.

Just because this technique is used in the Linux kernel does not mean it always makes sense to use it. A question I recently answered, difference between the function performance when passing parameter as compile time constant or variable, shows that many hand-rolled optimization techniques don't work in the general case. We need to profile code carefully to understand whether a technique is effective. Many old techniques may not even be relevant with modern compiler optimizations.

Note that although builtins are not portable, clang also supports __builtin_expect.

Also, on some architectures it may not make a difference.

Answer by Artelius

No, there is not. (At least on modern x86 processors.)

__builtin_expect, mentioned in other answers, influences the way gcc arranges the assembly code. It does not directly influence the CPU's branch predictor. Of course, there will be indirect effects on branch prediction caused by reordering the code. But on modern x86 processors there is no instruction that tells the CPU "assume this branch is/isn't taken".

See this question for more detail: Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?

To be clear, __builtin_expect and/or the use of -fprofile-arcs can improve the performance of your code, both by giving hints to the branch predictor through code layout (see Performance optimisations of x86-64 assembly - Alignment and branch prediction), and by improving cache behaviour by keeping "unlikely" code away from "likely" code.

Answer by Maxim Egorushkin

The correct way to define likely/unlikely macros in C++11 is the following:

#define LIKELY(condition) __builtin_expect(static_cast<bool>(condition), 1)
#define UNLIKELY(condition) __builtin_expect(static_cast<bool>(condition), 0)

This method is compatible with all C++ versions, unlike [[likely]], but relies on the non-standard extension __builtin_expect.

When these macros are defined this way:

#define LIKELY(condition) __builtin_expect(!!(condition), 1)

That may change the meaning of if statements and break the code. Consider the following code:

#include <iostream>

struct A
{
    explicit operator bool() const { return true; }
    operator int() const { return 0; }
};

#define LIKELY(condition) __builtin_expect((condition), 1)

int main() {
    A a;
    if(a)
        std::cout << "if(a) is true\n";
    if(LIKELY(a))
        std::cout << "if(LIKELY(a)) is true\n";
    else
        std::cout << "if(LIKELY(a)) is false\n";
}

And its output:

if(a) is true
if(LIKELY(a)) is false

As you can see, the definition of LIKELY using !! as a cast to bool breaks the semantics of if.

The point here is not that operator int() and operator bool() should be related, which is good practice.

Rather, it is that using !!(x) instead of static_cast<bool>(x) loses the context for C++11 contextual conversions.
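
The effect of static_cast<bool> can be isolated by comparing it against passing the object to __builtin_expect with no cast at all, which is what the demonstration program above does. This is only a sketch for GCC or Clang: Flag mirrors struct A from the example, and the macro and helper names are hypothetical.

```cpp
// Sketch: Flag mirrors struct A from the example above; the macro and helper
// names are hypothetical. Requires GCC or Clang for __builtin_expect.
struct Flag {
  explicit operator bool() const { return true; }
  operator int() const { return 0; }
};

#define LIKELY_CAST(condition) __builtin_expect(static_cast<bool>(condition), 1)
#define LIKELY_RAW(condition)  __builtin_expect((condition), 1)

// static_cast<bool> selects the explicit operator bool(), just as if (f) would.
bool viaStaticCast(const Flag& f) { return LIKELY_CAST(f) != 0; }

// Without a cast the argument is implicitly converted to long. An explicit
// conversion function is not considered there, so operator int() is used.
bool viaRawArgument(const Flag& f) { return LIKELY_RAW(f) != 0; }
```

With this type, viaStaticCast returns true and viaRawArgument returns false, matching the two lines of output shown above.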

Answer by Cody Gray

As the other answers have all adequately suggested, you can use __builtin_expect to give the compiler a hint about how to arrange the assembly code. As the official docs point out, in most cases, the assembler built into your brain will not be as good as the one crafted by the GCC team. It's always best to use actual profile data to optimize your code, rather than guessing.

Along similar lines, but not yet mentioned, is a GCC-specific way to force the compiler to generate code on a "cold" path. This involves the use of the noinline and cold attributes, which do exactly what they sound like they do. These attributes can only be applied to functions, but with C++11, you can declare inline lambda functions and these two attributes can also be applied to lambda functions.

Although this still falls into the general category of a micro-optimization, and thus the standard advice applies (test, don't guess), I feel like it is more generally useful than __builtin_expect. Hardly any generations of the x86 processor use branch prediction hints, so the only thing you're going to be able to affect anyway is the order of the assembly code. Since you know what is error-handling or "edge case" code, you can use this annotation to ensure that the compiler won't ever predict a branch to it and will link it away from the "hot" code when optimizing for size.

Sample usage:

void FooTheBar(void* pFoo)
{
    if (pFoo == nullptr)
    {
        // Oh no! A null pointer is an error, but maybe this is a public-facing
        // function, so we have to be prepared for anything. Yet, we don't want
        // the error-handling code to fill up the instruction cache, so we will
        // force it out-of-line and onto a "cold" path.
        [&]() __attribute__((noinline,cold)) {
            HandleError(...);
        }();
    }

    // Do normal stuff
    // ...
}

Even better, GCC will automatically ignore this in favor of profile feedback when it is available (e.g., when compiling with -fprofile-use).

See the official documentation here: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
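
The same idea can be sketched with a named function instead of a lambda, which gives a version that compiles on its own. The names reportNullPointer and readValue are hypothetical, and returning -1 as an error code is an assumption made only to keep the example small.

```cpp
#include <cstdio>

// Hypothetical error handler, forced out of line and marked as cold.
// __attribute__((noinline, cold)) is a GCC extension that Clang also accepts.
__attribute__((noinline, cold))
static int reportNullPointer() {
  std::fputs("error: null pointer\n", stderr);
  return -1;
}

// Hypothetical function: returns the pointed-to value, or -1 for a null pointer.
int readValue(const int* p) {
  if (p == nullptr) {
    // GCC treats paths that lead to a call of a cold function as unlikely.
    return reportNullPointer();
  }
  return *p;
}
```

No explicit hint is needed on the if itself, since the call to the cold function already marks that path as unlikely.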

Answer by gnasher729

__builtin_expect can be used to tell the compiler which way you expect a branch to go. This can influence how the code is generated. Typical processors run code faster sequentially. So if you write

if (__builtin_expect (x == 0, 0)) ++count;
if (__builtin_expect (y == 0, 0)) ++count;
if (__builtin_expect (z == 0, 0)) ++count;

the compiler will generate code like

if (x == 0) goto if1;
back1: if (y == 0) goto if2;
back2: if (z == 0) goto if3;
back3: ;
...
if1: ++count; goto back1;
if2: ++count; goto back2;
if3: ++count; goto back3;

If your hint is correct, this will execute the code without any branch actually being taken. It will run faster than the normal sequence, where each if statement would branch around the conditional code, executing three branches.
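
To check what the compiler really does with such hints, the three statements can be wrapped in a small function and the assembly inspected, for example with g++ -O2 -S. This is only a sketch: the function countZeros is hypothetical, and for code this simple the compiler may well choose a branch-free sequence instead of the layout described above.

```cpp
// Hypothetical helper: counts how many of the three values are zero, telling
// the compiler that a zero is the unexpected case. Requires GCC or Clang.
int countZeros(int x, int y, int z) {
  int count = 0;
  if (__builtin_expect(x == 0, 0)) ++count;
  if (__builtin_expect(y == 0, 0)) ++count;
  if (__builtin_expect(z == 0, 0)) ++count;
  return count;
}
```

The hints never change the result, only the code the compiler may generate for it.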

Some x86 processors have instruction prefixes for branches that are expected to be taken or expected not to be taken (I am not sure about the details), and I am not sure whether the processor uses them. They are not very useful, because branch prediction will handle this just fine. So I don't think you can actually influence the branch prediction.

Answer by TheCppZoo

With regard to the OP: no, there is no way in GCC to tell the processor to always assume the branch is or isn't taken. What you have is __builtin_expect, which does what others say it does. Furthermore, I think you don't want to tell the processor whether the branch is always taken or not. Today's processors, such as those of the Intel architecture, can recognize fairly complex patterns and adapt effectively.

However, there are times you want to assume control of whether a branch is predicted taken or not by default: when you know the code will be called "cold" with respect to branching statistics.

One concrete example: exception management code. By definition the management code will run only exceptionally, but perhaps when it does, maximum performance is desired (there may be a critical error to take care of as soon as possible), hence you may want to control the default prediction.

Another example: you may classify your input and jump into the code that handles the result of your classification. If there are many classifications, the processor may collect statistics but lose them because the same classification does not happen soon enough and the prediction resources are devoted to recently called code. I wish there were a primitive to tell the processor "please do not devote prediction resources to this code" the way you sometimes can say "do not cache this".