GCC recommendations and options for fastest code

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/3005564/

Date: 2020-08-28 11:45:34  Source: igfitidea

c++ unix gcc

Asked by rwallace

I'm distributing a C++ program with a makefile for the Unix version, and I'm wondering what compiler options I should use to get the fastest possible code (it falls into the category of programs that can use all the computing power they can get and still come back for more), given that I don't know in advance what hardware, operating system or gcc version the user will have, and I want above all else to make sure it at least works correctly on every major Unix-like operating system.

So far I have g++ -O3 -Wno-write-strings. Are there any other options I should add? On Windows, the Microsoft compiler has options worth using, such as a fast calling convention and link-time code generation. Are there any equivalents in gcc?

(I'm assuming it will default to 64-bit on a 64-bit platform, please correct me if that's not the case.)

Accepted answer by Il-Bhima

Without knowing any specifics about your program it's hard to say. -O3 covers most of the optimisations. The remaining options come "at a cost". If you can tolerate some rounding differences and your code doesn't depend on strict IEEE floating-point semantics, then you can try -Ofast (available since GCC 4.6). It disregards strict standards compliance (among other things it turns on -ffast-math) and can give you faster code.

The remaining optimisation flags improve the performance of some programs only, and can even be detrimental to others. Look at the available flags in the gcc documentation on optimisation flags and benchmark them.

Another option is to enable C99 (-std=c99) and inline appropriate functions. This is a bit of an art, you shouldn't inline everything, but with a little work you can get your code to be faster (albeit at the cost of having a larger executable).

If speed is really an issue I would suggest either going back to Microsoft's compiler, or to try Intel's. I've come to appreciate how slow some gcc compiled code can be, especially when it involves math.h.

EDIT: Oh wait, you said C++? Then disregard my C99 paragraph, you can inline already :)

Answer by Bastien Léonard

I would try profile-guided optimization:

-fprofile-generate: Enable options usually used for instrumenting application to produce profile useful for later recompilation with profile feedback based optimization. You must use -fprofile-generate both when compiling and when linking your program. The following options are enabled: -fprofile-arcs, -fprofile-values, -fvpt.

You should also give the compiler hints about the architecture on which the program will run. For example if it will only run on a server and you can compile it on the same machine as the server, you can just use -march=native. Otherwise you need to determine which features your users will all have and pass the corresponding parameter to GCC.

(Apparently you're targeting 64-bit, so GCC can already assume more instruction-set features, such as SSE2, than it can for generic 32-bit x86.)

Answer by TheCodeArtist

-Ofast

Please try -Ofast instead of -O3.

Here is also a list of flags you might want to enable selectively.

-ffloat-store

-fexcess-precision=style

-ffast-math

-fno-rounding-math

-fno-signaling-nans

-fcx-limited-range

-fno-math-errno

-funsafe-math-optimizations

-fassociative-math

-freciprocal-math

-ffinite-math-only

-fno-signed-zeros

-fno-trapping-math

-frounding-math

-fsingle-precision-constant

-fcx-fortran-rules

A complete list of the flags and their detailed descriptions is available in the GCC documentation: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Answer by ohcul

Consider using -fomit-frame-pointer unless you need to debug with gdb (yuck). That gives the compiler one more register to use for variables (otherwise this register is spent on frame pointers). Note that current GCC versions already enable -fomit-frame-pointer by default at -O1 and higher.

Also you may use something like -march=core2, or more generally -march=native, to enable the compiler to use newer instructions and further tune the code for the specified architecture, but for this you must be sure your code will not be expected to run on older processors.

Answer by Jendas

You should certainly, apart from what others have already suggested, try -flto. It enables link-time optimization, which in some cases can really do magic. Pass it both when compiling and when linking.

For further information see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Answer by rubenvb

gcc -O3 is not guaranteed to be the fastest; -O2 is often a better starting point. After that, try profile-guided optimization and specific options: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

It's a long read, but probably worth it.

Note that the equivalent of MSVC's "Link Time Code Generation", which gcc calls "Link Time Optimization" (-flto), is available in gcc 4.5+.

By the way, there is no specific "fastcall" calling convention for Win64. There is only "the" calling convention: http://msdn.microsoft.com/en-us/magazine/cc300794.aspx

Answer by Flo

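There is no 'fastcall' on x86-64: both the Win64 and Linux ABIs define register-based calling ("fastcall") as the only calling convention (though Linux uses more registers: the System V ABI used on Linux passes up to six integer arguments in registers, Win64 only four).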
