Has anyone here benchmarked the Intel C++ compiler against GCC?

Question

Asked by James Bond

I am not sure whether I should post this question here, because this seems to be a programming-oriented website.

Anyway, I think there must be some gurus here who know this.

I have an AMD Opteron server running CentOS 5, and I need a compiler for a fairly large Boost-based C++ program. Which compiler should I choose?

Answer 1

Accepted answer by justin

I hope this helps more than it hurts :)

I did a little compiler shootout a bit over a year ago, and I am going from memory.

GCC 4.2 (Apple)
Intel 10
GCC 4.2 (Apple) + LLVM

I tested multiple template-heavy audio signal processing programs that I'd written.

Compilation times: The Intel compiler was by far the slowest, more than the "2x slower" that another poster cited.

GCC handled deep templates very well in comparison to Intel.

The Intel compiler generated huge object files.

GCC+LLVM yielded the smallest binary.

The quality of the generated code can vary significantly depending on how the program is constructed and where SIMD can be used.

For the way I write, I found that GCC+LLVM generated the best code. For programs I'd written before I took optimization seriously (as I wrote them), Intel was generally better.

Intel's results varied; it handled some programs far better and some far worse. It handled raw processing very well, but I give GCC+LLVM the cake because, when put into the context of a larger (normal) program, it did better.

Intel won for out-of-the-box number crunching on huge data sets.

GCC alone generated the slowest code, though it can be as fast given measurement and nano-optimizations. I prefer to avoid those because the wind may change direction with the next compiler release, so to speak.

I never measured poorly written programs in this test (i.e. the results outperformed distributions of popular performance libraries).

Finally, the programs were written over several years, using GCC as the primary compiler during that time.

Update: I was also enabling optimizations/extensions for Core 2 Duo. The programs were clean enough to enable strict aliasing.
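For readers unsure what "clean enough to enable strict aliasing" means in practice, here is a minimal sketch. It is not taken from the answerer's programs; `float_bits` is a hypothetical helper used only for illustration. The idea is that code must not read an object through a pointer to an unrelated type, and copying the bytes with `std::memcpy` is the portable way to inspect them instead.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the bit pattern of a float.
// Writing *reinterpret_cast<std::uint32_t*>(&f) instead would break the
// aliasing rules that -fstrict-aliasing allows the optimizer to rely on.
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bytes, no pointer cast
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, `float_bits(1.0f)` yields `0x3F800000`. Optimizing compilers typically reduce the `memcpy` to a plain register move, so the clean version should not cost anything at run time.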

Answer 2

Answered by Goz

There is an interesting PDF here which compares a number of compilers.

Answer 3

Answered by Glen

The MySQL team once posted that icc gave them about a 10% performance boost over gcc. I'll try to find the link.

In general I've found that the 'native' compilers perform better than gcc on their respective platforms.

Edit: I was a little off. Typical gains were 20-30%, not 10%. Some narrow edge cases got a doubling of performance. http://www.mysqlperformanceblog.com/files/presentations/LinuxWorld2004-Intel.pdf

Answer 4

Answered by Gautham Ganapathy

I suppose it varies depending on the code, but with the codebase I am working on now, ICC 11.035 gives an almost 2x improvement over gcc 4.4.0 on a Xeon 5504.

icc options: -O2 -fno-alias
gcc options: -O3 -msse3 -mfpmath=sse -fargument-noalias-global

These options apply only to the file containing the compute-intensive code, where I know there is no aliasing. It is single-threaded code with a 5-level nested loop.
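As a rough illustration of what a no-aliasing promise buys, the sketch below makes the same kind of promise for a single function rather than a whole file. It is not the code from this answer: `scaled_add` is a hypothetical kernel, and `__restrict` is a compiler extension (accepted by GCC, Clang, ICC and MSVC) rather than standard C++.

```cpp
#include <cstddef>

// Hypothetical kernel: y[i] += a * x[i].
// __restrict tells the compiler that x and y never overlap, so it may keep
// values in registers and reorder or vectorize loads and stores freely.
// Calling this with overlapping arrays is undefined behaviour.
inline void scaled_add(float* __restrict y, const float* __restrict x,
                       float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Without such a promise the compiler has to assume that a store to `y[i]` might change some `x[j]`, which can block vectorization or force run-time overlap checks. Whether this helps in a given loop nest is something to measure rather than assume.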

Although autovectorization is enabled, neither compiler generates vectorized code (not a fault of the compilers).

Update (2015/02/27): While optimizing some geophysics code (Q2 2013) to run on Sandy Bridge-E Xeons, I had an opportunity to compare the performance of ICC 11.1 against GCC 4.8.0, and GCC was now generating faster code than ICC. The code made use of AVX intrinsics and did use 8-way vectorized instructions (neither compiler autovectorized the code properly due to certain data layout requirements). In addition, GCC's LTO implementation (with the IR code embedded in the .o files) was much easier to manage than ICC's. GCC with LTO was running roughly 3 times faster than ICC without LTO. I'm not able to find the numbers right now for GCC without LTO, but I recall it was still faster than ICC. This is by no means a general statement on ICC's performance, but the results were sufficient for us to go ahead with GCC 4.8.*.

Looking forward to GCC 5.0 (http://www.phoronix.com/scan.php?page=article&item=gcc-50-broadwell)!

Answer 5

Answered by Peeter Joot

We use the Intel compiler on our product (DB2) on Linux and Windows IA32/AMD64, and on OS X (i.e. all our Intel platform ports except SunAMD).

I don't know the numbers, but the performance is good enough that we:

pay for the compiler, which I'm told is very expensive.
live with the 2x slower build times (primarily due to the time it spends acquiring licenses before it allows itself to run).

Answer 6

Answered by Denis TRUFFAUT

PHP: Compilation from source with ICC rather than GCC should result in a 10% to 20% speed improvement - http://www.papelipe.no/tags/ez_publish/benchmark_of_intel_compiled_icc_apache_php_and_apc

MySQL: Compilation from source with ICC rather than GCC should result in a 25% to 50% speed improvement - http://www.mysqlperformanceblog.com/files/presentations/LinuxWorld2005-Intel.pdf

Answer 7

Answered by Calimo

I used UnixBench (v. 5.1.3) on openSUSE 12.2 (kernel 3.4.33-2.24-default x86_64), and compiled it first with GCC and then with Intel's compiler.

With 1 parallel copy, UnixBench compiled with Intel's compiler is about 20% faster than the version compiled with GCC. However, this hides huge differences: Dhrystone is about 25% slower with the Intel compiler, while Whetstone runs 2x faster.

With 4 copies of UnixBench running in parallel, the improvement of the Intel compiler over GCC is only 7%. Again, Intel is much better at Whetstone (> 200%) and slower at Dhrystone (about 20%).

Answer 8

Answered by Benj

I used to work on a fairly large signal processing system which ran on a large cluster. We used to reckon that, for heavy maths crunching, the Intel compiler gave us about 10% less CPU load than GCC. That's very unscientific, but it was our experience (that was about 18 months ago).

What would have been interesting is if we'd been able to use Intel's math libraries as well, which use their chipset more efficiently.

Answer 9

Answered by tim18

Many optimizations which the Intel compiler performs routinely require specific source syntax and the use of -O3 -ffast-math with gcc. Unfortunately, the -funsafe-math-optimizations component of -ffast-math -O3 -march=native has turned out to be incompatible with -fopenmp, so I must split my source files into groups, each built with different options in the Makefile. Today I ran into a failure where a g++ build using -O3 -ffast-math -fopenmp -march=native was able to write to the screen but not redirect to a file. One of the more egregious differences, in my opinion, is that icpc alone optimizes std::max and std::min, whereas gcc/g++ want fmax|min[f] with -ffast-math, which changes their meaning away from the standard.
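The remark about `std::max` versus `fmax` is easier to follow with the NaN behaviour spelled out. The sketch below only illustrates the standard semantics; it makes no claim about what either compiler emits. The wrapper names `max_via_std` and `max_via_fmax` are hypothetical, and the behaviour described assumes a build without -ffast-math.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// std::max(a, b) is specified as (a < b) ? b : a. Every comparison with NaN
// is false, so a NaN passed as the first argument is returned unchanged.
inline double max_via_std(double a, double b) { return std::max(a, b); }

// std::fmax treats a NaN argument as missing data and returns the other one.
inline double max_via_fmax(double a, double b) { return std::fmax(a, b); }
```

With `nan = std::numeric_limits<double>::quiet_NaN()`, `max_via_std(nan, 1.0)` is NaN while `max_via_fmax(nan, 1.0)` is `1.0`. The extra NaN handling in `fmax` is presumably why it only reduces to a bare hardware max under -ffast-math, which permits the compiler to assume NaNs never occur and so departs from the standard meaning.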
