Why does C++ compilation take so long?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/318398/

Posted: 2020-08-27 14:36:29  Source: igfitidea

Tags: c++, performance, compilation

Asked by Dan Goldstein

Compiling a C++ file takes a very long time when compared to C# and Java. It takes significantly longer to compile a C++ file than it would to run a normal size Python script. I'm currently using VC++ but it's the same with any compiler. Why is this?

The two reasons I could think of were loading header files and running the preprocessor, but that doesn't seem like it should explain why it takes so long.

Answered by jalf

Several reasons

Header files

Every single compilation unit requires hundreds or even thousands of headers to be (1) loaded and (2) compiled. Every one of them typically has to be recompiled for every compilation unit, because the preprocessor ensures that the result of compiling a header might vary between compilation units. (A macro may be defined in one compilation unit which changes the content of the header.)

This is probably the main reason, as it requires huge amounts of code to be compiled for every compilation unit, and additionally, every header has to be compiled multiple times (once for every compilation unit that includes it).
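
To make the macro point concrete, here is a minimal sketch of one piece of header text producing two different declarations. Everything in it is hypothetical: `WIDGET_HEADER_BODY` stands in for the contents of a header, `ID_TYPE` is the macro that alters it, and the `tu_a` and `tu_b` namespaces play the role of two compilation units. The header body is written as a macro only so that the example fits in a single file.

```cpp
#include <type_traits>

// Hypothetical stand-in for the text of a header such as "widget.h".
// ID_TYPE is expanded where the body is used, just as it would be inside a real header.
#define WIDGET_HEADER_BODY \
    struct Widget { ID_TYPE id; };

// "Compilation unit" A defines ID_TYPE as int before "including" the header.
namespace tu_a {
#define ID_TYPE int
WIDGET_HEADER_BODY
#undef ID_TYPE
}

// "Compilation unit" B defines it differently, so the same text yields a different type.
namespace tu_b {
#define ID_TYPE long long
WIDGET_HEADER_BODY
#undef ID_TYPE
}

static_assert(std::is_same<decltype(tu_a::Widget::id), int>::value, "A sees an int id");
static_assert(std::is_same<decltype(tu_b::Widget::id), long long>::value, "B sees a long long id");
```

Because the compiler cannot know in advance which macros will be in effect, it cannot in general reuse the result of compiling a header from one compilation unit in another.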

Linking

Once compiled, all the object files have to be linked together. This is basically a monolithic process that can't very well be parallelized, and has to process your entire project.

Parsing

C++ syntax is extremely complicated to parse, depends heavily on context, and is very hard to disambiguate. This takes a lot of time.
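
A small illustration of that context dependence, as a sketch with hypothetical names (`a_is_a_type`, `a_is_a_value`, `demo`): the tokens `a * b` are a declaration in one scope and a multiplication in the other, so the parser has to know what `a` names before it can decide how to read the statement. The functions are `constexpr`, which assumes C++14 or later.

```cpp
namespace a_is_a_type {
struct a {};

constexpr bool demo() {
    a * b = nullptr;      // declaration: b is a pointer to a
    return b == nullptr;
}
}  // namespace a_is_a_type

namespace a_is_a_value {
constexpr int demo() {
    int a = 6, b = 7;
    return a * b;         // expression: a multiplied by b
}
}  // namespace a_is_a_value
```

With templates and dependent names the same ambiguity can only be resolved once the surrounding declarations are known, which is part of why a C++ parser cannot work from the token stream alone.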

Templates

In C#, List<T> is the only type that is compiled, no matter how many instantiations of List you have in your program. In C++, vector<int> is a completely separate type from vector<float>, and each one will have to be compiled separately.
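
The following sketch shows what "compiled separately" means in practice. The `sum` function template is a hypothetical example, not something from the answer. The two explicit instantiations at the end ask the compiler to generate code for each element type, and each one is a distinct function built from the same source text.

```cpp
#include <type_traits>
#include <vector>

// Two instantiations of the same class template are unrelated types.
static_assert(!std::is_same<std::vector<int>, std::vector<float>>::value,
              "vector<int> and vector<float> are distinct types");

// Hypothetical function template: every T it is used with gets its own compiled copy.
template <typename T>
T sum(const std::vector<T>& values) {
    T total{};
    for (const T& v : values) total += v;
    return total;
}

// Explicit instantiations: code is generated once for int and once more for float.
template int sum<int>(const std::vector<int>&);
template float sum<float>(const std::vector<float>&);
```

In a real program these instantiations usually happen implicitly, and in every compilation unit that uses them, which multiplies the work further.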

Add to this that templates make up a full Turing-complete "sub-language" that the compiler has to interpret, and this can become ridiculously complicated. Even relatively simple template metaprogramming code can define recursive templates that create dozens and dozens of template instantiations. Templates may also result in extremely complex types, with ridiculously long names, adding a lot of extra work to the linker. (It has to compare a lot of symbol names, and if these names can grow to many thousands of characters, that can become fairly expensive.)
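
As a minimal sketch of such a recursive template, here is the classic compile-time factorial (the `Factorial` name is only an illustration). Asking for `Factorial<10>::value` makes the compiler instantiate the primary template ten times, for N = 10 down to 1, before it reaches the specialization that ends the recursion.

```cpp
// Primary template: instantiating Factorial<N> forces Factorial<N - 1>, and so on.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Full specialization that ends the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

static_assert(Factorial<10>::value == 3628800ULL, "10! computed at compile time");
```

None of these classes needs to exist in the final program, yet each one has to be created, checked, and evaluated by the compiler.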

And of course, they exacerbate the problems with header files, because templates generally have to be defined in headers, which means far more code has to be parsed and compiled for every compilation unit. In plain C code, a header typically only contains forward declarations, but very little actual code. In C++, it is not uncommon for almost all the code to reside in header files.

Optimization

C++ allows for some very dramatic optimizations. C# or Java don't allow classes to be completely eliminated (they have to be there for reflection purposes), but even a simple C++ template metaprogram can easily generate dozens or hundreds of classes, all of which are inlined and eliminated again in the optimization phase.

Moreover, a C++ program must be fully optimized by the compiler. A C# program can rely on the JIT compiler to perform additional optimizations at load time; C++ doesn't get any such "second chances". What the compiler generates is as optimized as it's going to get.

Machine

C++ is compiled to machine code, which may be somewhat more complicated than the bytecode Java or .NET use (especially in the case of x86). (This is mentioned for completeness only, because it came up in comments and such. In practice, this step is unlikely to take more than a tiny fraction of the total compilation time.)

Conclusion

Most of these factors are shared by C code, which actually compiles fairly efficiently. The parsing step is a lot more complicated in C++, and can take up significantly more time, but the main offender is probably templates. They're useful, and make C++ a far more powerful language, but they also take their toll in terms of compilation speed.

Answered by tangentstorm

The slowdown is not necessarily the same with any compiler.

I haven't used Delphi or Kylix but back in the MS-DOS days, a Turbo Pascal program would compile almost instantaneously, while the equivalent Turbo C++ program would just crawl.

The two main differences were a very strong module system and a syntax that allowed single-pass compilation.

It's certainly possible that compilation speed just hasn't been a priority for C++ compiler developers, but there are also some inherent complications in the C/C++ syntax that make it more difficult to process. (I'm not an expert on C, but Walter Bright is, and after building various commercial C/C++ compilers, he created the D language. One of his changes was to enforce a context-free grammar to make the language easier to parse.)

Also, you'll notice that generally Makefiles are set up so that every file is compiled separately in C, so if 10 source files all use the same include file, that include file is processed 10 times.

Answered by James Curran

Parsing and code generation are actually rather fast. The real problem is opening and closing files. Remember, even with include guards, the compiler still has to open the .H file and read each line (and then ignore it).
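
For reference, this is roughly what an include guard looks like and why it does not make repeated inclusion free. The sketch pastes the text of a hypothetical `point.h` twice into one file to imitate two `#include` lines; `Point`, `POINT_H`, and `manhattan` are illustrative names. Note that some preprocessors, GCC's among them, recognize this guard pattern and skip reopening the file on later inclusions within the same compilation unit, so the cost varies by compiler.

```cpp
// First "inclusion" of the hypothetical point.h: POINT_H is not yet defined,
// so the body is compiled and the guard macro is set.
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
#endif

// Second "inclusion": the text still has to be scanned up to the matching #endif,
// but the guard makes the preprocessor discard it, so Point is not redefined.
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
#endif

// Uses the single definition of Point that survived.
constexpr int manhattan(Point p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```

The guard only helps within one compilation unit. A second source file that includes the same header starts over with `POINT_H` undefined.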

A friend once (while bored at work), took his company's application and put everything -- all source and header files-- into one big file. Compile time dropped from 3 hours to 7 minutes.

Answered by Alan

C++ is compiled into machine code. So you have the pre-processor, the compiler, the optimizer, and finally the assembler, all of which have to run.

Java and C# are compiled into bytecode/IL, which the Java virtual machine or the .NET Framework then interprets or JIT-compiles into machine code at run time.

Python is an interpreted language that is also compiled into byte-code.

I'm sure there are other reasons for this as well, but in general, not having to compile to native machine language saves time.

Answered by Dave Ray

Another reason is the use of the C pre-processor for locating declarations. Even with header guards, .h files still have to be parsed over and over, every time they're included. Some compilers support pre-compiled headers that can help with this, but they are not always used.

See also: C++ Frequently Questioned Answers

Answered by Marco van de Voort

The biggest issues are:

1) The infinite header reparsing. Already mentioned. Mitigations (like #pragma once) usually only work per compilation unit, not per build.

2) The fact that the toolchain is often separated into multiple binaries (make, preprocessor, compiler, assembler, archiver, impdef, linker, and dlltool in extreme cases) that all have to reinitialize and reload all state all the time for each invocation (compiler, assembler) or every couple of files (archiver, linker, and dlltool).

See also this discussion on comp.compilers: http://compilers.iecc.com/comparch/article/03-11-078 and especially this one:

http://compilers.iecc.com/comparch/article/02-07-128

Note that John, the moderator of comp.compilers seems to agree, and that this means it should be possible to achieve similar speeds for C too, if one integrates the toolchain fully and implements precompiled headers. Many commercial C compilers do this to some degree.

Note that the Unix model of factoring everything out into separate binaries is something of a worst-case model for Windows (with its slow process creation). This is very noticeable when comparing GCC build times between Windows and *nix, especially if the make/configure system also calls some programs just to obtain information.

Answered by Ravindra Acharya

Building C/C++: what really happens and why does it take so long

A relatively large portion of software development time is not spent on writing, running, debugging or even designing code, but waiting for it to finish compiling. In order to make things fast, we first have to understand what is happening when C/C++ software is compiled. The steps are roughly as follows:

  • Configuration
  • Build tool startup
  • Dependency checking
  • Compilation
  • Linking

We will now look at each step in more detail focusing on how they can be made faster.

Configuration

This is the first step when starting to build. It usually means running a configure script, CMake, Gyp, SCons, or some other tool. This can take anywhere from one second to several minutes for very large Autotools-based configure scripts.

This step happens relatively rarely. It only needs to be run when the build configuration changes. Short of changing build systems, there is not much to be done to make this step faster.

Build tool startup

This is what happens when you run make or click on the build icon on an IDE (which is usually an alias for make). The build tool binary starts and reads its configuration files as well as the build configuration, which are usually the same thing.

Depending on build complexity and size, this can take anywhere from a fraction of a second to several seconds. By itself this would not be so bad. Unfortunately, most make-based build systems cause make to be invoked tens to hundreds of times for every single build. Usually this is caused by recursive use of make (which is bad).

It should be noted that the reason Make is so slow is not an implementation bug. The syntax of Makefiles has some quirks that make a really fast implementation all but impossible. This problem is even more noticeable when combined with the next step.

Dependency checking

Once the build tool has read its configuration, it has to determine which files have changed and which ones need to be recompiled. The configuration files contain a directed acyclic graph describing the build dependencies. This graph is usually built during the configure step. Build tool startup and dependency scanning run on every single build, so their combined runtime sets the lower bound on the edit-compile-debug cycle. For small projects this time is usually a few seconds or so, which is tolerable.

There are alternatives to Make. The fastest of them is Ninja, which was built by Google engineers for Chromium. If you are using CMake or Gyp to build, just switch to their Ninja backends. You don't have to change anything in the build files themselves; just enjoy the speed boost. Ninja is not packaged on most distributions, though, so you might have to install it yourself.
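
The out-of-date check at the heart of this step can be sketched in a few lines. This is only an illustration under simplifying assumptions: `needs_rebuild` is a hypothetical helper, timestamps are plain integers rather than real file times, and the target is assumed to exist already (a missing target would always need building). It applies the familiar Make-style rule that a target is stale when any of its dependencies is newer than it. The `constexpr` loop assumes C++14 or later.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: returns true when any dependency was modified after the target.
template <std::size_t N>
constexpr bool needs_rebuild(long target_mtime,
                             const std::array<long, N>& dependency_mtimes) {
    for (std::size_t i = 0; i < N; ++i) {
        if (dependency_mtimes[i] > target_mtime) {
            return true;   // this dependency changed after the target was built
        }
    }
    return false;          // the target is at least as new as everything it depends on
}
```

A build tool has to run a check like this for every node in the dependency graph, which means at least one file status lookup per source and header file on every build, even when nothing has changed.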

Compilation

At this point we finally invoke the compiler. Cutting some corners, here are the approximate steps taken.

  • Merging includes
  • Parsing the code
  • Code generation/optimization

Contrary to popular belief, compiling C++ is not actually all that slow. The STL is slow and most build tools used to compile C++ are slow. However there are faster tools and ways to mitigate the slow parts of the language.

Using them takes a bit of elbow grease, but the benefits are undeniable. Faster build times lead to happier developers, more agility and, eventually, better code.

Answered by Andy Brice

A compiled language is always going to require a bigger initial overhead than an interpreted language. In addition, perhaps you didn't structure your C++ code very well. For example:

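#include "BigClass.h"    // pulls in the full definition of BigClass and everything it includes

class SmallClass
{
   BigClass m_bigClass;  // a member held by value needs the complete type
};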

Compiles a lot slower than:

class BigClass;          // forward declaration; BigClass.h is not needed here

class SmallClass
{
   BigClass* m_bigClass; // a pointer only needs the declaration
};

Answered by rileyberton

An easy way to reduce compilation time in larger C++ projects is to create a single *.cpp file that includes all the other cpp files in your project, and compile only that. Each header is then processed just once instead of once per source file. An advantage of this approach is that compilation errors will still reference the correct file.

For example, assume you have a.cpp, b.cpp and c.cpp. Create a file named everything.cpp:

#include "a.cpp"
#include "b.cpp"
#include "c.cpp"

Then compile the project by building just everything.cpp.

Answered by Nemanja Trifunovic

Some reasons are:

1) C++ grammar is more complex than C# or Java and takes more time to parse.

2) (More important) The C++ compiler produces machine code and does all optimizations during compilation. C# and Java go just halfway and leave these steps to the JIT.