How does the C++ compilation/linking process work?

Notice: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/6264249/

Date: 2020-08-28 19:48:39  Source: igfitidea

c++ compiler-construction linker c++-faq

Asked by Tony The Lion

How does the compilation and linking process work?

(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)

Answered by R. Martinho Fernandes

The compilation of a C++ program involves three steps:

  1. Preprocessing: the preprocessor takes a C++ source code file and deals with the #includes, #defines and other preprocessor directives. The output of this step is a "pure" C++ file without pre-processor directives.

  2. Compilation: the compiler takes the pre-processor's output and produces an object file from it.

  3. Linking: the linker takes the object files produced by the compiler and produces either a library or an executable file.

Preprocessing

The preprocessor handles the preprocessor directives, like #include and #define. It is agnostic of the syntax of C++, which is why it must be used with care.

It works on one C++ source file at a time by replacing #include directives with the content of the respective files (which is usually just declarations), doing replacement of macros (#define), and selecting different portions of text depending on #if, #ifdef and #ifndef directives.

The preprocessor works on a stream of preprocessing tokens. Macro substitution is defined as replacing tokens with other tokens (the operator ## enables merging two tokens when it makes sense).

After all this, the preprocessor produces a single output that is a stream of tokens resulting from the transformations described above. It also adds some special markers that tell the compiler where each line came from so that it can use those to produce sensible error messages.

Some errors can be produced at this stage with clever use of the #if and #error directives.
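
As a rough illustration of the directives discussed above, here is a minimal sketch. The macro and function names (SQUARE, MAKE_GETTER, get_answer, square_of_sum) are made up for this example and do not come from the original answer.

```cpp
// Hypothetical macros, chosen only for illustration.
#define SQUARE(x) ((x) * (x))         // plain token substitution
#define MAKE_GETTER(name) get_##name  // ## pastes two tokens into one

// #if and #error can reject a build during preprocessing.
#if !defined(__cplusplus)
#error "This example must be compiled as C++"
#endif

// After preprocessing, this line declares a function named get_answer.
inline int MAKE_GETTER(answer)() { return SQUARE(6) + 6; }

// The preprocessor knows nothing about C++ syntax: without the inner
// parentheses in SQUARE, SQUARE(1 + 2) would expand to (1 + 2 * 1 + 2).
inline int square_of_sum() { return SQUARE(1 + 2); }
```

With GCC or Clang, running the compiler with -E on such a file should show the expanded text that the compilation step actually receives.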

Compilation

The compilation step is performed on each output of the preprocessor. The compiler parses the pure C++ source code (now without any preprocessor directives) and converts it into assembly code. It then invokes the underlying back end (the assembler in the toolchain), which assembles that code into machine code and produces an actual binary file in some format (ELF, COFF, a.out, ...). This object file contains the compiled code (in binary form) of the symbols defined in the input. Symbols in object files are referred to by name.

Object files can refer to symbols that are not defined. This is the case when you use a declaration, and don't provide a definition for it. The compiler doesn't mind this, and will happily produce the object file as long as the source code is well-formed.
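
A minimal sketch of that situation, using hypothetical functions add and add_three. Everything is kept in one file here so the snippet builds on its own; in a real project the definition of add would usually sit in a different source file.

```cpp
// A declaration only: this is what a header pulled in by #include
// usually provides. The compiler accepts calls to add() without
// having seen its body.
int add(int a, int b);

// This function compiles using just the declaration above. If the
// definition lived in another source file, the object file for this
// one would carry an undefined reference to add().
int add_three(int a, int b, int c) { return add(add(a, b), c); }

// The definition. It is placed here only to keep the example
// self-contained; the linker would normally match it up by name.
int add(int a, int b) { return a + b; }
```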

Compilers usually let you stop compilation at this point. This is very useful because with it you can compile each source code file separately. The advantage this provides is that you don't need to recompile everything if you only change a single file.

The produced object files can be put in special archives called static libraries, for easier reusing later on.

It's at this stage that "regular" compiler errors, like syntax errors or failed overload resolution errors, are reported.

Linking

The linker is what produces the final compilation output from the object files the compiler produced. This output can be either a shared (or dynamic) library (and while the name is similar, they haven't got much in common with static libraries mentioned earlier) or an executable.

It links all the object files by replacing the references to undefined symbols with the correct addresses. Each of these symbols can be defined in other object files or in libraries. If they are defined in libraries other than the standard library, you need to tell the linker about them.

At this stage the most common errors are missing definitions or duplicate definitions. The former means that either the definitions don't exist (i.e. they are not written), or that the object files or libraries where they reside were not given to the linker. The latter is obvious: the same symbol was defined in two different object files or libraries.
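
The two error kinds can be sketched with a toy model. This is not how a real linker is implemented; ObjectFile, LinkReport and check_symbols are hypothetical names, and an "object file" is reduced to the symbol names it defines and references.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified object file: only symbol names.
struct ObjectFile {
    std::string name;
    std::vector<std::string> defined;     // symbols this file defines
    std::vector<std::string> referenced;  // symbols this file uses
};

struct LinkReport {
    std::set<std::string> missing;    // referenced but defined nowhere
    std::set<std::string> duplicate;  // defined in more than one file
};

LinkReport check_symbols(const std::vector<ObjectFile>& objects) {
    // Count how many object files define each symbol.
    std::map<std::string, int> definition_count;
    for (const ObjectFile& obj : objects)
        for (const std::string& sym : obj.defined)
            ++definition_count[sym];

    LinkReport report;
    for (const auto& entry : definition_count)
        if (entry.second > 1) report.duplicate.insert(entry.first);

    // Any reference without a definition is a missing definition.
    for (const ObjectFile& obj : objects)
        for (const std::string& sym : obj.referenced)
            if (definition_count.find(sym) == definition_count.end())
                report.missing.insert(sym);
    return report;
}
```

Feeding it a main.o that references helper and log, plus two files that both define helper, would report log as missing and helper as duplicated.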

Answered by user2003323

This topic is discussed at CProgramming.com:
https://www.cprogramming.com/compilingandlinking.html

Here is what the author there wrote:

Compiling isn't quite the same as creating an executable file! Instead, creating an executable is a multistage process divided into two components: compilation and linking. In reality, even if a program "compiles fine" it might not actually work because of errors during the linking phase. The total process of going from source code files to an executable might better be referred to as a build.

Compilation

Compilation refers to the processing of source code files (.c, .cc, or .cpp) and the creation of an 'object' file. This step doesn't create anything the user can actually run. Instead, the compiler merely produces the machine language instructions that correspond to the source code file that was compiled. For instance, if you compile (but don't link) three separate files, you will have three object files created as output, each with a name ending in .o or .obj (the extension will depend on your compiler). Each of these files contains a translation of your source code file into a machine language file -- but you can't run them yet! You need to turn them into executables your operating system can use. That's where the linker comes in.

Linking

Linking refers to the creation of a single executable file from multiple object files. In this step, it is common that the linker will complain about undefined functions (commonly, main itself). During compilation, if the compiler could not find the definition for a particular function, it would just assume that the function was defined in another file. If this isn't the case, there's no way the compiler would know -- it doesn't look at the contents of more than one file at a time. The linker, on the other hand, may look at multiple files and try to find references for the functions that weren't mentioned.

You might ask why there are separate compilation and linking steps. First, it's probably easier to implement things that way. The compiler does its thing, and the linker does its thing -- by keeping the functions separate, the complexity of the program is reduced. Another (more obvious) advantage is that this allows the creation of large programs without having to redo the compilation step every time a file is changed. Instead, using so called "conditional compilation", it is necessary to compile only those source files that have changed; for the rest, the object files are sufficient input for the linker. Finally, this makes it simple to implement libraries of pre-compiled code: just create object files and link them just like any other object file. (The fact that each file is compiled separately from information contained in other files, incidentally, is called the "separate compilation model".)

To get the full benefits of conditional compilation, it's probably easier to get a program to help you than to try and remember which files you've changed since you last compiled. (You could, of course, just recompile every file that has a timestamp greater than the timestamp of the corresponding object file.) If you're working with an integrated development environment (IDE) it may already take care of this for you. If you're using command line tools, there's a nifty utility called make that comes with most *nix distributions. Along with conditional compilation, it has several other nice features for programming, such as allowing different compilations of your program -- for instance, if you have a version producing verbose output for debugging.

Knowing the difference between the compilation phase and the link phase can make it easier to hunt for bugs. Compiler errors are usually syntactic in nature -- a missing semicolon, an extra parenthesis. Linking errors usually have to do with missing or multiple definitions. If you get an error that a function or variable is defined multiple times from the linker, that's a good indication that the error is that two of your source code files have the same function or variable.
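
The timestamp rule mentioned in the quoted text (recompile a source file when it is newer than its object file) can be sketched as follows. BuildItem, needs_recompile and stale_sources are hypothetical names, and timestamps are plain integers rather than real file times; tools such as make apply a similar comparison to actual files.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one source file and its object file.
// Timestamps are plain integers (for example, seconds).
struct BuildItem {
    std::string source;
    long source_mtime;
    bool object_exists;
    long object_mtime;
};

// A source needs recompiling if it has no object file yet, or if it
// was modified after the object file was produced.
bool needs_recompile(const BuildItem& item) {
    return !item.object_exists || item.source_mtime > item.object_mtime;
}

// Collect the sources that must be recompiled; every other object
// file can be handed to the linker unchanged.
std::vector<std::string> stale_sources(const std::vector<BuildItem>& items) {
    std::vector<std::string> out;
    for (const BuildItem& item : items)
        if (needs_recompile(item)) out.push_back(item.source);
    return out;
}
```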

Answered by AProgrammer

On the standard front:

  • a translation unit is the combination of a source file and its included headers and source files, less any source lines skipped by conditional inclusion preprocessor directives.

  • the standard defines 9 phases in the translation. The first four correspond to preprocessing, the next three are the compilation, the next one is the instantiation of templates (producing instantiation units) and the last one is the linking.

In practice the eighth phase (the instantiation of templates) is often done during the compilation process but some compilers delay it to the linking phase and some spread it in the two.

Answered by Elliptical view

The skinny is that a CPU loads data from memory addresses, stores data to memory addresses, and executes instructions sequentially out of memory addresses, with some conditional jumps in the sequence of instructions processed. Each of these three categories of instructions involves computing an address to a memory cell to be used in the machine instruction. Because machine instructions are of a variable length depending on the particular instruction involved, and because we string a variable length of them together as we build our machine code, there is a two step process involved in calculating and building any addresses.

First we lay out the allocation of memory as best we can before we can know what exactly goes in each cell. We figure out the bytes, or words, or whatever that form the instructions and literals and any data. We just start allocating memory and building the values that will create the program as we go, and note down any place we need to go back and fix an address. In that place we put a dummy to just pad the location so we can continue to calculate memory size. For example our first machine code might take one cell. The next machine code might take 3 cells, involving one machine code cell and two address cells. Now our address pointer is 4. We know what goes in the machine cell, which is the op code, but we have to wait to calculate what goes in the address cells till we know where that data will be located, i.e. what will be the machine address of that data.

If there were just one source file a compiler could theoretically produce fully executable machine code without a linker. In a two pass process it could calculate all of the actual addresses to all of the data cells referenced by any machine load or store instructions. And it could calculate all of the absolute addresses referenced by any absolute jump instructions. This is how simpler compilers, like the one in Forth, work, with no linker.
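
The two pass idea can be sketched for an imaginary machine. Everything here is an assumption made for illustration: Instr and assemble are hypothetical names, an opcode takes one cell, and an address takes two cells (high part, then low part).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instruction for an imaginary machine.
struct Instr {
    std::string label;   // optional name for this instruction's address
    int opcode;          // always occupies one cell
    std::string target;  // if non-empty, two address cells follow
};

std::vector<int> assemble(const std::vector<Instr>& program) {
    std::vector<int> cells;
    std::map<std::string, int> address_of;
    std::vector<std::pair<int, std::string>> fixups;

    // Pass 1: lay out memory, pad unknown addresses with dummies,
    // and note down every place that must be fixed later.
    for (const Instr& in : program) {
        if (!in.label.empty())
            address_of[in.label] = static_cast<int>(cells.size());
        cells.push_back(in.opcode);
        if (!in.target.empty()) {
            fixups.emplace_back(static_cast<int>(cells.size()), in.target);
            cells.push_back(0);  // dummy high address cell
            cells.push_back(0);  // dummy low address cell
        }
    }

    // Pass 2: every address is known now, so go back and patch.
    for (const auto& fix : fixups) {
        auto it = address_of.find(fix.second);
        assert(it != address_of.end() && "undefined label");
        cells[fix.first] = it->second / 256;
        cells[fix.first + 1] = it->second % 256;
    }
    return cells;
}
```

For a one-cell instruction followed by a three-cell instruction that targets the next instruction, the target lands at address 4, which matches the address pointer in the walk-through above.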

A linker is something that allows blocks of code to be compiled separately. This can speed up the overall process of building code, and allows some flexibility with how the blocks are later used, in other words they can be relocated in memory, for example adding 1000 to every address to scoot the block up by 1000 address cells.

So what the compiler outputs is rough machine code that is not yet fully built, but is laid out so we know the size of everything, in other words so we can start to calculate where all of the absolute addresses will be located. The compiler also outputs a list of symbols, which are name/address pairs. The symbols relate a memory offset in the machine code in the module with a name, the offset being the absolute distance to the memory location of the symbol within the module.

That's where we get to the linker. The linker first slaps all of these blocks of machine code together end to end and notes down where each one starts. Then it calculates the addresses to be fixed by adding together the relative offset within a module and the absolute position of the module in the bigger layout.
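
A toy version of that step, in the same simplified spirit as the explanation. Module and link_modules are hypothetical names, and each address is assumed to fit in a single cell to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compiler output for one separately compiled block.
struct Module {
    std::vector<int> cells;              // rough code with dummy address cells
    std::map<std::string, int> symbols;  // name -> offset inside this module
    std::vector<std::pair<int, std::string>> fixups;  // cell offset -> symbol
};

std::vector<int> link_modules(const std::vector<Module>& modules) {
    std::vector<int> image;
    std::vector<int> base;                // where each module starts
    std::map<std::string, int> absolute;  // name -> final address

    // Slap the blocks together end to end and note where each starts.
    for (const Module& m : modules) {
        int start = static_cast<int>(image.size());
        base.push_back(start);
        image.insert(image.end(), m.cells.begin(), m.cells.end());
        for (const auto& sym : m.symbols)
            absolute[sym.first] = start + sym.second;  // base + offset
    }

    // Patch every address cell that was left as a dummy.
    for (std::size_t i = 0; i < modules.size(); ++i) {
        for (const auto& fix : modules[i].fixups) {
            auto it = absolute.find(fix.second);
            assert(it != absolute.end() && "missing definition");
            image[base[i] + fix.first] = it->second;
        }
    }
    return image;
}
```

Because only offsets are stored per module, placing the same module at a different base moves all of its symbols by the same amount, which is the relocation idea described earlier.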

Obviously I've oversimplified this so you can try to grasp it, and I have deliberately not used the jargon of object files, symbol tables, etc. which to me is part of the confusion.

Answered by kaps

GCC compiles a C/C++ program into an executable in 4 steps.

For example, gcc -o hello hello.c is carried out as follows:

1. Pre-processing

Preprocessing via the GNU C Preprocessor (cpp.exe), which includes the headers (#include) and expands the macros (#define).

    cpp hello.c > hello.i

The resultant intermediate file "hello.i" contains the expanded source code.

2. Compilation

The compiler compiles the pre-processed source code into assembly code for a specific processor.

    gcc -S hello.i

The -S option tells gcc to produce assembly code instead of object code. The resultant assembly file is "hello.s".

3. Assembly

The assembler (as.exe) converts the assembly code into machine code in the object file "hello.o".

    as -o hello.o hello.s

4. Linker

Finally, the linker (ld.exe) links the object code with the library code to produce an executable file "hello".

    ld -o hello hello.o ...libraries...

Answered by Charles Wang

Look at the URL: http://faculty.cs.niu.edu/~mcmahon/CS241/Notes/compile.html
The complete compiling process of C++ is introduced clearly at this URL.