C++ object files vs. library files, and why?

Notice: this page reproduces a popular Stack Overflow question and its answers, which are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/23615282/

Date: 2020-08-28 00:30:35  Source: igfitidea

Object files vs Library files and why?

c++

Asked by Francisco Aguilera

I understand the basics of compilation. Source files are compiled to object files, which the linker then links into executables. These object files are produced from source files containing definitions.

So my questions are:

  • Why do we have a separate implementation for a library? .a .lib, .dll...
  • I am probably mistaken, but it seems to me like .o files themselves are kind of the same thing as libraries?
  • Couldn't someone give you their .o implementations of a certain declaration (.h) and you could replace that in and have it linked to become an executable that performs the same functions, but using different operations?

Answered by Michael Karcher

Historically, an object file was linked into an executable either completely or not at all (nowadays there are exceptions such as function-level linking, and whole-program optimization is becoming more popular), so if one function of an object file is used, the executable receives all of them.

To keep executables small and free of dead code, the standard library is split into many small object files (typically on the order of hundreds). Having hundreds of small files is very undesirable for efficiency reasons: opening many files is inefficient, and every file has some slack (unused disk space at the end of the file). This is why object files get grouped into libraries, which are kind of like ZIP files with no compression.

At link time, the whole library is read. Every object file in it that resolves a symbol that was still unresolved when the linker started reading the library is included in the output, and so is every object file that those object files need in turn. This likely means that the whole library has to be in memory at once to resolve dependencies recursively. As memory was quite limited, the linker only loads one library at a time, so a library mentioned later on the linker's command line cannot use functions from a library mentioned earlier on the command line.

To improve performance (loading a whole library takes some time, especially from slow media like floppy disks), libraries often contain an index that tells the linker which object files provide which symbols. Indexes are created by tools like ranlib or by the library management tool (Borland's tlib has a switch to generate the index). As soon as there is an index, libraries are definitely more efficient to link than single object files, even if all object files are in the disk cache and loading files from the disk cache is free.
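The library scan and the symbol index described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not how any particular linker is implemented: ObjectFile, Library, LinkState, build_index, add_object and scan_library are all hypothetical names, and symbols are plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an object file: the symbols it defines and the ones it needs.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// Hypothetical model of a static library: just a list of member object files.
using Library = std::vector<ObjectFile>;

// What the toy linker knows so far.
struct LinkState {
    std::set<std::string> defined;     // symbols with a definition
    std::set<std::string> unresolved;  // symbols used but not yet defined
    std::vector<std::string> linked;   // object files that go into the output
};

// The index: symbol name -> position of the member that defines it.
std::map<std::string, std::size_t> build_index(const Library& lib) {
    std::map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < lib.size(); ++i)
        for (const auto& sym : lib[i].defines)
            index.emplace(sym, i);  // the first member defining a symbol wins
    return index;
}

// Link one object file completely: all of its symbols come along.
void add_object(LinkState& st, const ObjectFile& obj) {
    st.linked.push_back(obj.name);
    for (const auto& sym : obj.defines) {
        // This model treats a second definition of a symbol as a link error.
        assert(st.defined.count(sym) == 0 && "duplicate definition");
        st.defined.insert(sym);
        st.unresolved.erase(sym);
    }
    for (const auto& sym : obj.needs)
        if (st.defined.count(sym) == 0) st.unresolved.insert(sym);
}

// Scan one library once: pull in members that resolve pending symbols,
// and keep going while the pulled members introduce new pending symbols.
void scan_library(LinkState& st, const Library& lib) {
    const auto index = build_index(lib);
    std::set<std::size_t> taken;
    bool progress = true;
    while (progress) {
        progress = false;
        const std::set<std::string> pending = st.unresolved;  // copy: add_object changes it
        for (const auto& sym : pending) {
            const auto it = index.find(sym);
            if (it == index.end() || taken.count(it->second) != 0) continue;
            taken.insert(it->second);
            add_object(st, lib[it->second]);
            progress = true;
        }
    }
}
```

In this model, adding a main.o that needs printf and then scanning a library with members printf.o (which needs write), write.o and qsort.o pulls in printf.o and write.o but not qsort.o. Scanning the library before main.o pulls in nothing and leaves printf unresolved, which mirrors the command-line ordering issue mentioned above.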

You are completely right that I can replace .o or .a files while keeping the header files, and change what the functions do (or how they do it). The LGPL license makes use of this: it requires the author of a program that uses an LGPL-licensed library to give the user the possibility to replace that library with a patched, improved or alternative implementation. Shipping the object files of your own application (possibly grouped as library files) is enough to give the user the required freedom; there is no need to ship the source code (as there is with the GPL).

If two sets of libraries (or object files) can be used successfully with the same header files, they are said to be ABI compatible, where ABI means Application Binary Interface. This is narrower than API compatibility (API meaning Application Programming Interface), which only means having two sets of libraries (or object files), each accompanied by its own headers, with the guarantee that you can use each library if you use the headers for that specific library. As an example of the difference, look at the following three header files:

File 1:

typedef struct {
    int a;
    int __undocumented_member;
    int b;
} magic_data;
magic_data* calculate(int);

File 2:

struct __tag_magic_data {
    int a;
    int __padding;
    int b;
};
typedef struct __tag_magic_data magic_data;
magic_data* calculate(const int);

File 3:

typedef struct {
    int a;
    int b;
    int c;
} magic_data;
magic_data* do_calculate(int, void*);
#define calculate(x) do_calculate(x, 0)

The first two files are not identical, but they provide interchangeable definitions that (as far as I can tell) do not violate the "one definition rule", so a library that provides File 1 as its header file can just as well be used with File 2 as the header file. On the other hand, File 3 provides a very similar interface to the programmer (it might be identical in everything the library author promises the user of the library), but code compiled with File 3 fails to link with a library designed to be used with File 1 or File 2, because that code calls do_calculate, which only the library designed for File 3 exports, and that library in turn does not export calculate. Also, the structure has a different member layout, so using File 1 or File 2 instead of File 3 will not access b correctly. The libraries providing File 1 and File 2 are ABI compatible, whereas all three libraries are API compatible (assuming that c and the more capable function do_calculate do not count towards that API).
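The layout difference can be checked mechanically. The following is a minimal sketch, assuming the usual case where the three int members are laid out without padding; the struct names magic_data_v1 and magic_data_v3, the renamed member undocumented_member, and the helper read_b_with_v3_header are hypothetical and not part of the headers above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Layout as declared in File 1 (File 2 has the same layout under other names).
struct magic_data_v1 {
    int a;
    int undocumented_member;
    int b;
};

// Layout as declared in File 3.
struct magic_data_v3 {
    int a;
    int b;
    int c;
};

// Member a sits at the same offset in both layouts, member b does not.
static_assert(offsetof(magic_data_v1, a) == offsetof(magic_data_v3, a),
              "a is found at the same place by both headers");
static_assert(offsetof(magic_data_v1, b) != offsetof(magic_data_v3, b),
              "b is found at different places: the layouts are not ABI compatible");
static_assert(sizeof(magic_data_v1) == sizeof(magic_data_v3),
              "this sketch assumes both layouts have the same size");

// Simulates code compiled against File 3 reading b from an object that a
// File 1 library produced. The bytes are copied instead of reinterpreting
// the pointer, so the sketch itself stays well defined.
int read_b_with_v3_header(const magic_data_v1& produced_by_v1_library) {
    magic_data_v3 view;
    std::memcpy(&view, &produced_by_v1_library, sizeof view);
    return view.b;  // actually the bytes of undocumented_member
}
```

With an object initialized as {1, 99, 2}, the File 1 view of b is 2, while the caller that believes in the File 3 layout reads the undocumented member instead.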

For dynamic libraries (.dll, .so) things are completely different: they started appearing on systems where multiple (application) programs can be loaded at the same time (which is not the case on DOS, but is the case on Windows). It is wasteful to have the same implementation of a library function in memory multiple times, so loading it into memory only once and having the different applications share it conserves memory. For dynamic libraries, the code of the referenced function is not included in the executable file; only a reference to the function inside the dynamic library is included (for Windows NE/PE, it is specified which DLL has to provide which function; for Unix .so files, only the function names and a set of libraries are specified). The operating system contains a loader, also known as a dynamic linker, that resolves these references and loads dynamic libraries if they are not already in memory at the time a program is started.

Answered by Serge Ballesta

Ok, let's start with the beginning.

A programmer (you) creates some source files, .cpp and .h. The difference between those two kinds of files is just a convention:

  • .cpp files are meant to be compiled
  • .h files are meant to be included in other source files

but nothing (except the fear of ending up with something unmaintainable) forbids you from including .cpp files in other .cpp files.

In the early days of C (the ancestor of C++), .h files only contained declarations of functions, structures (without methods in C!) and constants. You could also have macros (#define), but apart from that, no code should be in a .h file.

In C++ with templates, you must also put the implementation of template classes in the .h file, because C++ uses templates and not generics like Java, so each instantiation of a template is a different class.
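A small sketch of the template point, assuming a hypothetical class template Box that would normally live in a header (the names Box and unbox_sum are made up for illustration): the full definition has to be visible wherever the template is instantiated, and two instantiations are unrelated types.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Would normally live in a header (say, a hypothetical box.h):
// declaration and implementation together, so every .cpp file that
// includes it can instantiate Box<T> for the T it needs.
template <typename T>
class Box {
public:
    explicit Box(T value) : value_(std::move(value)) {}
    const T& get() const { return value_; }

private:
    T value_;
};

// Each instantiation is a distinct class generated by the compiler.
static_assert(!std::is_same<Box<int>, Box<long>>::value,
              "Box<int> and Box<long> are different classes");

// Ordinary (non-template) code using one particular instantiation.
inline int unbox_sum(const Box<int>& a, const Box<int>& b) {
    return a.get() + b.get();
}
```

Here unbox_sum(Box<int>(2), Box<int>(3)) yields 5, and Box<std::string> is compiled as a separate class from the same source text.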

Now with the answer to your question :

Each .cpp file is a compilation unit. The compiler will:

  • in the preprocessor phase, process all #include and #define directives to (internally) generate the full source code
  • compile it to object format (generally .o or .obj)

This object format contains :

  • relocatable code (that is, addresses in code or variables are relative to exported symbols)
  • exported symbols: the symbols that could be used from other compilation units (functions, classes, global variables)
  • imported symbols: the symbols used in that compilation unit and defined in other compilation units
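The three kinds of symbols in the list above can be seen in a tiny compilation unit. This is a sketch with made-up names (shared_counter, twice, bump_counter); to keep it self-contained, the "imported" variable is defined at the bottom of the same file, whereas in a real project that definition would sit in another .cpp file.

```cpp
#include <cassert>

// Imported symbol: only declared here. The compiler records a reference
// and leaves it to the linker to find the definition.
extern int shared_counter;

// Internal linkage: usable only inside this compilation unit,
// so it is not an exported symbol.
static int twice(int x) { return 2 * x; }

// Exported symbol: other compilation units may declare and call it.
int bump_counter(int by) {
    shared_counter += twice(by);
    return shared_counter;
}

// Stand-in for "defined in another compilation unit" (assumption made
// only so that this sketch links on its own).
int shared_counter = 0;
```

Calling bump_counter(1) and then bump_counter(2) returns 2 and then 6. On Unix-like systems, a symbol listing tool such as nm can be run on the resulting object file to see which names are defined and which are left undefined.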

Then (let's forget the libraries for now) the linker will take all the compilation units together and resolve symbols to create an executable file.

One step further with static libraries.

A static library (generally .a or .lib) is more or less a bunch of object files put together. It exists so that you do not have to list individually every object file that you need, that is, those whose exported symbols you use. Linking a library that contains the object files you use and linking those object files themselves is exactly the same. Simply adding -lc, -lm or -lX11 is shorter than adding hundreds of .o files. But at least on Unix-like systems, a static library is an archive and you can extract the individual object files if you want to.

Dynamic libraries are completely different. A dynamic library should be seen as a special executable file. They are generally built with the same linker that creates normal executables (but with different options). But instead of simply declaring an entry point (on Windows, a .dll file does declare an entry point that can be used for initializing the .dll), they declare a list of exported (and imported) symbols. At runtime, there are system calls that let you get the addresses of those symbols and use them almost normally. But in fact, when you call a routine in a dynamically loaded library, the code resides outside of what the loader initially loads from your own executable file. Generally, all the used symbols of a dynamic library are resolved either at load time, directly by the loader (on Unix-like systems), or through import libraries on Windows.
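The idea of "a list of exported symbols whose addresses are looked up at run time" can be modelled in plain C++ without any operating system support. This is a toy simulation, not a real loader and not the dlopen or LoadLibrary API: ExportTable, Program, load, library_v1 and library_v2 are hypothetical names, and the two "libraries" are just tables of function pointers.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using Fn = int (*)(int);

// Stand-in for a dynamic library: exported names mapped to code addresses.
using ExportTable = std::map<std::string, Fn>;

// Two implementations behind the same exported name.
int clamp_percent_v1(int x) { return x; }  // flawed: does not clamp at all
int clamp_percent_v2(int x) { return x < 0 ? 0 : (x > 100 ? 100 : x); }

ExportTable library_v1() { return {{"clamp_percent", &clamp_percent_v1}}; }
ExportTable library_v2() { return {{"clamp_percent", &clamp_percent_v2}}; }

// Stand-in for an executable: it contains no library code, only an
// import slot that is empty until the loader fills it in.
struct Program {
    Fn clamp_percent = nullptr;
    int run(int input) const { return clamp_percent(input); }
};

// Stand-in for the loader: resolve each imported name against the library.
void load(Program& program, const ExportTable& library) {
    const auto it = library.find("clamp_percent");
    if (it == library.end())
        throw std::runtime_error("unresolved import: clamp_percent");
    program.clamp_percent = it->second;
}
```

Loading the same Program against library_v1() makes run(150) return 150, while loading it against library_v2() makes it return 100: the behaviour changes although the "executable" did not.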

And now a look back at the include files. Neither good old K&R C nor C++ (at least before C++20 added modules) has a notion of a global module to import, as for example Java or C# do. In those languages, when you import a module, you get both the declarations of its exported symbols and an indication that you will later link it. But in C++ (same in C) you have to do it separately:

  • first, declare the functions or classes - done by including a .h file from your source, so that the compiler knows what they are
  • next, link the object module, static library or dynamic library to actually get access to the code

Answered by Peter

Object files contain definitions of functions, static variables used by those functions, and other information output by the compiler. This is in a form that can be connected by the linker (linking points where functions are called with the entry points of the function, for example).

Library files are typically packaged to contain one or more object files (and therefore all the information in them). This offers the advantages that it is easier to distribute a single library than a bunch of object files (e.g. if distributing compiled objects to another developer to use in their programs) and that linking is simpler (the linker needs to be directed to access fewer files, which makes it easier to create scripts to do linking). Also, typically, there are small performance benefits for the linker - opening one large library file and interpreting its content is more efficient than opening and interpreting the content of lots of small object files, particularly if the linker needs to do multiple passes through them. There is also a small advantage in that, depending on how hard drives are formatted and managed, a few large files consume less disk space than a lot of smaller ones.

It is often worth packaging object files into libraries because that is an operation that can be done once, and the benefits are realised numerous times (every time the library is used by the linker to produce the executable).

Since humans comprehend source code better - and therefore have more chance of getting it working right - when it is in small chunks, most large projects consist of a significant number of (relatively) small source files, that get compiled to objects. Assembling object files into libraries - in one step - gives all the benefits I mentioned above, while allowing humans to manage their source code in a way that makes sense to humans rather than linkers.

That said, it is a developer choice to use libraries. The linker doesn't care, and it can take more effort to set up a library and use it than to link together lots of object files. So there is nothing stopping the developer employing a mix of object files and libraries (except for the obvious need to avoid duplication of functions and other things in multiple objects or libraries, which causes the link process to fail). It is, after all, the job of a developer to work out a strategy for managing the building and distribution of their software.

There are actually (at least) two types of library.

Statically linked libraries are used by the linker to build an executable, and compiled code from them is copied by the linker into the executable. Examples are .lib files under Windows and .a files under Unix. The libraries themselves (typically) do not need to be distributed separately with a program executable, because the needed parts are in the executable.

Dynamically linked libraries are loaded into the program at run time. Two advantages are that the executable file is smaller (because it doesn't contain the content of the object files or static libraries) and that multiple executables can use the same dynamically linked library (i.e. it is only necessary to distribute/install the libraries once, and all executables which use those libraries will work). The drawback is that installation of programs becomes more complicated (the executables will not run if the dynamically linked libraries cannot be found, so installation processes must cope with the potential need to install the libraries at least once).

Another advantage is that dynamic libraries can be updated without having to change the executable - for example, to fix a flaw in one of the functions contained in the library, and thereby fix all programs which use that library without changing the executables. The drawback is that a program which relies on a recent version of a library may malfunction if only an older version of the library is found when it runs. This creates maintenance problems with libraries (known by various names, such as DLL hell), particularly when programs rely on multiple dynamically linked libraries.

Examples of dynamically linked libraries include DLLs under Windows and .so files under Unix. Facilities provided by operating systems are often installed - with the operating system - in the form of dynamically linked libraries, which allows all programs (when correctly built) to use the operating system services.

Programs can be developed to use a mix of static and dynamic libraries as well - again at the discretion of the developer. A static library might also be linked into the program, and take care of all the book-keeping associated with using a dynamically loaded library.

Answered by milleniumbug

What you describe is how static linking works.

Why do we have a separate implementation for a library? .a .lib, .dll...

.dlls are dynamically linked - the linking happens when you run the program. Depending on how you use the library, the function addresses are resolved either right after you start the program or as late as possible.

.sos are the same idea, but on Linux.

.as, traditionally used on Linux (and also in MinGW), are library archives, which behave basically like enhanced object files:

  • they are linked statically.
  • you can pack multiple object files inside single library archive.
  • the names are indexed.

.libs are used by the Microsoft linker in Visual Studio.

Couldn't someone give you their .o implementations of a certain declaration (.h) and you could replace that in and have it linked to become an executable that performs the same functions, but using different operations?

Yes! With dynamic libraries, you can go even further: you can replace the library without recompiling, sometimes even without restarting the program.

A practical example is Wine - it provides an open-source, portable implementation of the WinAPI.