Windows malloc replacement (e.g., tcmalloc) and dynamic crt linking

Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/858592/

Time: 2020-09-15 12:28:31  Source: igfitidea

Tags: windows, dll, malloc, msvcrt, crt

Asked by Weidenrinde

A C++ program that uses several DLLs and QT should be equipped with a malloc replacement (like tcmalloc) for performance problems that can be verified to be caused by Windows malloc. On Linux there is no problem, but on Windows there are several approaches, and I find none of them appealing:

1. Put the new malloc in a lib and make sure to link it first (other SO question)

This has the disadvantage that, for example, strdup will still use the old malloc, and a subsequent free may crash the program.

2. Remove malloc from the static libcrt library with lib.exe (Chrome)

This is tested/used(?) for Chrome/Chromium, but has the disadvantage that it only works when the CRT is linked statically. Static linking has the problem that, if one system library is linked dynamically against msvcrt, heap allocations and deallocations may be mismatched. If I understand correctly, tcmalloc could be linked dynamically so that all self-compiled DLLs share a common heap (which is good).

3. Patch the CRT source code (Firefox)

Firefox's jemalloc apparently patches the Windows CRT source code and builds a new CRT. This again has the static/dynamic linking problem described above.

One could consider using this to build a dynamic MSVCRT, but I don't think that is possible, because the license forbids distributing a patched MSVCRT under the same name.

4. Dynamically patching loaded CRT at run time

Some commercial memory allocators can do such magic. tcmalloc can do it too, but this seems rather ugly. It had some issues, but they have been fixed. Currently, this does not work with tcmalloc under 64-bit Windows.

Are there better approaches? Any comments?

Answer by Chris Becke

Q: A C++ program that is split across several DLLs should:

A) replace malloc?

B) ensure that allocation and deallocation happen in the same DLL module?

A: The correct answer is B. A C++ application design that incorporates multiple DLLs SHOULD provide a mechanism ensuring that anything allocated on the heap in one DLL is freed by the same DLL module.
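
A minimal sketch of such a mechanism, assuming a hypothetical library `mylib`: the module that allocates also exports the matching release function, so callers never hand its memory to their own `free` or `delete`. In a real DLL both functions would be exported (for example with `__declspec(dllexport)` on MSVC); here they are ordinary functions so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// --- Inside the hypothetical DLL "mylib" ---
// Allocation and release live in the same module, so the block always
// returns to the heap it came from, whatever CRT the caller links against.
extern "C" char* mylib_copy_string(const char* src) {
    assert(src != nullptr);
    const std::size_t len = std::strlen(src) + 1;  // include the terminator
    char* copy = static_cast<char*>(std::malloc(len));
    if (copy != nullptr) {
        std::memcpy(copy, src, len);
    }
    return copy;
}

extern "C" void mylib_free_string(char* s) {
    std::free(s);  // same module, same heap as mylib_copy_string
}
```

A caller in another module then pairs the calls, `char* s = mylib_copy_string("hi"); /* ... */ mylib_free_string(s);`, instead of calling its own `free(s)`, which could target a different CRT heap.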

Why would you split a C++ program into several DLLs anyway? By C++ program I mean that the objects and types you are dealing with are C++ templates, STL objects, classes, etc. You CAN'T pass C++ objects across DLL boundaries without either a lot of very careful design and lots of compiler-specific magic, or suffering from massive duplication of object code in the various DLLs and, as a result, an application that is extremely version-sensitive. Any small change to a class definition will force a rebuild of all EXEs and DLLs, removing at least one of the major benefits of a DLL approach to app development.

Either stick to a straight C interface between the app and its DLLs, suffer hell, or just compile the entire C++ app as one EXE.
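
One way to keep a straight C interface free of ownership questions altogether is to let the caller own every buffer: the DLL only writes into memory it is given and reports the size it needs. A minimal sketch; `mylib_get_version`, its return convention, and the version string are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C interface exported by a DLL. No heap block crosses the
// boundary: the caller allocates the buffer and later frees it itself.
// Returns the number of bytes required, including the terminating '\0'.
// The string is copied only if buf_size is large enough.
extern "C" std::size_t mylib_get_version(char* buf, std::size_t buf_size) {
    assert(buf != nullptr || buf_size == 0);
    static const char kVersion[] = "1.2.3";
    const std::size_t needed = sizeof(kVersion);  // 6, counting the '\0'
    if (buf != nullptr && buf_size >= needed) {
        std::memcpy(buf, kVersion, needed);
    }
    return needed;
}

// Caller side: first query the size, then fill a buffer from its own heap.
std::string query_version() {
    std::string result(mylib_get_version(nullptr, 0), '\0');
    mylib_get_version(&result[0], result.size());
    result.resize(result.size() - 1);  // drop the copied terminator
    return result;
}
```

The query-then-fill pattern is similar to the convention used by many Win32 APIs, and because no heap block ever changes hands, it does not matter which CRT each side links against.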

Answer by Adrian McCarthy

It's a bold claim that a C++ program "should be equipped with a malloc replacement (like tcmalloc) for performance problems...."

"[In] 6 out of 8 popular benchmarks ... [real-sized applications] replacing back the custom allocator, in which people had invested significant amounts of time and money, ... with the system-provided dumb allocator [yielded] better performance. ... The simplest custom allocators, tuned for very special situations, are the only ones that can provide gains." --Andrei Alexandrescu

Most system allocators are about as good as a general-purpose allocator can be. You can do better only if you have a very specific allocation pattern.

Typically, such special patterns apply only to a portion of the program, in which case, it's better to apply the custom allocator to the specific portion that can benefit than it is to globally replace the allocator.

C++ provides a few ways to selectively replace the allocator. For example, you can provide an allocator to an STL container, or you can override new and delete on a class-by-class basis. Both of these give you much better control than any hack which globally replaces the allocator.
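
A minimal sketch of both techniques, with a trivial counting policy standing in for a real tuned allocator: `CountingAllocator` is plugged into one `std::vector`, and `Node` overrides `operator new`/`operator delete` for itself only, while the rest of the program keeps the default allocator. All names and counters here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical statistics for the single container that opts in.
static std::size_t g_container_allocs = 0;

// Minimal allocator for an STL container; the counter stands in for a
// tuned allocation policy.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}  // rebinding

    T* allocate(std::size_t n) {
        ++g_container_allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Class-specific allocation: only Node objects use these functions.
struct Node {
    int value = 0;
    static std::size_t live;  // Node objects currently allocated via new

    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        ++live;
        return p;
    }
    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        assert(live > 0);
        --live;
        ::operator delete(p);
    }
};
std::size_t Node::live = 0;
```

For example, `std::vector<int, CountingAllocator<int>> v; v.reserve(16);` goes through `CountingAllocator::allocate`, and `delete new Node;` goes through `Node`'s own pair, while a plain `std::vector<int>` or `new int` elsewhere is untouched.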

Note also that replacing malloc and free will not necessarily change the allocator used by operators new and delete. While the global new operator is typically implemented using malloc, there is no requirement that it do so. So replacing malloc may not even affect most of the allocations.
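
In standard C++ the global `operator new`/`operator delete` are themselves replaceable, so routing C++ allocations through a different allocator means replacing them explicitly rather than relying on a `malloc` replacement. A simplified sketch, assuming `std::malloc`/`std::free` stand in for the allocator you actually want; the counter is hypothetical instrumentation, the new-handler retry loop required of a fully conforming `operator new` is omitted, and the nothrow and over-aligned variants are left at their defaults.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical instrumentation: how many times the replaced operator new ran.
std::atomic<std::size_t> g_global_new_calls{0};

// Replacement global allocation functions. std::malloc/std::free are
// placeholders for the allocator you actually want to use.
void* operator new(std::size_t size) {
    ++g_global_new_calls;
    if (size == 0) size = 1;  // must still return a unique, non-null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();   // simplified: no new-handler retry loop
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Array forms forward to the single-object forms.
void* operator new[](std::size_t size) { return ::operator new(size); }
void operator delete[](void* p) noexcept { ::operator delete(p); }
void operator delete[](void* p, std::size_t) noexcept { ::operator delete(p); }
```

With these definitions linked into a program, ordinary `new T` and `new T[n]` expressions in that program end up in the replacement functions, independently of whether `malloc` itself has been replaced.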

If you're using C, chances are you can wrap or replace key malloc and free calls with your custom allocator just where it matters and leave the rest of the program to use the default allocator. (If that's not the case, you might want to consider some refactoring.)
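
A sketch of wrapping just the hot spot, under the assumption that it allocates many small, short-lived objects that can all be released together: a tiny bump (arena) allocator serves that part, while the rest of the program keeps calling `malloc`/`free`. `Arena` and its functions are hypothetical, written as plain functions over a struct so they translate directly to C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical bump (arena) allocator for one hot code path. Allocation is a
// rounded pointer increment; there is no per-object free, only a bulk reset.
struct Arena {
    unsigned char* base;
    std::size_t capacity;
    std::size_t used;
};

bool arena_init(Arena* a, std::size_t capacity) {
    a->base = static_cast<unsigned char*>(std::malloc(capacity));
    a->capacity = (a->base != nullptr) ? capacity : 0;
    a->used = 0;
    return a->base != nullptr;
}

// Returns nullptr when the arena is exhausted.
void* arena_alloc(Arena* a, std::size_t size) {
    assert(a != nullptr && a->base != nullptr);
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t offset = (a->used + align - 1) / align * align;  // round up
    if (offset > a->capacity || size > a->capacity - offset) return nullptr;
    a->used = offset + size;
    return a->base + offset;
}

// "Frees" every arena allocation at once.
void arena_reset(Arena* a) { a->used = 0; }

// The only real call to free for everything the arena handed out.
void arena_destroy(Arena* a) {
    std::free(a->base);
    a->base = nullptr;
    a->capacity = 0;
    a->used = 0;
}
```

Inside the hot path, code calls `arena_alloc` and resets the arena once per batch, so many individual `free` calls collapse into a single reset. If the arena runs out and the code falls back to `malloc`, that particular block must later be released with `free`.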

System allocators have decades of development behind them. They are stable and well-tested. They perform extremely well for general cases (in terms of raw speed, thread contention, and fragmentation). They have debugging versions for leak detection and support for tracking tools. Some even improve the security of your application by providing defenses against heap buffer overrun vulnerabilities. Chances are, the libraries you want to use have been tested only with the system allocator.

Most of the techniques to replace the system allocator forfeit these benefits. In some cases, they can even increase memory demand (because they can't be shared with the DLL runtime possibly used by other processes). They also tend to be extremely fragile in the face of changes in the compiler version, runtime version, and even OS version. Using a tweaked version of the runtime prevents your users from getting benefits of runtime updates from the OS vendor. Why give all that up when you can retain those benefits by applying a custom allocator just to the exceptional part of the program that can benefit from it?

Answer by rogerdpack

nedmalloc? Also note that SMPlayer uses a special patch to override malloc, which may be the direction you're headed in.

Answer by sean e

Where does your premise "A C++ program that uses several DLLs and QT should be equipped with a malloc replacement" come from?

On Windows, if all the DLLs use the shared MSVCRT, then there is no need to replace malloc. By default, Qt builds against the shared MSVCRT DLL.

You will run into problems if you:

1) mix DLLs that link the CRT statically with DLLs that use the shared VCRT

2) AND also free memory in a module other than the one that allocated it (i.e., free memory in a statically linked DLL that was allocated by the shared VCRT, or vice versa).

Note that adding your own ref-counted wrapper around a resource can help mitigate the problems associated with resources that need to be deallocated in particular ways (i.e., a wrapper that disposes of one type of resource via a call back to the originating DLL, a different wrapper for a resource that originates from another DLL, etc.).
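
In C++, `std::shared_ptr` with a custom deleter gives exactly this kind of ref-counted wrapper: the deleter is bound when the pointer is created, so the last owner releases the resource through the originating module's own function. A minimal sketch; `Buffer`, `plugin_create_buffer`, and `plugin_destroy_buffer` are hypothetical stand-ins for a type and functions exported by that DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// --- Hypothetical functions exported by the DLL that owns the resource ---
struct Buffer {
    std::size_t size;
    unsigned char* data;
};

static int g_live_buffers = 0;  // instrumentation inside the owning module

extern "C" Buffer* plugin_create_buffer(std::size_t size) {
    Buffer* b = new (std::nothrow) Buffer{size, nullptr};
    if (b == nullptr) return nullptr;
    b->data = new (std::nothrow) unsigned char[size]();  // zero-filled
    if (b->data == nullptr) { delete b; return nullptr; }
    ++g_live_buffers;
    return b;
}

extern "C" void plugin_destroy_buffer(Buffer* b) {
    if (b == nullptr) return;
    assert(g_live_buffers > 0);
    delete[] b->data;  // released by the same module that allocated it
    delete b;
    --g_live_buffers;
}

// --- Consumer side ---
// The deleter is bound at creation time, so whichever owner drops the last
// reference, the buffer goes back through plugin_destroy_buffer.
std::shared_ptr<Buffer> make_plugin_buffer(std::size_t size) {
    return std::shared_ptr<Buffer>(plugin_create_buffer(size), plugin_destroy_buffer);
}
```

Copies of the `shared_ptr` can then travel freely through the consumer's code; a resource from a different DLL would get its own factory with that DLL's release function as the deleter.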

请注意,在资源周围添加您自己的引用计数包装器可以帮助缓解与需要以特定方式解除分配的资源相关的问题(即,通过回调原始 dll 处理一种类型资源的包装器,不同的来自另一个 dll 等的资源的包装器)。