Linux C++ memory allocation mechanism performance comparison (tcmalloc vs. jemalloc)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/7852731/

Asked by Shayan Pooya
I have an application which allocates lots of memory and I am considering using a better memory allocation mechanism than malloc.
My main options are jemalloc and tcmalloc. Are there any benefits to using one of them over the other?
There is a good comparison between some mechanisms (including the author's proprietary mechanism, lockless) at http://locklessinc.com/benchmarks.shtml and it mentions some pros and cons of each of them.
Given that both of the mechanisms are active and constantly improving, does anyone have any insight or experience with the relative performance of these two?
Accepted answer by Matthieu M.
If I remember correctly, the main difference was with multi-threaded projects.
Both libraries try to reduce contention on memory acquisition by having threads pick memory from different caches, but they use different strategies:
- jemalloc (used by Facebook) maintains a cache per thread
- tcmalloc (from Google) maintains a pool of caches, and threads develop a "natural" affinity for a cache, but may change
This led, once again if I remember correctly, to an important difference in terms of thread management.
- jemalloc is faster if threads are static, for example using pools
- tcmalloc is faster when threads are created/destructed
There is also the problem that since jemalloc spins up new caches to accommodate new thread ids, a sudden spike in thread count will leave you with (mostly) empty caches in the subsequent calm phase.
As a result, I would recommend tcmalloc in the general case, and reserve jemalloc for very specific usages (low variation in the number of threads during the lifetime of the application).
Answered by SunfiShie
There's a pretty good discussion about allocators here:
http://www.reddit.com/r/programming/comments/7o8d9/tcmalloca_faster_malloc_than_glibcs_open_sourced/
Answered by Martin
Your post does not mention threading, but before considering mixing C and C++ allocation methods, I would investigate the concept of a memory pool. Boost has a good one.
Answered by Basile Starynkevitch
You could also consider using the Boehm conservative garbage collector. Basically, you replace every malloc in your source code with GC_malloc (etc.), and you don't bother calling free. Boehm's GC doesn't allocate memory more quickly than malloc (it is about the same, or can be 30% slower), but it has the advantage of dealing with useless memory zones automatically, which might improve your program (and certainly eases coding, since you no longer care about free). And Boehm's GC can also be used as a C++ allocator.
If you really think that malloc is too slow (but you should benchmark; most malloc-s take less than a microsecond), and if you fully understand the allocating behavior of your program, you might replace some malloc-s with your special allocator (which could, for instance, get memory from the kernel in big chunks using mmap and manage memory by yourself). But I believe doing that is a pain. In C++ you have the allocator concept and std::allocator_traits, with most standard container templates accepting such an allocator (see also std::allocator), e.g. the optional second template argument to std::vector, etc.
As others suggested, if you believe malloc is a bottleneck, you could allocate data in chunks (or using arenas), or just in an array.
Sometimes, implementing a specialized copying garbage collector (for some of your data) could help. Consider perhaps MPS.
But don't forget that premature optimization is evil; please benchmark & profile your application to understand exactly where time is lost.
Answered by Alexey
I have recently considered tcmalloc for a project at work. This is what I observed:
- Greatly improved performance for heavy usage of malloc in a multithreaded setting. I used it with a tool at work and the performance improved almost twofold. The reason is that in this tool a few threads were performing allocations of small objects in a critical loop. Using glibc, the performance suffers because of, I think, lock contention between malloc/free calls in different threads.
- Unfortunately, tcmalloc increases the memory footprint. The tool I mentioned above would consume two or three times more memory (as measured by the maximum resident set size). The increased footprint is a no-go for us since we are actually looking for ways to reduce memory footprint.
In the end I decided not to use tcmalloc and instead optimized the application code directly: this means removing the allocations from the inner loops to avoid the malloc/free lock contention. (For the curious, I used a form of compression rather than memory pools.)
The lesson for you would be that you should carefully measure your application with typical workloads. If you can afford the additional memory usage, tcmalloc could be great for you. If not, tcmalloc is still useful for seeing what you would gain by avoiding frequent cross-thread calls to the memory allocator.
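For this kind of measurement, both allocators can usually be tried without recompiling by preloading them over glibc's malloc. The exact library paths vary by distribution; the paths below are just typical examples:

```shell
# Try tcmalloc for one run, no rebuild needed (path is distro-dependent):
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so ./your_app

# Same idea with jemalloc:
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so ./your_app
```

This makes it cheap to compare both throughput and the resident-set-size trade-off on your real workload.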
Answered by rogerdpack
Be aware that, according to the 'nedmalloc' homepage, modern OS allocators are actually pretty fast now:
"Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results"
http://www.nedprod.com/programs/portable/nedmalloc
So you might be able to get away with just recommending your users upgrade or something like it :)