Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/8951836/
Why Large Object Heap and why do we care?
Asked by Manish Basantani
I have read about generations and the Large Object Heap. But I still fail to understand the significance (or benefit) of having a Large Object Heap.
What could have gone wrong (in terms of performance or memory) if the CLR had just relied on Generation 2 (considering that the thresholds for Gen0 and Gen1 are too small to handle large objects) for storing large objects?
Answered by Hans Passant
A garbage collection doesn't just get rid of unreferenced objects, it also compacts the heap. That's a very important optimization. It doesn't just make memory usage more efficient (no unused holes), it makes the CPU cache much more efficient. The cache is a really big deal on modern processors, they are an easy order of magnitude faster than the memory bus.
Compacting is done simply by copying bytes. That, however, takes time. The larger the object, the more likely that the cost of copying it outweighs the possible CPU cache usage improvements.
So they ran a bunch of benchmarks to determine the break-even point. And arrived at 85,000 bytes as the cutoff point where copying no longer improves perf. With a special exception for arrays of double: they are considered 'large' when the array has more than 1000 elements. That's another optimization for 32-bit code; the large object heap allocator has the special property that it allocates memory at addresses that are aligned to 8, unlike the regular generational allocator that only allocates aligned to 4. That alignment is a big deal for double: reading or writing a mis-aligned double is very expensive. Oddly, the sparse Microsoft info never mentions arrays of long; not sure what's up with that.
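If you want to see the cutoff in action, a sketch like the following works (an assumption worth flagging: GC.GetGeneration reporting LOH objects as generation 2 is observed runtime behavior rather than a documented contract, and the exact cutoff includes object-header overhead, so these sizes stay well clear of the boundary):

```csharp
using System;

class LohThresholdDemo
{
    static void Main()
    {
        byte[] small = new byte[80000]; // comfortably below the 85,000-byte cutoff
        byte[] large = new byte[90000]; // comfortably above it: allocated on the LOH

        // The LOH is logically part of generation 2, so a freshly
        // allocated large object already reports generation 2.
        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // typically 2

        // The double[] special case uses an element-count cutoff
        // (more than 1000 elements) so large arrays get 8-aligned.
        double[] doubles = new double[1001];
        Console.WriteLine(GC.GetGeneration(doubles)); // 2 on 32-bit .NET Framework, 0 on 64-bit
    }
}
```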
Fwiw, there's lots of programmer angst about the large object heap not getting compacted. This invariably gets triggered when they write programs that consume more than half of the entire available address space. Followed by using a tool like a memory profiler to find out why the program bombed even though there was still lots of unused virtual memory available. Such a tool shows the holes in the LOH, unused chunks of memory where previously a large object lived but got garbage collected. Such is the inevitable price of the LOH, the hole can only be re-used by an allocation for an object that's equal or smaller in size. The real problem is assuming that a program should be allowed to consume all virtual memory at any time.
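The equal-or-smaller reuse rule can sometimes be observed directly by pinning objects and comparing addresses (a hedged sketch: whether the second allocation actually lands in the first one's hole depends on the allocator's free-list decisions, so the final line often prints True but is not guaranteed to):

```csharp
using System;
using System.Runtime.InteropServices;

class LohHoleReuseDemo
{
    static IntPtr AddressOf(byte[] array)
    {
        // Pinning doesn't move anything here; LOH objects aren't compacted anyway.
        GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
        try { return handle.AddrOfPinnedObject(); }
        finally { handle.Free(); }
    }

    static void Main()
    {
        byte[] first = new byte[100000];   // lands on the LOH
        IntPtr firstAddr = AddressOf(first);

        first = null;                      // make it garbage
        GC.Collect();                      // frees the chunk, leaving a hole

        byte[] second = new byte[100000];  // equal size: eligible to reuse the hole
        Console.WriteLine(firstAddr == AddressOf(second)); // often True
    }
}
```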
A problem that otherwise disappears completely by just running the code on a 64-bit operating system. A 64-bit process has 8 terabytes of virtual memory address space available, 3 orders of magnitude more than a 32-bit process. You just can't run out of holes.
Long story short, the LOH makes code run more efficiently. At the cost of using available virtual memory address space less efficiently.
UPDATE: .NET 4.5.1 now supports compacting the LOH, via the GCSettings.LargeObjectHeapCompactionMode property. Beware the consequences please.
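Usage is essentially a two-liner (a minimal sketch; the property and the GCLargeObjectHeapCompactionMode enum live in System.Runtime and require .NET 4.5.1 or later):

```csharp
using System;
using System.Runtime;

class LohCompactionDemo
{
    static void Main()
    {
        // Request LOH compaction during the next full blocking
        // collection; the setting resets itself to Default afterwards.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();

        // The "consequences": every live large object gets copied,
        // which can produce a noticeable pause in a big process.
    }
}
```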
Answered by oleksii
If an object's size is greater than a certain fixed threshold (85,000 bytes in .NET 1), then the CLR puts it in the Large Object Heap. This optimises:
- Object allocation (small objects are not mixed with large objects)
- Garbage collection (the LOH is collected only on a full GC; the sketch after this list demonstrates this)
- Memory defragmentation (the LOH is rarely compacted; it was never compacted before .NET 4.5.1)
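A sketch of the second point (hedged: GC timing isn't contractual, so treat the expected output as typical rather than guaranteed):

```csharp
using System;

class LohCollectionDemo
{
    static void Main()
    {
        // No strong reference is kept, so the LOH array is garbage immediately.
        var weak = new WeakReference(new byte[200000]);

        // Generation 0/1 collections do not touch the LOH...
        GC.Collect(0);
        GC.Collect(1);
        Console.WriteLine(weak.IsAlive); // expected: True

        // ...only a full (generation 2) collection reclaims it.
        GC.Collect(2);
        Console.WriteLine(weak.IsAlive); // expected: False
    }
}
```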
Answered by grapeot
The essential difference between the Small Object Heap (SOH) and the Large Object Heap (LOH) is that memory in the SOH gets compacted when collected, while the LOH is not, as this article illustrates. Compacting large objects costs a lot. Similar to the examples in the article: say moving a byte in memory needs 2 cycles; then compacting an 8 MB object on a 2 GHz computer needs about 8 ms, which is a large cost. Considering that large objects (arrays in most cases) are quite common in practice, I suppose that's the reason why Microsoft pins large objects in memory and proposes the LOH.
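The back-of-the-envelope arithmetic is easy to reproduce (a sketch using the answer's assumed figures, which are illustrative rather than measured):

```csharp
using System;

class CompactionCostEstimate
{
    static void Main()
    {
        // Assumed figures from the answer above: 2 cycles to move
        // one byte, on a 2 GHz machine, for one 8 MB object.
        const double bytes = 8.0 * 1024 * 1024;
        const double cyclesPerByte = 2.0;
        const double clockHz = 2e9;

        double ms = bytes * cyclesPerByte / clockHz * 1000.0;
        Console.WriteLine($"{ms:F1} ms"); // ~8.4 ms to compact a single object
    }
}
```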
BTW, according to this post, the LOH usually doesn't generate memory fragmentation problems.
Answered by Myles McDonnell
The principle is that it is unlikely (and quite possibly bad design) that a process would create lots of short-lived large objects, so the CLR allocates large objects to a separate heap, on which it runs GC on a different schedule from the regular heap. http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
Answered by Chris Shain
I am not an expert on the CLR, but I would imagine that having a dedicated heap for large objects can prevent unnecessary GC sweeps of the existing generational heaps. Allocating a large object requires a significant amount of contiguous free memory. In order to provide that from the scattered "holes" in the generational heaps, you'd need frequent compactions (which are only done with GC cycles).

