Duration of Excessive GC Time in "java.lang.OutOfMemoryError: GC overhead limit exceeded"

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow

Original: http://stackoverflow.com/questions/2863984/
Asked by jilles de wit
Occasionally, somewhere between once every 2 days and once every 2 weeks, my application crashes in a seemingly random location in the code with: java.lang.OutOfMemoryError: GC overhead limit exceeded. If I google this error I come to this SO question, and that led me to this piece of Sun documentation which explains:
The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.
Which tells me that my application is apparently spending 98% of the total time in garbage collection to recover only 2% of the heap.
But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?
I'm trying to determine the best approach to actually solving this issue rather than just using -XX:-UseGCOverheadLimit, but I feel a need to better understand the issue I'm solving.
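While the window over which the JVM's own 98% heuristic is measured isn't documented, you can observe the cumulative time spent in GC yourself through the standard java.lang.management API. A minimal sketch (the class name is mine) that reports the fraction of JVM uptime spent in collections so far:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reports how much of this JVM's uptime has been spent in garbage collection.
public class GcOverheadMonitor {

    // Fraction (0.0 - 1.0) of JVM uptime spent in reported collections.
    public static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("Time spent in GC so far: %.2f%%%n", 100.0 * gcTimeFraction());
    }
}
```

Logging this periodically from the application would show whether GC overhead climbs gradually (suggesting a leak) or spikes suddenly.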
Accepted answer by gustafc
I'm trying to determine the best approach to actually solving this issue rather than just using
-XX:-UseGCOverheadLimit, but I feel a need to better understand the issue I'm solving.
Well, you're using too much memory - and from the sound of it, it's probably because of a slow memory leak.
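For illustration, a "slow" leak usually has this shape (names here are hypothetical): some long-lived collection retains a little more memory on every operation and is never pruned, so the heap fills over days rather than minutes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the classic slow-leak shape - a long-lived (here static)
// collection that grows on every request and is never cleared.
public class LeakyCache {
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void handleRequest() {
        CACHE.add(new byte[1024]); // retained forever, so heap use grows with every call
    }

    public static int cachedEntries() {
        return CACHE.size();
    }
}
```

In a heap dump, a leak like this shows up as one collection (and its element type) dominating retained size.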
You can try increasing the heap size with -Xmx, which would help if this isn't a memory leak but a sign that your app actually needs a lot of heap and your current setting is slightly too low. If it is a memory leak, this will just postpone the inevitable.
To investigate whether it is a memory leak, instruct the VM to dump the heap on OOM using the -XX:+HeapDumpOnOutOfMemoryError switch, and then analyze the heap dump to see if there are more objects of some kind than there should be. http://blogs.oracle.com/alanb/entry/heap_dumps_are_back_with is a pretty good place to start.
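Besides dumping on OOM, on HotSpot JVMs you can also take a heap dump on demand via the com.sun.management diagnostic bean, which is useful for comparing an early "healthy" dump against a late-stage one. A sketch (the class name is mine):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Writes a heap dump to an .hprof file on demand (HotSpot JVMs only).
public class HeapDumper {

    public static void dumpHeap(String path, boolean liveObjectsOnly) throws java.io.IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveObjectsOnly); // fails if the file already exists
    }

    public static void main(String[] args) throws Exception {
        String path = System.getProperty("java.io.tmpdir") + java.io.File.separator
                + "heap-" + System.nanoTime() + ".hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting .hprof file can be opened with the same tools used for -XX:+HeapDumpOnOutOfMemoryError dumps (e.g. jhat or Eclipse MAT).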
Edit: As fate would have it, I happened to run into this problem myself just a day after this question was asked, in a batch-style app. It was not caused by a memory leak, and increasing the heap size didn't help, either. What I did was actually to decrease the heap size (from 1 GB to 256 MB) to make full GCs faster (though somewhat more frequent). YMMV, but it's worth a shot.
Edit 2: Not all problems were solved by the smaller heap... the next step was enabling the G1 garbage collector, which seems to do a better job than CMS.
Answered by Stephen C
But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?
The simple answer is that it is not specified. However, in practice the heuristic "works", so it cannot be either of the two extreme interpretations that you posited.
If you really wanted to find out the interval over which the measurements are made, you could always read the OpenJDK 6 or 7 source code. But I wouldn't bother, because it wouldn't help you solve your problem.
The "best" approach is to do some reading on tuning (starting with the Oracle / Sun pages), and then carefully "twiddle the tuning knobs". It is not very scientific, but the problem space (accurately predictingapplication + GC performance) is "too hard" given the tools that are currently available.
Answered by MSalters
The >98% would be measured over the same period in which less than 2% of memory is recovered.
It's quite possible that there is no fixed period for this. For instance, the OOM check might be done after every 1,000,000 object liveness checks. The time that takes would be machine-dependent.
You most likely can't "solve" your problem by adding -XX:-UseGCOverheadLimit. The most likely result is that your application will slow to a crawl, use a bit more memory, and then hit the point where the GC simply does not recover any memory anymore. Instead, fix your memory leaks, and then (if still needed) increase your heap size.
您很可能无法通过添加“解决”您的问题-XX:-UseGCOverheadLimit。最有可能的结果是您的应用程序将缓慢爬行,使用更多的内存,然后达到 GC 不再恢复任何内存的程度。相反,修复您的内存泄漏,然后(如果仍然需要)增加您的堆大小。

