Notice: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2918124/

Time: 2020-08-13 14:28:28  Source: igfitidea

How to reduce Java concurrent mode failure and excessive GC

java, garbage-collection, concurrency

Asked by jimx

In Java, a concurrent mode failure means that the concurrent collector failed to free up enough memory from the tenured and permanent generations, so it has to give up and let a full stop-the-world GC kick in. The end result can be very expensive.

I understand this concept but never had a good comprehensive understanding of
A) what could cause a concurrent mode failure and
B) what the solution is.

This lack of clarity leaves me writing and debugging code without many guiding hints, and I often end up shopping around among performance flags, from Foo to Bar, for no particular reason other than trial and error.

I'd like to learn from the developers here what your experience has been. If you have encountered such a performance issue, what was the cause and how did you address it?

If you have coding recommendations, please don't be too general. Thanks!

Answered by Stephen C

"Sometimes OOM pretty quick and got killed, sometime suffers long gc period (last time was over 10 hours)."

It sounds to me like a memory leak is at the root of your problems.

A CMS failure won't (as I understand it) cause an OOM. Rather a CMS failure happens because the JVM needs to do too many collections too quickly, and CMS could not keep up. One situation where lots of collection cycles happen in a short period is when your heap is nearly full.

The really long GC time sounds weird ... but is theoretically possible if your machine was thrashing horribly. However, a long period of repeated GCs is quite plausible if your heap is very nearly full.

You can configure the GC to give up when the heap is 1) at max size and 2) still close to full after a full GC has completed. Try doing this if you haven't done so already. It won't cure your problems, but at least your JVM will get the OOM quickly, allowing a faster service restart and recovery.

EDIT - the option to do this is -XX:GCHeapFreeLimit=nnn, where nnn is a number between 0 and 100 giving the minimum percentage of the heap that must be free after the GC. The default is 2. The option is listed in the aptly titled "The most complete list of -XX options for Java 6 JVM" page. (There are lots of -XX options listed there that don't appear in the Sun documentation. Unfortunately the page provides few details on what the options actually do.)
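
To get a feel for what "percentage of the heap that is free" looks like in a running application, the figure can be read through the standard java.lang.management API. This is only a minimal sketch: the class name HeapFreeCheck and the helper freePercent are made up for illustration, and it reports a snapshot at the moment it is called rather than reproducing the JVM's internal check after a full GC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
public class HeapFreeCheck {

    // Percentage of the given maximum heap size that is currently unused.
    static double freePercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return 100.0 * (maxBytes - usedBytes) / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to the committed size.
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        System.out.printf("heap free: %.1f%%%n", freePercent(heap.getUsed(), max));
    }
}
```

If the printed value stays in the low single digits right after full collections, the heap is in the "nearly full" state described above.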

You should probably start looking to see if your application / webapp has memory leaks. If it has, your problems won't go away unless those leaks are found and fixed. In the long term, fiddling with the Hotspot GC options won't fix memory leaks.
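
One common source of leaks in long-running applications is a cache or registry that only ever grows. As an illustration (not something taken from the question), here is a minimal sketch of a size-bounded LRU cache built on the standard LinkedHashMap; the class name BoundedCache and the limit passed to it are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded cache: evicts the least recently used entry once the limit is exceeded.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}
```

Replacing an unbounded static map with something like this caps how much the cache can pin in the tenured generation.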

Answered by fglez

Quoted from "Understanding Concurrent Mark Sweep Garbage Collector Logs"

The concurrent mode failure can either be avoided by increasing the tenured generation size or initiating the CMS collection at a lesser heap occupancy by setting CMSInitiatingOccupancyFraction to a lower value.

However, if there is really a memory leak in your application, you're just buying time.
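
Before choosing an occupancy fraction, it can help to see how full each heap pool actually gets. The sketch below uses the standard MemoryPoolMXBean API to print per-pool occupancy; the class name OldGenOccupancy and the helper occupancyPercent are hypothetical, and pool names vary by collector and JVM version, so the code lists every heap pool instead of assuming a particular name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical reporting class, for illustration only.
public class OldGenOccupancy {

    // Used space as a percentage of the pool's limit; 0 when the limit is unknown.
    static double occupancyPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return 0.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() is -1 when undefined; fall back to the committed size.
            long max = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
            System.out.printf("%s: %.1f%% used%n",
                    pool.getName(), occupancyPercent(usage.getUsed(), max));
        }
    }
}
```

If the tenured pool keeps climbing across many collections instead of settling, that points to a leak more than to a tuning problem.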

If you need fast restart and recovery and prefer a 'die fast' approach I would suggest not using CMS at all. I would stick with '-XX:+UseParallelGC'.

From "Garbage Collector Ergonomics"

The parallel garbage collector (UseParallelGC) throws an out-of-memory exception if an excessive amount of time is being spent collecting a small amount of the heap. To avoid this exception, you can increase the size of the heap. You can also set the parameters -XX:GCTimeLimit=time-limit and -XX:GCHeapFreeLimit=space-limit.

Answered by Kevin Lafayette

The first thing I learned about CMS is that it needs more memory than the other collectors; about 25 to 50% more is a good starting point. This helps you avoid fragmentation, since CMS does not do any compaction the way the stop-the-world collectors do. Second, do things that help the garbage collector: use Integer.valueOf instead of new Integer, get rid of anonymous classes, make sure inner classes are not accessing inaccessible things (private in the outer class), and so on. The less garbage the better. Using FindBugs and not ignoring warnings will help a lot with this.
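
The Integer.valueOf advice works because valueOf may hand back a cached instance instead of allocating a new object; the language specification guarantees this for values from -128 to 127. A minimal sketch follows, with a made-up class name BoxingDemo and helper sameCachedInstance. It compares references with == only to show the caching; value comparisons in real code should still use equals.

```java
// Hypothetical demo class, for illustration only.
public class BoxingDemo {

    // True when two valueOf calls for the same int return the very same object.
    static boolean sameCachedInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second; // reference comparison on purpose
    }

    public static void main(String[] args) {
        // Values in -128..127 are always served from the cache, so no garbage is created.
        System.out.println("100 cached: " + sameCachedInstance(100));
        // Outside that range the result depends on the JVM and its settings.
        System.out.println("100000 cached: " + sameCachedInstance(100000));
    }
}
```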

As far as tuning, I have found that you need to try several things:

-XX:+UseConcMarkSweepGC

Tells the JVM to use CMS in the tenured gen.

Fix the size of your heap: -Xmx2048m -Xms2048m. This prevents the GC from having to do things like grow and shrink the heap.

-XX:+UseParNewGC

Use parallel instead of serial collection in the young generation. This will speed up your minor collections, especially if you have a very large young gen configured. A large young generation is generally good, but don't go above half of the old gen size.

-XX:ParallelCMSThreads=X

Sets the number of threads that CMS will use when it is doing things that can be done in parallel.

-XX:+CMSParallelRemarkEnabled Remark is serial by default; this can speed you up.

-XX:+CMSIncrementalMode Allows the application to run more by pausing GC between phases.

-XX:+CMSIncrementalPacing Allows the JVM to adjust how often it collects over time.

-XX:CMSIncrementalDutyCycleMin=X Minimum amount of time spent doing GC.

-XX:CMSIncrementalDutyCycle=X Start by doing GC this % of the time

-XX:CMSIncrementalSafetyFactor=X

I have found that you can get generally low pause times if you set it up so that it is basically always collecting. Since most of the work is done in parallel, you end up with basically regular predictable pauses.

-XX:CMSFullGCsBeforeCompaction=1

This one is very important. It tells the CMS collector to always complete the collection before it starts a new one. Without this, you can run into the situation where it throws a bunch of work away and starts again.

-XX:+CMSClassUnloadingEnabled

By default, CMS will let your PermGen grow till it kills your app a few weeks from now. This stops that. Your PermGen would only be growing though if you make use of Reflection, or are misusing String.intern, or doing something bad with a class loader, or a few other things.
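
Whether classes are piling up can be checked from inside the application with the standard ClassLoadingMXBean. The following is a rough sketch; the class name ClassLoadReport and the summary helper are made up for the example.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical reporting class, for illustration only.
public class ClassLoadReport {

    // Formats the three counters exposed by ClassLoadingMXBean into one line.
    static String summary(int loadedNow, long totalLoaded, long unloaded) {
        return "loaded now=" + loadedNow
                + ", total loaded=" + totalLoaded
                + ", unloaded=" + unloaded;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        System.out.println(summary(
                classes.getLoadedClassCount(),        // classes currently loaded
                classes.getTotalLoadedClassCount(),   // classes loaded since JVM start
                classes.getUnloadedClassCount()));    // classes unloaded since JVM start
    }
}
```

A "loaded now" count that only ever rises while "unloaded" stays at zero suggests that classes are being generated or reloaded without ever being released.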

Survivor ratio and tenuring threshold can also be played with, depending on whether you have long- or short-lived objects, and how much object copying between survivor spaces you can live with. If you know all your objects are going to stick around, you can configure zero-sized survivor spaces, and anything that survives one young gen collection will be immediately tenured.
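
When experimenting with many flags at once, it is easy to lose track of what the running JVM was actually started with. As a sketch (the class name GcConfigReport and the helper tuningFlags are assumptions for illustration), the standard RuntimeMXBean and GarbageCollectorMXBean APIs can print the -X/-XX options in effect together with the collectors in use and their cumulative counts and times.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reporting class, for illustration only.
public class GcConfigReport {

    // Keeps only non-standard options such as -Xmx2048m or -XX:+UseParNewGC.
    static List<String> tuningFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("tuning flags: " + tuningFlags(jvmArgs));

        // One bean per collector; the names depend on which collectors are active.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Logging this once at startup and again periodically gives a simple before/after comparison when a flag is changed.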

Answered by dave

I've found that using -XX:PretenureSizeThreshold=1m to make 'large' objects go immediately to tenured space greatly reduced my young GCs and concurrent mode failures, since it tends not to try to dump the young + 1 survivor amount of data (xmn=1536m survivorratio=3 maxTenuringThreshold=5) before a full CMS cycle can complete. Yes, my survivor space is large, but about once every 2 days something comes into the app that needs it (and we run 12 app servers each day for 1 app).