Java G1 garbage collector: Perm Gen fills up indefinitely until a Full GC is performed

Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/20274317/


Tags: java, garbage-collection, jboss7.x, g1gc

Asked by Jose Otavio

We have a fairly big application running on a JBoss 7 application server. In the past we were using ParallelGC, but it was giving us trouble on some servers where the heap was large (5 GB or more) and usually nearly full: we would frequently get very long GC pauses.

Recently, we made improvements to our application's memory usage and in a few cases added more RAM to some of the servers where the application runs, but we also started switching to G1 in the hopes of making these pauses less frequent and/or shorter. Things seem to have improved but we are seeing a strange behaviour which did not happen before (with ParallelGC): the Perm Gen seems to fill up pretty quickly and once it reaches the max value a Full GC is triggered, which usually causes a long pause in the application threads (in some cases, over 1 minute).

We have been using 512 MB of max perm size for a few months, and during our analysis the perm size would usually stop growing at around 390 MB with ParallelGC. After we switched to G1, however, the behaviour above started happening. I tried increasing the max perm size to 1 GB and even 1.5 GB, but the Full GCs still happen (they are just less frequent).

In this link you can see some screenshots of the profiling tool we are using (YourKit Java Profiler). Notice how, when the Full GC is triggered, the Eden and the Old Gen have a lot of free space, but the Perm size is at its maximum. The Perm size and the number of loaded classes decrease drastically after the Full GC, but they start rising again and the cycle repeats. The code cache is fine and never rises above 38 MB (it's 35 MB in this case).

Here is a segment of the GC log:

2013-11-28T11:15:57.774-0300: 64445.415: [Full GC 2126M->670M(5120M), 23.6325510 secs] [Eden: 4096.0K(234.0M)->0.0B(256.0M) Survivors: 22.0M->0.0B Heap: 2126.1M(5120.0M)->670.6M(5120.0M)] [Times: user=10.16 sys=0.59, real=23.64 secs]

You can see the full log here (from the moment we started up the server, up to a few minutes after the full GC).

Here's some environment info:

java version "1.7.0_45"

Java(TM) SE Runtime Environment (build 1.7.0_45-b18)

Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

Startup options: -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log

So here are my questions:

  • Is this the expected behaviour with G1? I found another post on the web from someone asking about something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...

  • Is there something I can improve/correct in our startup parameters? The server has 8 GB of RAM, and it doesn't seem we are lacking hardware; performance of the application is fine until a full GC is triggered, which is when users experience big lags and start complaining.

Accepted answer by Joshua Wilson

Causes of growing Perm Gen

  • Lots of classes, especially JSPs.
  • Lots of static variables.
  • There is a classloader leak.

For those that don't know, here is a simple way to think about how the PermGen fills up. The Young Gen doesn't get enough time to let things expire, so they get moved up to the Old Gen space. The Perm Gen holds the classes for the objects in the Young and Old Gen. When the objects in the Young or Old Gen get collected and the class is no longer being referenced, it gets 'unloaded' from the Perm Gen. If the Young and Old Gen don't get GC'd then neither does the Perm Gen, and once it fills up it needs a Full stop-the-world GC. For more info see Presenting the Permanent Generation.



Switching to CMS

I know you are using G1 but if you do switch to the Concurrent Mark Sweep (CMS) low pause collector -XX:+UseConcMarkSweepGC, try enabling class unloading and permanent generation collections by adding -XX:+CMSClassUnloadingEnabled.



The Hidden Gotcha

If you are using JBoss, RMI/DGC has the gcInterval set to 1 min. The RMI subsystem forces a full garbage collection once per minute. This in turn forces promotion instead of letting it get collected in the Young Generation.

You should change this to at least 1 hr, if not 24 hrs, in order for the GC to do proper collections.

-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
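
If you want to double-check that these overrides actually reach the JVM, a small hypothetical sketch like the one below (the class name is mine, not from the answer) can be launched with the same JAVA_OPTS; it simply prints the system properties the RMI subsystem consults, and a null value means the property was never set on the command line.

    // Minimal sketch: verify the DGC interval properties visible to this JVM.
    public class DgcIntervalCheck {
        public static void main(String[] args) {
            System.out.println("sun.rmi.dgc.client.gcInterval = "
                    + System.getProperty("sun.rmi.dgc.client.gcInterval"));
            System.out.println("sun.rmi.dgc.server.gcInterval = "
                    + System.getProperty("sun.rmi.dgc.server.gcInterval"));
        }
    }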


List of every JVM option

To see all the options, run this from the cmd line.

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version

If you want to see what JBoss is using then you need to add the following to your standalone.xml. You will get a list of every JVM option and what it is set to. NOTE: it must be run in the JVM that you want to look at. If you run it externally you won't see what is happening in the JVM that JBoss is running on.

set "JAVA_OPTS= -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal %JAVA_OPTS%"

There is a shortcut to use when we are only interested in the modified flags.

-XX:+PrintCommandLineFlags


Diagnostics

Use jmap to determine what classes are consuming permanent generation space. The output will show:

  • class loader
  • # of classes
  • bytes
  • parent loader
  • alive/dead
  • type
  • totals

    jmap -permstat JBOSS_PID  >& permstat.out
    


JVM Options

These settings worked for me, but whether they are right for you will depend on how your system is set up and what your application is doing.

  • -XX:SurvivorRatio=8 – Sets the survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). The SurvivorRatio is the size of the Eden space compared to one survivor space. Larger survivor spaces allow short-lived objects a longer time period to die in the young generation.

  • -XX:TargetSurvivorRatio=90 – Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.

  • -XX:MaxTenuringThreshold=31 – To prevent premature promotion from the young to the old generation. Allows short-lived objects a longer time period to die in the young generation (and hence avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and the survivor space sizes may need to be adjusted so as to balance the overhead of copying between survivor spaces against tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0, which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.

  • -XX:NewSize=768m – Allows specification of the initial young generation size.

  • -XX:MaxNewSize=768m – Allows specification of the maximum young generation size.

Here is a more extensive JVM options list.

Answer by samash

You should be starting your server.bat with the java command with -verbose:gc.

Answer by mnp

I would first try to find the root cause for the PermGen getting larger before randomly trying JVM options.

  • You could enable classloading logging (-verbose:class, -XX:+TraceClassLoading -XX:+TraceClassUnloading, ...) and check the output
  • In your test environment, you could try monitoring (over JMX) when classes get loaded (java.lang:type=ClassLoading LoadedClassCount); see the sketch after this list. This might help you find out which part of your application is responsible.
  • You could also try listing all the classes using the JVM tools (sorry, but I still mostly use JRockit and there you would do it with jrcmd. I hope Oracle has migrated those helpful features to Hotspot...)
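
As a concrete starting point for the JMX monitoring suggestion above, here is a minimal sketch; the class name and the one-minute sampling interval are my own assumptions, not part of the answer. It polls the platform ClassLoadingMXBean in-process; over remote JMX you would read the same attributes from the java.lang:type=ClassLoading MBean.

    import java.lang.management.ClassLoadingMXBean;
    import java.lang.management.ManagementFactory;

    // Periodically sample class-loading counts so you can see when, and how fast,
    // the number of loaded classes grows.
    public class ClassLoadWatcher {
        public static void main(String[] args) throws InterruptedException {
            ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
            while (true) {
                System.out.printf("loaded=%d totalLoaded=%d unloaded=%d%n",
                        bean.getLoadedClassCount(),       // currently loaded
                        bean.getTotalLoadedClassCount(),  // loaded since JVM start
                        bean.getUnloadedClassCount());    // unloaded so far
                Thread.sleep(60000L);                     // sample once a minute
            }
        }
    }

Correlating these numbers with deployments, scheduled jobs, or traffic patterns usually narrows down which part of the application is generating the classes.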

In summary, find out what generates so many classes and then think about how to reduce that / tune the GC.

Cheers, Dimo

Answer by eis

I agree with the answer above in that you should really try to find what is actually filling your permgen, and I'd heavily suspect it's about some classloader leak that you want to find a root cause for.

There's this thread in the JBoss forums that goes through a couple of such diagnosed cases and how they were fixed. This answer and this article discuss the issue in general as well. In that article there's a mention of possibly the easiest test you can do:

Symptom

This will happen only if you redeploy your application without restarting the application server. The JBoss 4.0.x series suffered from just such a classloader leak. As a result I could not redeploy our application more than twice before the JVM would run out of PermGen memory and crash.

Solution

To identify such a leak, un-deploy your application and then trigger a full heap dump (make sure to trigger a GC before that). Then check if you can find any of your application objects in the dump. If so, follow their references to their root, and you will find the cause of your classloader leak. In the case of JBoss 4.0 the only solution was to restart for every redeploy.
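
To make that more concrete, here is a hypothetical illustration (not from the article) of the kind of reference chain such a heap dump typically reveals: an object whose class was defined by the web application's classloader is still reachable from a registry that outlives the deployment, so the whole webapp classloader, and every class it loaded into PermGen, stays alive after undeploy.

    import java.util.ArrayList;
    import java.util.List;

    // Imagine this class ships in a server-level jar, loaded by a parent classloader
    // that survives redeployments of the web application.
    public class LeakyRegistry {
        private static final List<Object> LISTENERS = new ArrayList<Object>();

        public static void register(Object listener) {
            LISTENERS.add(listener); // entries are never removed on undeploy
        }
    }

    // Somewhere inside the web application (MyWebAppListener is hypothetical):
    //
    //     LeakyRegistry.register(new MyWebAppListener());
    //
    // MyWebAppListener was defined by the webapp classloader, so this single entry
    // keeps that classloader, and all of its classes in PermGen, reachable until
    // the server itself is restarted or the entry is removed.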

This is what I'd try first, IF you think that redeployment might be related. This blog post is an earlier one, doing the same thing but discussing the details as well. Based on the posting it might be, though, that you're not actually redeploying anything, but permgen is just filling up by itself. In that case, examination of classes + anything else added to permgen might be the way (as has already been mentioned in the previous answer).

If that doesn't give more insight, my next step would be trying out the Plumbr tool. They have a sort of guarantee on finding the leak for you, as well.

Answer by Stephen C

Is this the expected behaviour with G1?

I don't find it surprising. The base assumption is that stuff put into permgen almost never becomes garbage. So you'd expect that permgen GC would be a "last resort"; i.e. something the JVM would only do if it was forced into a full GC. (OK, this argument is nowhere near a proof ... but it's consistent with the following.)

I've seen lots of evidence that other collectors have the same behaviour; e.g.

I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...

I think I found the same post. But someone's opinion that it ought to be possible is not really instructive.

Is there something I can improve/corrrect in our startup parameters?

I doubt it. My understanding is that this is inherent in the permgen GC strategy.

I suggest that you either track down and fix what is using so much permgen in the first place ... or switch to Java 8 in which there isn't a permgen heap anymore: see PermGen elimination in JDK 8

While a permgen leak is one possible explanation, there are others; e.g.

  • overuse of String.intern(),
  • application code that is doing a lot of dynamic class generation, e.g. using DynamicProxy (see the sketch after this list),
  • a huge codebase ... though that wouldn't cause permgen churn as you seem to be observing.
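
To illustrate the dynamic class generation point above, here is a hypothetical sketch (not from the answer): each java.lang.reflect.Proxy class generated against a fresh classloader is a brand-new class in PermGen on Java 7, and it only goes away once its loader becomes unreachable and a collection that covers PermGen (a Full GC in the question's setup) runs, which matches the saw-tooth pattern the question describes.

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.net.URL;
    import java.net.URLClassLoader;

    public class ProxyChurn {
        public interface Service {
            String call();
        }

        public static void main(String[] args) {
            for (int i = 0; i < 100000; i++) {
                // A throwaway classloader per iteration forces a brand-new proxy class definition.
                ClassLoader loader = new URLClassLoader(new URL[0], ProxyChurn.class.getClassLoader());
                Service s = (Service) Proxy.newProxyInstance(
                        loader,
                        new Class<?>[] { Service.class },
                        new InvocationHandler() {
                            public Object invoke(Object proxy, Method method, Object[] a) {
                                return "hello";
                            }
                        });
                s.call();
                // No reference to 'loader' is kept, so the proxy class is collectable,
                // but it only leaves PermGen when PermGen itself is collected.
            }
        }
    }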