Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise license it under CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2254041/
Java G1 garbage collection in production
Asked by benstpierre
Since Java 7 is going to use the new G1 garbage collector by default, will Java be able to handle an order-of-magnitude larger heap without the supposedly "devastating" GC pause times? Has anybody actually run G1 in production, and what were your experiences?
To be fair, the only time I have seen really long GC pauses is on very large heaps, much larger than a workstation would have. To clarify my question: will G1 open the gateway to heaps in the hundreds of GB? TB?
Accepted answer by Bill K
It sounds like the point of G1 is to have shorter pause times, to the point that it lets you specify a maximum pause-time target.
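To see which collector a running JVM is actually using, and how much time it has spent collecting, the standard java.lang.management API is enough. The class below is a minimal sketch (the name GcCollectors is mine, not from any answer). Launched as, for example, `java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 GcCollectors`, where 200 is just an illustrative value, it prints one line per collector. Under G1 the names usually include "G1 Young Generation" and "G1 Old Generation", though exact names vary between JVM versions. Keep in mind that MaxGCPauseMillis is a target the collector tries to meet, not a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors the running JVM reports, with cumulative counts and times.
class GcCollectors {
    // One line per collector: name, number of collections, total collection time in ms.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the JVM does not track it for this collector.
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```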
Garbage collection isn't a simple "Hey, it's full, let's move everything at once and start over" deal any more; it's a fantastically complex, multi-level, background-threaded system. It can do much of its maintenance in the background with no pauses at all, and it also uses knowledge of the system's expected runtime patterns to help, such as the assumption that most objects die right after being created.
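The generational assumption mentioned above ("most objects die right after being created") can be observed with a small experiment. The sketch below is hypothetical (class and method names are mine): it churns through short-lived byte arrays and reports how many collections ran meanwhile, using the standard java.lang.management API. Running it with -verbose:gc should show mostly young-generation collections, but the exact count depends on the JVM, the collector and the heap size, and the JIT may optimize some allocations away.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Allocates many objects that become garbage almost immediately and
// reports how many collections the JVM ran in the meantime.
class ShortLivedAllocation {
    // Sum of collection counts over all collectors, skipping undefined (-1) values.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    // Allocates one temporary array per iteration; only the running checksum survives.
    static long churn(int iterations, int arraySize) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] temp = new byte[arraySize]; // unreachable after this iteration
            temp[i % arraySize] = (byte) i;
            checksum += temp[i % arraySize];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        long checksum = churn(1_000_000, 1024); // roughly 1 GB of short-lived garbage in total
        long after = totalCollections();
        System.out.println("checksum=" + checksum + ", collections during churn=" + (after - before));
    }
}
```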
I would say GC pause times are going to continue to improve, not worsen, with future releases.
EDIT:
On re-reading, it occurred to me that I use Java daily (Eclipse, Azureus, and the apps I develop), and it's been a LONG TIME since I saw a pause. I don't just mean a significant pause; I mean any pause at all.
I've seen pauses when I right-click in Windows Explorer or (occasionally) when I hook up certain USB hardware, but with Java, none at all.
Is GC still an issue with anyone?
Answer by Peter Lawrey
The G1 collector reduces the impact of full collections. If you have an application where you have already reduced the need for full collections, the Concurrent Mark Sweep (CMS) collector is just as good and, in my experience, has shorter minor collection times.
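Comparing CMS and G1 on your own application starts with being sure which one is active. The hypothetical helper below (my own names) checks the JVM's input arguments, obtained via RuntimeMXBean.getInputArguments(), for the usual collector-selection flags. When none is present the JVM chooses a collector through its own ergonomics, reported here as "default", and more than one is flagged as a conflict. Note that -XX:+UseConcMarkSweepGC only works on JVMs that still ship CMS: it was deprecated in JDK 9 and removed in JDK 14.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Reports which collector (if any) was explicitly selected on the command line.
class CollectorFlag {
    private static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("-XX:+UseSerialGC", "Serial");
        KNOWN.put("-XX:+UseParallelGC", "Parallel");
        KNOWN.put("-XX:+UseConcMarkSweepGC", "CMS");
        KNOWN.put("-XX:+UseG1GC", "G1");
    }

    static String selectedCollector(List<String> jvmArgs) {
        Set<String> found = new LinkedHashSet<>();
        for (String arg : jvmArgs) {
            String name = KNOWN.get(arg);
            if (name != null) found.add(name);
        }
        if (found.isEmpty()) return "default";              // left to JVM ergonomics
        if (found.size() > 1) return "conflicting " + found; // several selection flags given
        return found.iterator().next();
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Collector flag on the command line: " + selectedCollector(jvmArgs));
    }
}
```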
Answer by Ted Dunning
CMS can lead to slowly degrading performance even if you are running it without accumulating tenured objects. This is because of memory fragmentation, which G1 supposedly avoids.
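Why fragmentation hurts even without growth in live data can be shown with a toy model. The sketch below is not how CMS manages memory; it treats the heap as a row of fixed-size slots and shows that half the heap can be free while no two free slots are adjacent, so a two-slot allocation fails until something compacts the heap. As I understand it, CMS does not move old-generation objects during its concurrent cycle, whereas G1 copies live objects out of the regions it collects, which compacts incrementally. All names here are hypothetical.

```java
// Toy fragmentation model: true = slot in use, false = slot free.
class FragmentationModel {
    static int totalFree(boolean[] heap) {
        int free = 0;
        for (boolean used : heap) if (!used) free++;
        return free;
    }

    // Length of the longest run of adjacent free slots (the biggest allocation that fits).
    static int largestFreeRun(boolean[] heap) {
        int best = 0, run = 0;
        for (boolean used : heap) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Sliding compaction: all live slots move to the front, leaving one contiguous free run.
    static boolean[] compact(boolean[] heap) {
        int live = heap.length - totalFree(heap);
        boolean[] result = new boolean[heap.length];
        for (int i = 0; i < live; i++) result[i] = true;
        return result;
    }

    public static void main(String[] args) {
        // Alternating live/free slots: 8 of 16 slots free, but never two in a row.
        boolean[] heap = new boolean[16];
        for (int i = 0; i < heap.length; i += 2) heap[i] = true;
        System.out.println("free=" + totalFree(heap) + ", largest run=" + largestFreeRun(heap));
        System.out.println("after compaction, largest run=" + largestFreeRun(compact(heap)));
    }
}
```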
The myth that G1 is available only with paid support is just that: a myth. Sun, and now Oracle, have clarified this on the JDK page.
Answer by David Leppik
I've been testing it out with a heavy application: 60-70GB allocated to heap, with 20-50GB in use at any time. With these sorts of applications, it's an understatement to say that your mileage may vary. I'm running JDK 1.6_22 on Linux. The minor versions are important-- before about 1.6_20, there were bugs in G1 that caused random NullPointerExceptions.
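Because the update number mattered so much for early G1 builds, a startup check can be useful. The helper below is a hypothetical sketch for legacy version strings of the form "1.6.0_22" (the value of the java.version system property on JDK 6); the threshold of update 20 is the answerer's approximate figure, not an official cut-off.

```java
// Hypothetical helper for legacy version strings such as "1.6.0_22" (JDK 6 update 22).
class JdkUpdateCheck {
    // Returns the update number after the underscore, or -1 if there is none.
    static int updateNumber(String version) {
        int underscore = version.indexOf('_');
        if (underscore < 0) return -1;
        // Drop any suffix such as "-ea" that follows the digits.
        String digits = version.substring(underscore + 1).replaceAll("[^0-9].*$", "");
        return digits.isEmpty() ? -1 : Integer.parseInt(digits);
    }

    // True only for JDK 6 builds whose update number is at least minUpdate.
    static boolean isJdk6AtLeastUpdate(String version, int minUpdate) {
        return version.startsWith("1.6.") && updateNumber(version) >= minUpdate;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + ": JDK 6 update 20 or later? " + isJdk6AtLeastUpdate(version, 20));
    }
}
```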
I've found that, most of the time, it is very good at staying within the pause target you give it. The default appears to be a 100ms (0.1 second) pause, and I've been telling it to do half that (-XX:MaxGCPauseMillis=50). However, once it gets really low on memory, it panics and does a full stop-the-world garbage collection. With 65GB, that takes between 30 seconds and 2 minutes. (The number of CPUs probably doesn't make a difference; it's probably limited by the bus speed.)
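The pattern described here (pauses that usually meet a 50 ms target, plus rare full GCs lasting tens of seconds) is why the fraction of pauses within target and the worst pause are worth tracking separately. The sketch below is hypothetical: it assumes you have already extracted pause durations in milliseconds, for example from -verbose:gc output, and the sample numbers in main are made up.

```java
import java.util.Arrays;

// Summarizes GC pause durations (ms) against a pause target such as -XX:MaxGCPauseMillis=50.
class PauseTargetReport {
    // Fraction of pauses at or below the target (1.0 when there are no pauses).
    static double fractionWithinTarget(long[] pausesMs, long targetMs) {
        if (pausesMs.length == 0) return 1.0;
        long within = Arrays.stream(pausesMs).filter(p -> p <= targetMs).count();
        return (double) within / pausesMs.length;
    }

    // Longest single pause, which is what latency-sensitive clients actually feel.
    static long worstPause(long[] pausesMs) {
        return Arrays.stream(pausesMs).max().orElse(0L);
    }

    public static void main(String[] args) {
        // Made-up sample: mostly short pauses plus one 45-second full GC.
        long[] pauses = {12, 35, 48, 41, 60, 22, 45_000};
        long target = 50;
        System.out.printf("within target: %.0f%%, worst pause: %d ms%n",
                100 * fractionWithinTarget(pauses, target), worstPause(pauses));
    }
}
```

For the sample data, five of the seven pauses meet the 50 ms target, yet the single full GC completely dominates the worst case.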
Compared with CMS (which is not the default server GC, but it should be for web servers and other real-time applications), typical pauses are much more predictable and can be made much shorter. So far I'm having better luck with CMS for the huge pauses, but that may be random; I'm seeing them only a few times every 24 hours. I'm not sure which one will be more appropriate in my production environment at the moment, but probably G1. If Oracle keeps tuning it, I suspect G1 will ultimately be the clear winner.
If you're not having a problem with the existing garbage collectors, there's no reason to consider G1 right now. If you are running a low-latency application, such as a GUI application, G1 is probably the right choice, with MaxGCPauseMillis set really low. If you're running a batch-mode application, G1 doesn't buy you anything.
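The rule of thumb above can be written down as a tiny, hypothetical helper that maps a workload type to command-line flags. The G1 flags come from this answer; choosing -XX:+UseParallelGC (the throughput-oriented collector) for batch jobs is my own assumption, since the answer only says G1 doesn't help there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper encoding the rule of thumb as JVM flags.
class GcFlagAdvisor {
    enum Workload { NO_GC_PROBLEM, LOW_LATENCY, BATCH }

    static List<String> flagsFor(Workload workload, int pauseTargetMs) {
        List<String> flags = new ArrayList<>();
        switch (workload) {
            case NO_GC_PROBLEM:
                break; // keep whatever collector you use today
            case LOW_LATENCY:
                flags.add("-XX:+UseG1GC");
                flags.add("-XX:MaxGCPauseMillis=" + pauseTargetMs); // a soft goal, not a guarantee
                break;
            case BATCH:
                flags.add("-XX:+UseParallelGC"); // assumption: favor throughput over pause times
                break;
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", flagsFor(Workload.LOW_LATENCY, 50)));
        System.out.println(String.join(" ", flagsFor(Workload.BATCH, 0)));
    }
}
```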
Answer by StaxMan
Although I have not tested G1 in production, I thought I would point out that GC is already problematic for cases without "humongous" heaps. Specifically, services with just, say, 2 or 4 gigs of heap can be severely impacted by GC. Young-generation GCs are usually not problematic, as they finish in single-digit milliseconds (or at most double-digit). But old-generation collections are much more problematic, as they take multiple seconds with old-gen sizes of 1 gig or above.
Now: in theory CMS can help a lot there, as it can run most of its operation concurrently. However, over time there will be cases where it cannot do this and has to fall back to a "stop the world" collection. And when that happens (after, say, 1 hour: not often, but still too often), well, hold on to your f***ing hats. It can take a minute or more. This is especially problematic for services that try to limit maximum latency; instead of taking, say, 25 milliseconds to serve a request, it now takes ten seconds or more. To add injury to insult, clients will then often time out the request and retry, leading to further problems (aka "shit storm").
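A back-of-the-envelope model shows how timeouts plus retries amplify a long pause. The sketch below is a toy under loudly stated assumptions (constant arrival rate, nothing served during the pause, every client retrying exactly once when its timeout expires); real clients and servers behave in messier ways.

```java
// Toy model of a retry storm after a stop-the-world pause. Assumptions: requests arrive
// at a constant rate, the server serves nothing during the pause, and every client
// retries exactly once as soon as its timeout expires.
class RetryAmplification {
    // Number of requests waiting in the queue at the moment the pause ends.
    static long queuedAfterPause(double ratePerSec, double pauseSec, double timeoutSec) {
        double originals = ratePerSec * pauseSec;
        // Requests that arrived more than timeoutSec before the pause ended have already
        // timed out and been retried, so their retries are queued as well.
        double retries = ratePerSec * Math.max(0.0, pauseSec - timeoutSec);
        return Math.round(originals + retries);
    }

    public static void main(String[] args) {
        // 1,000 requests/s with a 1-second client timeout.
        System.out.println("25 ms pause: " + queuedAfterPause(1000, 0.025, 1.0));
        System.out.println("10 s pause: " + queuedAfterPause(1000, 10.0, 1.0));
    }
}
```

In this model a 25 ms pause leaves 25 queued requests and no retries, while a 10-second pause leaves 19,000, and 9,000 of those are originals whose clients have already given up, so serving them is wasted work.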
This is one area where G1 was hoped to help a lot. I worked for a big company that offers cloud services for storage and message dispatching, and we could not use CMS: although much of the time it worked better than the parallel varieties, it had these meltdowns. So for about an hour things were nice, and then stuff hit the fan... and because the service was based on clusters, when one node got in trouble, others typically followed (since GC-induced timeouts led other nodes to believe the node had crashed, leading to re-routes).
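The "other nodes believe the node had crashed" effect follows directly from how a simple timeout-based failure detector works. The class below is a hypothetical minimal sketch, not any particular cluster's implementation: a peer is suspected once no heartbeat has arrived for longer than the timeout, so a peer frozen in a long GC pause looks exactly like a dead one.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal timeout-based failure detector (hypothetical). A GC pause longer than
// timeoutMs on a peer is indistinguishable from a crash to this detector.
class HeartbeatDetector {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Records a heartbeat received from node at time nowMs.
    void heartbeat(String node, long nowMs) {
        lastSeen.put(node, nowMs);
    }

    // A node is suspected if it was never seen or has been silent longer than the timeout.
    boolean isSuspected(String node, long nowMs) {
        Long last = lastSeen.get(node);
        return last == null || nowMs - last > timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatDetector detector = new HeartbeatDetector(5_000);
        detector.heartbeat("node-a", 0);
        // node-a enters a 30-second full GC right after t=0 and sends nothing until it ends.
        System.out.println("t=4s suspected? " + detector.isSuspected("node-a", 4_000));
        System.out.println("t=10s suspected? " + detector.isSuspected("node-a", 10_000));
        detector.heartbeat("node-a", 30_000); // pause over, heartbeats resume
        System.out.println("t=31s suspected? " + detector.isSuspected("node-a", 31_000));
    }
}
```

Raising the timeout reduces false suspicions but also slows detection of real crashes, which is the trade-off long GC pauses force on clustered services.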
I don't think GC is that much of a problem for apps, and perhaps even non-clustered services are affected less often. But more and more systems are clustered (especially thanks to NoSQL data stores), and heap sizes are growing. Old-gen GC time grows super-linearly with heap size (meaning that doubling the heap size more than doubles GC time, assuming the size of the live data set also doubles).
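To make "super-linear" concrete, assume, purely for illustration (this power-law form is not from the answer), that old-gen collection time grows as a power of the heap size H with an exponent greater than 1, with the live set scaling along with the heap:

```latex
t_{\text{old}}(H) \approx c\,H^{\alpha}, \quad \alpha > 1
\;\Longrightarrow\;
\frac{t_{\text{old}}(2H)}{t_{\text{old}}(H)} = 2^{\alpha} > 2
```

For example, an exponent of 1.5 would mean that doubling the heap multiplies old-gen GC time by about 2.83 rather than 2.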
Answer by Daniel
G1 makes the application a lot more agile: its latency improves to the point where the app can be described as "soft real-time". This is done by replacing the two kinds of GC runs (small minor ones and one big one on the tenured generation) with equal-sized small ones.
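The idea behind the name "garbage first" can be sketched as follows: the heap is split into equal-sized regions, and each collection picks the regions with the most garbage first, stopping before the predicted pause would exceed the budget. This is a simplified model, not HotSpot's algorithm; real G1 uses far more detailed cost predictions, and the cost model here (a fixed cost per MB of live data copied, garbage free to reclaim) and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of garbage-first region selection under a pause budget.
class GarbageFirstSelection {
    // One equal-sized heap region and the amount of live data it still holds.
    static final class Region {
        final int id;
        final int liveMb;
        final int sizeMb;

        Region(int id, int liveMb, int sizeMb) {
            this.id = id;
            this.liveMb = liveMb;
            this.sizeMb = sizeMb;
        }

        int garbageMb() { return sizeMb - liveMb; }
    }

    // Picks regions with the most garbage first until the next one would push the
    // predicted pause past budgetMs. Assumed cost: msPerLiveMb per MB of live data copied.
    static List<Integer> choose(List<Region> regions, double msPerLiveMb, double budgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Integer.compare(b.garbageMb(), a.garbageMb()));
        List<Integer> chosen = new ArrayList<>();
        double predictedMs = 0;
        for (Region r : sorted) {
            double cost = r.liveMb * msPerLiveMb;
            if (predictedMs + cost > budgetMs) break;
            predictedMs += cost;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Five hypothetical 32 MB regions with varying amounts of live data.
        int[] live = {30, 2, 10, 0, 25};
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < live.length; i++) regions.add(new Region(i, live[i], 32));
        System.out.println("collect regions " + choose(regions, 1.0, 15.0));
    }
}
```

With a 15 ms budget the sketch collects regions 3, 1 and 2, reclaiming 84 MB of garbage for a predicted 12 ms of copying, and leaves the mostly-live regions for later.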
For more details look at this: http://geekroom.de/java/java-expertise-g1-fur-java-7/
Answer by Fuby
I work with Java, with both small and large heaps, and the question of GC and Full GC comes up every day, because our constraints may be stricter than others': in certain environments, a 0.1-second scavenger GC or Full GC simply kills the functionality, so fine-grained configuration and capability matter (CMS, iCMS, others...). The target here is the best possible response time with nearly real-time processing (here, real-time processing often means 25 ms), so basically any improvements in GC ergonomics and heuristics are welcome!
Answer by hypheng
G1 GC is supposed to work better. But if you set -XX:MaxGCPauseMillis too aggressively, garbage will be collected too slowly, and that's why a full GC was triggered in David Leppik's example.
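The mechanism can be illustrated with a toy model in which each incremental pause can reclaim only as much garbage as fits in the pause budget. The linear reclaim-per-millisecond rate and all names below are assumptions for illustration; real G1 adapts generation sizes and collection sets rather than following a fixed rate.

```java
// Toy model of a too-aggressive pause target. Assumptions: each incremental pause can
// reclaim at most reclaimMbPerMs * pauseTargetMs of garbage, the application allocates a
// fixed amount between pauses, and running out of heap forces a full stop-the-world GC.
class PauseTargetStarvation {
    // Returns the cycle in which a full GC would be needed, or -1 if none within maxCycles.
    static int cyclesUntilFullGc(double heapMb, double liveMb, double allocMbPerCycle,
                                 double reclaimMbPerMs, double pauseTargetMs, int maxCycles) {
        double used = liveMb;
        for (int cycle = 1; cycle <= maxCycles; cycle++) {
            used += allocMbPerCycle;             // allocation between pauses
            if (used > heapMb) return cycle;     // no room left: full GC fallback
            double garbage = used - liveMb;
            double reclaimed = Math.min(garbage, reclaimMbPerMs * pauseTargetMs);
            used -= reclaimed;                   // one pause-limited incremental collection
        }
        return -1;
    }

    public static void main(String[] args) {
        // 1000 MB heap, 400 MB live, 300 MB allocated per cycle, 10 MB reclaimed per ms of pause.
        System.out.println("50 ms target: " + cyclesUntilFullGc(1000, 400, 300, 10, 50, 100));
        System.out.println("20 ms target: " + cyclesUntilFullGc(1000, 400, 300, 10, 20, 100));
    }
}
```

With a 50 ms target each pause keeps up with allocation and no full GC occurs (-1); with a 20 ms target only 200 MB is reclaimed per cycle against 300 MB allocated, so garbage piles up and a full GC is needed in cycle 5.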
Answer by Scott Sellers
Azul's CTO, Gil Tene, has a nice overview of the problems associated with Garbage Collection and a review of various solutions in his "Understanding Java Garbage Collection and What You Can Do about It" presentation, and there's additional detail in this article: http://www.infoq.com/articles/azul_gc_in_detail.
Azul's C4 Garbage Collector in our Zing JVM is both parallel and concurrent, and uses the same GC mechanism for both the new and old generations, working concurrently and compacting in both cases. Most importantly, C4 has no stop-the-world fallback. All compaction is performed concurrently with the running application. We have customers running very large heaps (hundreds of GBytes) with worst-case GC pause times of <10 msec and, depending on the application, often less than 1-2 msec.
The problem with CMS and G1 is that at some point Java heap memory must be compacted, and both of those garbage collectors stop the world (STW, i.e. pause the application) to perform compaction. So while CMS and G1 can postpone STW pauses, they don't eliminate them. Azul's C4, however, does completely eliminate STW pauses, and that's why Zing has such low GC pauses even for gigantic heap sizes.
And to correct a statement made in an earlier answer, Zing does not require any changes to the Operating System. It runs just like any other JVM on unmodified Linux distros.
Answer by Bubi S
It seems that G1 is finally officially supported starting with JDK 7u4; see the release notes for JDK 7u4: http://www.oracle.com/technetwork/java/javase/7u4-relnotes-1575007.html.
From our testing, a tuned CMS still performs better than G1 for big JVMs, but I expect G1 to improve.