Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/564039/

Time: 2020-08-11 16:16:28  Source: igfitidea

JVM performance tuning for large applications

Tags: java, jvm, performance, jvm-arguments

Asked by amit

The default JVM parameters are not optimal for running large applications. Any insights from people who have tuned the JVM for a real application would be helpful. We are running the application on a 32-bit Windows machine, where the client JVM is used by default. We have added -server and changed NewRatio to 1:3 (a larger young generation).

Any other parameters/tuning which you have tried and found useful?

[Update] The specific type of application I'm talking about is a server application that is rarely shut down and takes at least -Xmx1024m. Also assume that the application has already been profiled. I'm looking for general guidelines in terms of JVM performance only.

Accepted answer by Charlie Martin

There is a great deal of that information around.

First, profile the code before tuning the JVM.

Second, read the JVM documentation carefully; there are a lot of "urban legends" around. For example, the -server flag only helps if the JVM stays resident and runs for some time: -server "turns up" the JIT/HotSpot compiler, which needs many passes through the same code path before it kicks in. On the other hand, -server slows the initial execution of the JVM, as there is more setup time.
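
To see the warm-up effect for yourself, here is a rough sketch, not a proper benchmark. The class WarmupSketch and its workload are made up for illustration. It times the same method over several rounds; on a JIT-compiling JVM the later rounds are often faster than the first ones, though nothing guarantees it on any given run.

```java
public class WarmupSketch {
    // Hypothetical workload: sums the integers 0..n-1 in a plain loop.
    public static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs the workload `rounds` times and records the elapsed nanoseconds of each round.
    public static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            long result = work(n);
            nanos[r] = System.nanoTime() - start;
            // Use the result so the call cannot be treated as dead code.
            if (result < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 5_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000 + " us");
        }
    }
}
```

For anything you intend to act on, a real benchmark against your own workload is still the way to go.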

There are several good books and websites around. See, for example, http://www.javaperformancetuning.com/

Answer by Gary

This will be highly dependent on your application and the vendor and version of the JVM. You need to be clear about what you consider to be a performance problem. Are you concerned with certain critical sections of code? Have you profiled the app yet? Is the JVM spending too much time garbage collecting?

I would probably start with the -verbose:gc JVM option to watch how garbage collection is working. Many times, the simplest fix is to just increase the max heap size with -Xmx. If you learn to interpret the -verbose:gc output, it will tell you nearly all you need to know about tuning the JVM as a whole. But doing this alone will not magically make badly tuned code go faster. Most of the JVM tuning options are designed to improve the performance of the garbage collector and/or adjust memory sizes.
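
If you would rather read the same kind of information from inside the process, the standard java.lang.management API exposes per-collector counters. This is only a sketch: the class name GcStats is made up, and the collector names it prints depend on the JVM and on the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Builds one summary line per collector that the running JVM exposes.
    public static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", total time ms=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Allocate some short-lived garbage so the counters have a chance to move.
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
        }
        System.out.print(summary());
    }
}
```

A collection time that keeps growing quickly relative to wall-clock time is the same warning sign you would look for in the -verbose:gc output.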

For profiling, I like yourkit.com

Answer by TofuBeer

Look here (or do a google search for hotspot tuning) http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

You definitely want to profile your app before you try to tune the vm. NetBeans has a nice profiler built into it that will let you see all sorts of things.

I once had someone tell me that the GC was broken for their app - I looked at the code and found that they never closed any of their database query results so they were retaining massive amounts of byte arrays. Once we closed the results the time went from over 20 mins and a GB of memory to about 2 mins and a very small amount of memory. They were able to remove the JVM tuning parameters and things were happy.

Answer by TofuBeer

The absolute best way to answer this is to perform controlled testing on the application in as close to a 'production' environment as you can create. It's quite possible that -server, a reasonable starting heap size and the relatively smart behavior of recent JVMs will perform as well as or better than the vast majority of settings one would normally try.

There is one specific exception to this broad generalization: in the case that you are running in a web container, there is a really high chance that you will want to increase the permanent generation settings.

Answer by Peter Lawrey

I suggest you profile your application with CPU sampling and object allocation monitoring turned on at the same time. You will find you get very different results, which can be helpful in tuning your code. Also try using the built-in hprof profiler; it can give very different results as well.

In general profiling your application makes much more difference than JVM args.

Answer by stones333

With Java on a 32-bit Windows machine, your choices are limited. In my experience, the following parameter settings will impact application performance:

  1. memory sizes
  2. choice of GC collectors
  3. parameters related to GC collectors

Answer by user5994461

Foreword

Background

Been at a Java shop. Spent entire months dedicated to running performance tests on distributed systems, the main apps being in Java. Some of them involved products developed and sold by Sun themselves (then Oracle).

I will go over the lessons I learned, some history of the JVM, a little about its internals, a couple of parameters explained, and finally some tuning. I'm trying to keep it to the point so you can apply it in practice.

Things are changing fast in the Java world, so part of this might already be outdated since I last did all that. (Is Java 10 out already?)

Good Practices

What you SHOULD do: benchmark, Benchmark, BENCHMARK!

When you really need to know about performance, you need to run real benchmarks, specific to your workload. There is no alternative.

Also, you should monitor the JVM. Enable monitoring. Good applications usually provide a monitoring web page and/or an API. Otherwise there is the common Java tooling (JVisualVM, JMX, hprof, and some JVM flags).

Be aware that there is usually no performance to gain by tuning the JVM. It's more a matter of "to crash or not to crash, finding the transition point". It's about knowing that when you give that amount of resources to your application, you can consistently expect that amount of performance in return. Knowledge is power.

Performance is mostly dictated by your application. If you want it faster, you gotta write better code.

What you WILL do most of the time: Live with reliable sensible defaults

We don't get time to optimize and tune every single application out there. Most of the time we'll simply live with sensible defaults.

The first thing to do when configuring a new application is to read the documentation. Most serious applications come with a guide for performance tuning, including advice on JVM settings.

Then you can configure the application: JAVA_OPTS: -server -Xms???g -Xmx???g

  • -server: enable full optimizations (this flag is automatic on most JVMs nowadays)
  • -Xms, -Xmx: set the minimum and maximum heap (always the same value for both; that's about the only optimization to do).
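
To double-check that the settings actually reached the JVM, a small sketch like the following can print what the running process sees. The class name HeapSettings is made up for illustration; Runtime.maxMemory() and RuntimeMXBean.getInputArguments() are standard APIs, but the exact figure reported for the maximum heap can differ slightly from the -Xmx value depending on the collector.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapSettings {
    // Maximum heap the JVM will attempt to use, in MiB.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // JVM options passed on the command line (for example -Xms, -Xmx, -server).
    public static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
        System.out.println("JVM args: " + jvmArgs());
    }
}
```

Run it with the same JAVA_OPTS as the application and compare the output with what you intended to set.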

Well done, you now know all the optimization parameters there are to know about the JVM, congratulations! That was simple :D

What you SHALL NOT do, EVER:

Please do NOT copy random strings you found on the internet, especially when they span multiple lines like this:

-server  -Xms1g -Xmx1g  -XX:PermSize=1g -XX:MaxPermSize=256m  -Xmn256m -Xss64k  -XX:SurvivorRatio=30  -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled  -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10  -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark  -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Dsun.net.inetaddr.ttl=5  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=`date`.hprof   -Dcom.sun.management.jmxremote.port=5616 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC

For instance, this thing found on the first page of Google is plain terrible. There are arguments specified multiple times with conflicting values. Some just force the JVM defaults (possibly the defaults from 2 JVM versions ago). A few are obsolete and simply ignored. And finally, at least one parameter is so invalid that it will consistently crash the JVM at startup by its mere existence.
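
As an illustration of the "specified multiple times with conflicting values" problem, here is a rough sketch that scans an option string and reports flags given more than once with different values. The class FlagConflicts and its parsing rules are made up for this example; they cover only a few common option shapes and are not how the JVM itself parses its arguments. It also cannot spot inconsistencies between two different flags.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class FlagConflicts {
    private static final String[] SIZE_PREFIXES = {"-Xms", "-Xmx", "-Xmn", "-Xss"};

    // Splits one option into {key, value}; the rules cover only the common shapes.
    public static String[] split(String opt) {
        if (opt.startsWith("-XX:+") || opt.startsWith("-XX:-")) {
            // Boolean flag: the value is the "+" or "-" sign.
            return new String[] {"-XX:" + opt.substring(5), opt.substring(4, 5)};
        }
        int eq = opt.indexOf('=');
        if (eq > 0) {
            return new String[] {opt.substring(0, eq), opt.substring(eq + 1)};
        }
        for (String prefix : SIZE_PREFIXES) {
            if (opt.startsWith(prefix)) {
                return new String[] {prefix, opt.substring(prefix.length())};
            }
        }
        return new String[] {opt, ""};
    }

    // Returns every flag that appears with two or more distinct values.
    public static Map<String, Set<String>> conflicts(String options) {
        Map<String, Set<String>> seen = new LinkedHashMap<>();
        for (String opt : options.trim().split("\\s+")) {
            if (opt.isEmpty()) {
                continue;
            }
            String[] kv = split(opt);
            seen.computeIfAbsent(kv[0], k -> new LinkedHashSet<>()).add(kv[1]);
        }
        seen.values().removeIf(values -> values.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        String opts = "-server -Xms1g -Xmx1g -XX:MaxPermSize=256m "
                + "-server -Xms2g -Xmx2g -XX:MaxPermSize=256m";
        // prints {-Xms=[1g, 2g], -Xmx=[1g, 2g]}
        System.out.println(conflicts(opts));
    }
}
```

The sample string in main is a shortened excerpt of the option string quoted above.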

Actual tuning

How do you choose the memory size:

Read the guide from your application, it should give some indication. Monitor production and adjust afterwards. Perform some benchmarks if you need accuracy.

Important Note: The java process will take up to max heap PLUS 10%. The X% overhead is the heap management, which is not included in the heap itself.

All the memory is usually preallocated by the process on startup. You may see the process using max heap ALL THE TIME. That does not mean the application is really using it all. You need to use Java monitoring tools to see what is really being used.
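
One way to see what is really being used, as opposed to what the process has reserved, is the standard MemoryMXBean. This is a minimal sketch (the class name MemoryReport is made up); "used" is what live and not-yet-collected objects occupy, while "committed" is what the JVM has actually obtained from the operating system.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryReport {
    private static final long MIB = 1024 * 1024;

    // Formats used/committed/max for one memory area; max is -1 when undefined.
    static String format(String label, MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MIB) + " MiB";
        return label + ": used=" + usage.getUsed() / MIB + " MiB"
                + ", committed=" + usage.getCommitted() / MIB + " MiB"
                + ", max=" + max;
    }

    // Reports heap and non-heap usage of the running JVM.
    public static String report() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return format("heap", memory.getHeapMemoryUsage())
                + System.lineSeparator()
                + format("non-heap", memory.getNonHeapMemoryUsage());
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

The non-heap line gives a rough idea of the memory that sits on top of the heap, although it does not cover everything the process allocates natively.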

Finding the right size:

  • If it crashes with OutOfMemoryError, it ain't enough memory
  • If it doesn't crash with OutOfMemoryError, it's too much memory
  • If it's too much memory BUT the hardware has it and/or it is already paid for, it's the perfect number, job done!

JVM6 is bronze, JVM7 is gold, JVM8 is platinum...

The JVM is forever improving. Garbage collection is a very complex thing and there are a lot of very smart people working on it. It has improved tremendously in the past decade and it will continue to do so.

For informational purposes: there are at least 4 garbage collectors available in Oracle Java 7-8 (HotSpot) and OpenJDK 7-8. (Other JVMs may be entirely different, e.g. Android, IBM, embedded):

  • SerialGC
  • ParallelGC
  • ConcurrentMarkSweepGC
  • G1GC
  • (plus variants and settings)

[From Java 7 onward, the Oracle and OpenJDK code is partially shared. The GC should be (mostly) the same on both platforms.]

JVMs >= 7 have many optimizations and pick decent defaults. The defaults change a bit by platform and balance multiple things, for instance deciding whether to enable multicore optimizations depending on whether the CPU has multiple cores. You should let the JVM do it. Do not change or force GC settings.

It's okay to let the computer make decisions for you (that's what computers are for). It's better to have the JVM settings 95%-optimal all the time than to force an "always 8 core aggressive collection for lower pause times" on all the boxes, half of them being t2.small in the end.

Exception: When the application comes with a performance guide and specific tuning in place. It's perfectly okay to leave the provided settings as is.

Tip: Moving to a newer JVM to benefit from the latest improvements can sometimes provide a good boost without much effort.

Special Case: -XX:+UseCompressedOops

The JVM has a special setting that forces the use of 32-bit indexes internally (read: pointer-like). That allows addressing 4,294,967,295 objects * 8 bytes per address (objects are 8-byte aligned) => 32 GB of memory. (NOT to be confused with the 4 GB address space of REAL 32-bit pointers.)
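
The arithmetic behind the 32 GB figure can be checked in a couple of lines. This is just the multiplication spelled out, with a made-up class name; it assumes 32-bit indexes and 8-byte object alignment, as described above.

```java
public class CompressedOopsMath {
    // Bytes reachable with an index of `indexBits` bits when each index step is `alignmentBytes` bytes.
    public static long addressableBytes(int indexBits, int alignmentBytes) {
        return (1L << indexBits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long bytes = addressableBytes(32, 8);
        // prints 32 GiB
        System.out.println(bytes / (1L << 30) + " GiB");
    }
}
```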

It reduces the overall memory consumption with a potential positive impact on all caching levels.

Real life example: the ElasticSearch documentation states that a 32 GB node running with 32-bit (compressed) pointers may be equivalent to a 40 GB node running with 64-bit pointers in terms of actual data kept in memory.

A note on history: The flag was known to be unstable in pre-java-7 era (maybe even pre-java-6). It's been working perfectly in newer JVM for a while.

Java HotSpot™ Virtual Machine Performance Enhancements

[...] In Java SE 7, use of compressed oops is the default for 64-bit JVM processes when -Xmx isn't specified and for values of -Xmx less than 32 gigabytes. For JDK 6 before the 6u23 release, use the -XX:+UseCompressedOops flag with the java command to enable the feature.

See: once again the JVM is light years ahead of manual tuning. Still, it's interesting to know about it =)

Special Case: -XX:+UseNUMA

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Source: Wikipedia

Modern systems have extremely complex memory architectures with multiple layers of memory and caches, both private and shared, across cores and CPUs.

Quite obviously, accessing data in the L2 cache of the current processor is A LOT faster than having to go all the way to a memory stick attached to another socket.

I believe that all multi-socket systems sold today are NUMA by design, while all consumer systems are NOT. Check whether your server supports NUMA with the command numactl --show on Linux.

The NUMA-aware flag tells the JVM to optimize memory allocations for the underlying hardware topology.
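
To check what the running JVM actually ended up with for flags like this one, a HotSpot-specific sketch is shown below. It assumes an Oracle/OpenJDK HotSpot JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name VmFlagCheck is made up, and on other JVMs or for flags that do not exist on your platform it simply reports "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmFlagCheck {
    // Returns the current value of a HotSpot VM option, or "unavailable" if it cannot be read.
    public static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "unavailable" : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist or the bean is not supported by this JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseNUMA", "UseCompressedOops", "UseG1GC"}) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

Running it once with and once without the flag on the command line is a quick way to confirm that the option is really being picked up.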

The performance boost can be substantial (i.e. double digits: +XX%). In fact, someone switching from a "NOT-NUMA 10CPU 100GB" to a "NUMA 40CPU 400GB" might experience a [dramatic] loss in performance if they don't know about the flag.

Note: There are discussions about detecting NUMA and setting the flag automatically in the JVM: http://openjdk.java.net/jeps/163

Bonus: All applications intended to run on big fat hardware (i.e. NUMA) need to be optimized for it. This is not specific to Java applications.

Toward the future: -XX:+UseG1GC

The latest improvement in Garbage Collection is the G1 collector (read: Garbage First).

It is intended for systems with many cores and a lot of memory: at the absolute minimum, 4 cores + 6 GB memory. It is targeted at databases and memory-intensive applications using 10 times that and beyond.

Short version: at these sizes the traditional GCs face too much data to process at once and pauses get out of hand. G1 splits the heap into many small sections that can be managed independently and in parallel while the application is running.

The first version was available in 2013. It is mature enough for production now, and it has been the default collector on server-class machines since Java 9. It is worth a try for large applications.

Do not touch: Generation Sizes (NewGen, PermGen...)

The GC splits the memory into multiple sections. (Not getting into details here; you can google "Java GC Generations".)

The last time, I spent a week trying 20 different combinations of generation flags on an app taking 10000 hits/s. I got a magnificent boost ranging from -1% to +1%.

Java GC generations are an interesting topic to read papers on or to write one about. They are not a thing to tune unless you're part of the 1% who can devote substantial time to negligible gains, among the 1% of people who really need optimizations.

Conclusion

Hope this can help you. Have fun with the JVM.

Java is the best language and the best platform in the world! Go spread the love :D
