How do I write a correct micro-benchmark in Java?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/504103/

How do I write a correct micro-benchmark in Java?

java, jvm, benchmarking, jvm-hotspot, microbenchmark

Asked by John Nilsson

How do you write (and run) a correct micro-benchmark in Java?

I'm looking for some code samples and comments illustrating various things to think about.

Example: Should the benchmark measure time/iteration or iterations/time, and why?

Related: Is stopwatch benchmarking acceptable?

Accepted answer by Peter Lawrey

Tips about writing micro benchmarks from the creators of Java HotSpot:

Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics.

Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before timing phase(s). (Fewer iterations is OK on the warmup phase. The rule of thumb is several tens of thousands of inner loop iterations.)

Rule 2: Always run with -XX:+PrintCompilation, -verbose:gc, etc., so you can verify that the compiler and other parts of the JVM are not doing unexpected work during your timing phase.

Rule 2.1: Print messages at the beginning and end of timing and warmup phases, so you can verify that there is no output from Rule 2 during the timing phase.

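To make Rules 1 and 2.1 concrete, a hand-rolled harness might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name, testKernel(), and the iteration counts are placeholders (the counts simply follow the tens-of-thousands rule of thumb above).

public class MicroBenchmarkSketch {

    // Placeholder for the real test kernel you want to measure.
    static long testKernel() {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) {
            sum += i * 31;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;
        int iterations = 20_000;

        // Rule 1: warmup phase running the same kernel, so initialization
        // and JIT compilation happen before the timing phase.
        System.out.println("== warmup start ==");
        for (int i = 0; i < iterations; i++) {
            sink += testKernel();
        }
        System.out.println("== warmup end ==");

        // Rule 2.1: markers around the timing phase, so any -XX:+PrintCompilation
        // or -verbose:gc output appearing between them is easy to spot.
        System.out.println("== timing start ==");
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += testKernel();
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("== timing end ==");

        System.out.println("ns/iteration: " + (double) elapsed / iterations);
        System.out.println("sink: " + sink); // consume the result so it cannot be optimized away
    }
}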

Rule 3: Be aware of the difference between -client and -server, and OSR and regular compilations. The -XX:+PrintCompilation flag reports OSR compilations with an at-sign to denote the non-initial entry point, for example: Trouble$1::run @ 2 (41 bytes). Prefer server to client, and regular to OSR, if you are after best performance.

Rule 4: Be aware of initialization effects. Do not print for the first time during your timing phase, since printing loads and initializes classes. Do not load new classes outside of the warmup phase (or final reporting phase), unless you are testing class loading specifically (and in that case load only the test classes). Rule 2 is your first line of defense against such effects.

Rule 5: Be aware of deoptimization and recompilation effects. Do not take any code path for the first time in the timing phase, because the compiler may junk and recompile the code, based on an earlier optimistic assumption that the path was not going to be used at all. Rule 2 is your first line of defense against such effects.

Rule 6: Use appropriate tools to read the compiler's mind, and expect to be surprised by the code it produces. Inspect the code yourself before forming theories about what makes something faster or slower.

Rule 7: Reduce noise in your measurements. Run your benchmark on a quiet machine, and run it several times, discarding outliers. Use -Xbatch to serialize the compiler with the application, and consider setting -XX:CICompilerCount=1 to prevent the compiler from running in parallel with itself. Try your best to reduce GC overhead: set -Xmx (large enough) equal to -Xms, and use UseEpsilonGC if it is available.

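As a rough example, the flags from Rules 2 and 7 might be combined on the command line like this. The heap size and class name are placeholders; note that recent HotSpot builds may only accept -XX:CICompilerCount=1 together with -XX:-TieredCompilation, and that -XX:+UseEpsilonGC, where available, is experimental and additionally needs -XX:+UnlockExperimentalVMOptions.

java -XX:+PrintCompilation -verbose:gc \
     -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 \
     -Xms2g -Xmx2g \
     MicroBenchmarkSketch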

Rule 8: Use a library for your benchmark, as it is probably more efficient and was already debugged for this sole purpose: for example JMH, Caliper, or Bill and Paul's Excellent UCSD Benchmarks for Java.

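For Rule 8, a minimal JMH benchmark might look roughly like the following sketch (it assumes the jmh-core and jmh-generator-annprocess dependencies are set up, e.g. via the official archetype; the measured expression is just a placeholder):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)            // time/iteration; Mode.Throughput gives iterations/time
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)                     // JMH handles the warmup phase for you
@Measurement(iterations = 5)
@Fork(1)                                    // each trial runs in a freshly forked JVM
@State(Scope.Thread)
public class StringConcatBenchmark {
    String a = "foo", b = "bar";

    @Benchmark
    public String concat() {
        return a + b;   // returning the result keeps it from being dead-code eliminated
    }
}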

Answered by Jon Skeet

Important things for Java benchmarks are:

  • Warm up the JIT first by running the code several times before timing it
  • Make sure you run it for long enough to be able to measure the results in seconds or (better) tens of seconds
  • While you can't call System.gc() between iterations, it's a good idea to run it between tests, so that each test will hopefully get a "clean" memory space to work with. (Yes, gc() is more of a hint than a guarantee, but it's very likely that it really will garbage collect in my experience.)
  • I like to display iterations and time, and a score of time/iteration which can be scaled such that the "best" algorithm gets a score of 1.0 and others are scored in a relative fashion. This means you can run all algorithms for a longish time, varying both number of iterations and time, but still getting comparable results (a rough sketch follows this list).
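A rough sketch of such a harness, with System.gc() between tests and per-iteration times normalised so the best algorithm scores 1.0 (the algorithm names, the placeholder work, and the iteration count are all made up for illustration):

import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

public class RelativeScores {

    static double nanosPerIteration(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        Map<String, Runnable> algorithms = new LinkedHashMap<>();
        algorithms.put("algorithmA", () -> { /* placeholder work; real code should produce a used result */ });
        algorithms.put("algorithmB", () -> { /* placeholder work */ });

        Map<String, Double> results = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : algorithms.entrySet()) {
            System.gc(); // only a hint, but it usually gives each test a cleaner heap to start from
            results.put(e.getKey(), nanosPerIteration(e.getValue(), 1_000_000));
        }

        double best = results.values().stream().min(Comparator.naturalOrder()).orElseThrow();
        results.forEach((name, ns) ->
                System.out.printf("%s: %.1f ns/iteration, relative score %.2f%n", name, ns, ns / best));
    }
}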

I'm just in the process of blogging about the design of a benchmarking framework in .NET. I've got a couple of earlier posts which may be able to give you some ideas - not everything will be appropriate, of course, but some of it may be.

Answered by Mnementh

There are many possible pitfalls for writing micro-benchmarks in Java.

First: You have to account for all sorts of events that take a more or less random amount of time: garbage collection, caching effects (of the OS for files and of the CPU for memory), IO, etc.

Second: You cannot trust the accuracy of the measured times for very short intervals.

Third: The JVM optimizes your code while executing. So different runs in the same JVM-instance will become faster and faster.

My recommendations: Make your benchmark run for some seconds; that is more reliable than a runtime of milliseconds. Warm up the JVM (that means running the benchmark at least once without measuring, so that the JVM can apply its optimizations). Run your benchmark multiple times (maybe 5 times) and take the median value. Run every micro-benchmark in a new JVM instance (start a new java process for every benchmark), otherwise optimization effects of the JVM can influence tests that run later. Don't execute things in the timing phase that weren't executed in the warmup phase (as this could trigger class loading and recompilation).

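A rough sketch of the "new JVM instance per benchmark, take the median of several runs" part of this advice; it assumes a hypothetical benchmark class that prints its elapsed nanoseconds (and nothing else) to standard output:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FreshJvmLauncher {

    // Starts a separate java process for one benchmark run and parses its output.
    static long runOnceInNewJvm(String benchmarkClass) throws Exception {
        Process p = new ProcessBuilder(
                "java", "-cp", System.getProperty("java.class.path"), benchmarkClass)
                .start();
        String out = new String(p.getInputStream().readAllBytes()).trim();
        p.waitFor();
        return Long.parseLong(out); // assumes the benchmark prints only its elapsed nanoseconds
    }

    public static void main(String[] args) throws Exception {
        List<Long> times = new ArrayList<>();
        for (int run = 0; run < 5; run++) {
            times.add(runOnceInNewJvm("MyBenchmark")); // "MyBenchmark" is a placeholder
        }
        Collections.sort(times);
        System.out.println("median ns: " + times.get(times.size() / 2));
    }
}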

Answered by Kip

If you are trying to compare two algorithms, do at least two benchmarks for each, alternating the order, i.e.:

for (int i = 0; i < n; i++)
  alg1();
for (int i = 0; i < n; i++)
  alg2();
for (int i = 0; i < n; i++)
  alg2();
for (int i = 0; i < n; i++)
  alg1();

I have found some noticeable differences (5-10% sometimes) in the runtime of the same algorithm in different passes.

Also, make sure that n is very large, so that the runtime of each loop is at the very least 10 seconds or so. The more iterations, the more significant figures in your benchmark time and the more reliable that data is.

Answered by Peter Štibrany

Make sure you somehow use the results which are computed in the benchmarked code. Otherwise your code can be optimized away.

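For example, fold every result into a value that you later print or return (JMH does the same thing for you with its Blackhole); a minimal sketch, where expensiveComputation() is only a placeholder for the code under test:

static long runMeasuredLoop(int iterations) {
    long sink = 0;
    for (int i = 0; i < iterations; i++) {
        sink += expensiveComputation(i); // placeholder for the benchmarked code
    }
    return sink; // printing or returning the accumulated value keeps the loop from being eliminated
}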

Answered by Peter Lawrey

Should the benchmark measure time/iteration or iterations/time, and why?

It depends on what you are trying to test.

If you are interested in latency, use time/iteration and if you are interested in throughput, use iterations/time.

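The two views are just reciprocals of the same measurement, for example:

long elapsedNanos = 2_000_000_000L; // e.g. measured with System.nanoTime() around the loop
long iterations   = 50_000_000L;

double nanosPerOp   = (double) elapsedNanos / iterations;            // latency: 40 ns per operation
double opsPerSecond = iterations / (elapsedNanos / 1_000_000_000.0); // throughput: 25,000,000 ops/s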

Answered by Yuriy

http://opt.sourceforge.net/ Java Micro Benchmark - control tasks required to determine the comparative performance characteristics of the computer system on different platforms. Can be used to guide optimization decisions and to compare different Java implementations.

Answered by SpaceTrucker

It should also be noted that it might be important to analyze the results of the micro-benchmark when comparing different implementations. Therefore a significance test should be made.

This is because implementation A might be faster during most of the runs of the benchmark than implementation B. But A might also have a higher spread, so the measured performance benefit of A won't be of any significance when compared with B.

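A very rough sketch of such a check: report each implementation's mean together with its spread instead of the mean alone (the run times below are made-up example numbers; a real significance test, e.g. Student's t-test, would go further):

import java.util.Arrays;

public class SpreadCheck {

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    static double stdDev(double[] xs) {
        double m = mean(xs);
        double variance = Arrays.stream(xs).map(x -> (x - m) * (x - m)).average().orElse(Double.NaN);
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // Hypothetical per-run timings in milliseconds for two implementations.
        double[] a = {102, 98, 105, 99, 101, 140, 97, 103};
        double[] b = {110, 108, 111, 109, 112, 110, 109, 111};

        System.out.printf("A: %.1f ms +/- %.1f%n", mean(a), stdDev(a));
        System.out.printf("B: %.1f ms +/- %.1f%n", mean(b), stdDev(b));
        // If the intervals overlap heavily, A's apparent advantage may not be significant.
    }
}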

So it is also important to write and run a micro benchmark correctly, but also to analyze it correctly.

Answered by assylias

jmh is a recent addition to OpenJDK and has been written by some performance engineers from Oracle. Certainly worth a look.

JMH is a Java harness for building, running, and analysing nano/micro/macro benchmarks written in Java and other languages targeting the JVM.

Very interesting pieces of information are buried in the sample tests' comments.

See also:
