Why is StringBuilder#append(int) faster in Java 7 than in Java 8?

Question

Asked by thkala

While investigating a little debate w.r.t. using "" + n and Integer.toString(int) to convert an integer primitive to a string, I wrote this JMH microbenchmark:

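import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

// Written against an early JMH release; later JMH versions renamed
// @GenerateMicroBenchmark to @Benchmark.
@Fork(1)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class IntStr {
    // Incremented on every call so each invocation converts a different value
    protected int counter;


    // Baseline: direct int-to-String conversion
    @GenerateMicroBenchmark
    public String integerToString() {
        return Integer.toString(this.counter++);
    }

    // A single append(int) on a fresh StringBuilder
    @GenerateMicroBenchmark
    public String stringBuilder0() {
        return new StringBuilder().append(this.counter++).toString();
    }

    // Empty string first, then the int (the shape of "" + n)
    @GenerateMicroBenchmark
    public String stringBuilder1() {
        return new StringBuilder().append("").append(this.counter++).toString();
    }

    // Converts the int separately, then appends the resulting String
    @GenerateMicroBenchmark
    public String stringBuilder2() {
        return new StringBuilder().append("").append(Integer.toString(this.counter++)).toString();
    }

    // Goes through the general-purpose formatter
    @GenerateMicroBenchmark
    public String stringFormat() {
        return String.format("%d", this.counter++);
    }

    // Restart the counter at 0 before each measurement iteration
    @Setup(Level.Iteration)
    public void prepareIteration() {
        this.counter = 0;
    }
}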
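For context on why stringBuilder1 is the case that matters for the original debate: up to JDK 8, javac compiles the expression "" + n into roughly that same StringBuilder chain, whereas JDK 9 and later compile string concatenation to an invokedynamic call (JEP 280) instead. The plain-Java sketch below (IntToStringForms and its methods are hypothetical names, not part of the original benchmark) only shows that the three spellings produce identical strings; it says nothing about their relative speed.

```java
// Illustrative only: three ways of turning an int into a String that the
// benchmark compares. IntToStringForms is a hypothetical name.
class IntToStringForms {
    // The "" + n idiom from the original debate.
    static String viaConcat(int n) {
        return "" + n;
    }

    // Roughly what javac up to JDK 8 emits for "" + n (cf. stringBuilder1).
    static String viaBuilder(int n) {
        return new StringBuilder().append("").append(n).toString();
    }

    // Direct conversion (cf. integerToString).
    static String viaToString(int n) {
        return Integer.toString(n);
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int n : samples) {
            boolean same = viaConcat(n).equals(viaBuilder(n))
                    && viaBuilder(n).equals(viaToString(n));
            System.out.println(n + " -> " + viaToString(n) + " (all equal: " + same + ")");
        }
    }
}
```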

I ran it with the default JMH options with both Java VMs that exist on my Linux machine (up-to-date Mageia 4 64-bit, Intel i7-3770 CPU, 32GB RAM). The first JVM was the one supplied with Oracle JDK 8u5 64-bit:

java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

With this JVM I got pretty much what I expected:

Benchmark                    Mode   Samples         Mean   Mean error    Units
b.IntStr.integerToString    thrpt        20    32317.048      698.703   ops/ms
b.IntStr.stringBuilder0     thrpt        20    28129.499      421.520   ops/ms
b.IntStr.stringBuilder1     thrpt        20    28106.692     1117.958   ops/ms
b.IntStr.stringBuilder2     thrpt        20    20066.939     1052.937   ops/ms
b.IntStr.stringFormat       thrpt        20     2346.452       37.422   ops/ms

I.e. using the StringBuilder class is slower due to the additional overhead of creating the StringBuilder object and appending an empty string. Using String.format(String, ...) is even slower, by an order of magnitude or so.

The distribution-provided JDK, on the other hand, is based on OpenJDK 1.7:

java version "1.7.0_55"
OpenJDK Runtime Environment (mageia-2.4.7.1.mga4-x86_64 u55-b13)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

The results here were interesting:

Benchmark                    Mode   Samples         Mean   Mean error    Units
b.IntStr.integerToString    thrpt        20    31249.306      881.125   ops/ms
b.IntStr.stringBuilder0     thrpt        20    39486.857      663.766   ops/ms
b.IntStr.stringBuilder1     thrpt        20    41072.058      484.353   ops/ms
b.IntStr.stringBuilder2     thrpt        20    20513.913      466.130   ops/ms
b.IntStr.stringFormat       thrpt        20     2068.471       44.964   ops/ms

Why does StringBuilder.append(int) appear so much faster with this JVM? Looking at the StringBuilder class source code revealed nothing particularly interesting - the method in question is almost identical to Integer#toString(int). Interestingly enough, appending the result of Integer.toString(int) (the stringBuilder2 microbenchmark) does not appear to be faster.
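The similarity is structural: in JDK 7/8, both Integer.toString(int) and AbstractStringBuilder.append(int) size the result with the package-private helper Integer.stringSize and then fill in digits from the right with Integer.getChars. The only real difference is that append(int) writes into the builder's own array after growing it. The sketch below reproduces that two-step shape in simplified form. It is not the JDK source (which uses digit lookup tables and special-cases Integer.MIN_VALUE), and DigitsSketch and MiniBuilder are hypothetical names, the latter standing in for AbstractStringBuilder.

```java
import java.util.Arrays;

// Simplified re-creation of the size-then-fill pattern; not the JDK's code.
class DigitsSketch {
    // Characters needed for i, counting a leading '-' for negatives.
    static int stringSize(int i) {
        long v = Math.abs((long) i); // widen first so Integer.MIN_VALUE is safe
        int size = (i < 0) ? 2 : 1;
        while (v >= 10) {
            v /= 10;
            size++;
        }
        return size;
    }

    // Writes the digits of i right-to-left so the last digit lands at buf[end - 1].
    static void getChars(int i, int end, char[] buf) {
        long v = Math.abs((long) i);
        int pos = end;
        do {
            buf[--pos] = (char) ('0' + (v % 10));
            v /= 10;
        } while (v != 0);
        if (i < 0) {
            buf[--pos] = '-';
        }
    }

    // Shape of Integer.toString(int): exact-size array, fill, wrap in a String.
    static String toString(int i) {
        char[] buf = new char[stringSize(i)];
        getChars(i, buf.length, buf);
        return new String(buf);
    }
}

// Hypothetical stand-in for AbstractStringBuilder: same two steps, but the
// digits go straight into the builder's own (possibly grown) array.
class MiniBuilder {
    private char[] value = new char[16];
    private int count;

    MiniBuilder append(int i) {
        int needed = count + DigitsSketch.stringSize(i);
        if (needed > value.length) {
            value = Arrays.copyOf(value, Math.max(needed, value.length * 2 + 2));
        }
        DigitsSketch.getChars(i, needed, value);
        count = needed;
        return this;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}
```

Seen this way, stringBuilder0 and integerToString do essentially the same digit work, which is why a large gap between them points at the JIT rather than at the library code.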

Is this performance discrepancy an issue with the testing harness? Or does my OpenJDK JVM contain optimizations that would affect this particular code (anti)-pattern?

EDIT:

For a more straightforward comparison, I installed Oracle JDK 1.7u55:

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

The results are similar to those of OpenJDK:

Benchmark                    Mode   Samples         Mean   Mean error    Units
b.IntStr.integerToString    thrpt        20    32502.493      501.928   ops/ms
b.IntStr.stringBuilder0     thrpt        20    39592.174      428.967   ops/ms
b.IntStr.stringBuilder1     thrpt        20    40978.633      544.236   ops/ms

It seems that this is a more general Java 7 vs Java 8 issue. Perhaps Java 7 had more aggressive string optimizations?

EDIT 2:

For completeness, here are the string-related VM options for both of these JVMs:

For Oracle JDK 8u5:

$ /usr/java/default/bin/java -XX:+PrintFlagsFinal 2>/dev/null | grep String
     bool OptimizeStringConcat                      = true            {C2 product}
     intx PerfMaxStringConstLength                  = 1024            {product}
     bool PrintStringTableStatistics                = false           {product}
    uintx StringTableSize                           = 60013           {product}

For OpenJDK 1.7:

$ java -XX:+PrintFlagsFinal 2>/dev/null | grep String
     bool OptimizeStringConcat                      = true            {C2 product}        
     intx PerfMaxStringConstLength                  = 1024            {product}           
     bool PrintStringTableStatistics                = false           {product}           
    uintx StringTableSize                           = 60013           {product}           
     bool UseStringCache                            = false           {product}

The UseStringCache option was removed in Java 8 with no replacement, so I doubt that makes any difference. The rest of the options appear to have the same settings.
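As a cross-check from inside a running JVM, rather than grepping -XX:+PrintFlagsFinal, HotSpot exposes flag values through com.sun.management.HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JVM (Oracle or OpenJDK, Java 7 or later); on other VMs the bean may be missing, in which case it simply reports nothing. VmFlags is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

// Hypothetical helper: reads HotSpot VM flags from inside the running JVM.
class VmFlags {
    // Returns the flag's current value as a string, or null if this VM does
    // not know the flag (e.g. UseStringCache after its removal in Java 8)
    // or does not provide the HotSpot diagnostic bean at all.
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean;
        try {
            bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        } catch (IllegalArgumentException e) {
            return null; // not a platform MXBean interface on this VM
        }
        if (bean == null) {
            return null; // interface exists but is not implemented here
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return null; // no VM option with that name
        }
    }

    public static void main(String[] args) {
        String[] names = {"OptimizeStringConcat", "StringTableSize", "UseStringCache"};
        for (String name : names) {
            String value = flag(name);
            System.out.println(name + " = " + (value == null ? "(not present)" : value));
        }
    }
}
```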

EDIT 3:

A side-by-side comparison of the source code of the AbstractStringBuilder, StringBuilder and Integer classes from the respective src.zip files reveals nothing noteworthy. Apart from a whole lot of cosmetic and documentation changes, Integer now has some support for unsigned integers and StringBuilder has been slightly refactored to share more code with StringBuffer. None of these changes seem to affect the code paths used by StringBuilder#append(int), although I may have missed something.

A comparison of the assembly code generated for IntStr#integerToString() and IntStr#stringBuilder0() is far more interesting. The basic layout of the code generated for IntStr#integerToString() was similar for both JVMs, although Oracle JDK 8u5 seemed to be more aggressive w.r.t. inlining some calls within the Integer#toString(int) code. There was a clear correspondence with the Java source code, even for someone with minimal assembly experience.

The assembly code for IntStr#stringBuilder0(), however, was radically different. The code generated by Oracle JDK 8u5 was once again directly related to the Java source code - I could easily recognise the same layout. On the contrary, the code generated by OpenJDK 7 was almost unrecognisable to the untrained eye (like mine). The new StringBuilder() call was seemingly removed, as was the creation of the array in the StringBuilder constructor. Additionally, the disassembler plugin was not able to provide as many references to the source code as it did in JDK 8.

I assume that this is either the result of a much more aggressive optimization pass in OpenJDK 7, or more probably the result of inserting hand-written low-level code for certain StringBuilder operations. I am unsure why this optimization does not happen in my JVM 8 implementation or why the same optimizations were not implemented for Integer#toString(int) in JVM 7. I guess someone familiar with the related parts of the JRE source code would have to answer these questions...

Answer 1

Accepted answer by Aleksey Shipilev

TL;DR: Side effects in append apparently break StringConcat optimizations.

Very good analysis in the original question and updates!

For completeness, below are a few missing steps:

Look through the -XX:+PrintInlining output for both 7u55 and 8u5 (PrintInlining is a diagnostic flag, so it also needs -XX:+UnlockDiagnosticVMOptions). In 7u55, you will see something like this:

 @ 16   org.sample.IntStr::inlineSideEffect (25 bytes)   force inline by CompilerOracle
   @ 4   java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
   @ 18   java.lang.StringBuilder::append (8 bytes)   already compiled into a big method
   @ 21   java.lang.StringBuilder::toString (17 bytes)   inline (hot)

...and in 8u5:

 @ 16   org.sample.IntStr::inlineSideEffect (25 bytes)   force inline by CompilerOracle
   @ 4   java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
     @ 3   java.lang.AbstractStringBuilder::<init> (12 bytes)   inline (hot)
       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
   @ 18   java.lang.StringBuilder::append (8 bytes)   inline (hot)
     @ 2   java.lang.AbstractStringBuilder::append (62 bytes)   already compiled into a big method
   @ 21   java.lang.StringBuilder::toString (17 bytes)   inline (hot)
     @ 13   java.lang.String::<init> (62 bytes)   inline (hot)
       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
       @ 55   java.util.Arrays::copyOfRange (63 bytes)   inline (hot)
         @ 54   java.lang.Math::min (11 bytes)   (intrinsic)
         @ 57   java.lang.System::arraycopy (0 bytes)   (intrinsic)

You might notice that the 7u55 version is shallower, and it looks like nothing is called after the StringBuilder methods; this is a good indication that the string optimizations are in effect. Indeed, if you run 7u55 with -XX:-OptimizeStringConcat, the subcalls will reappear, and performance drops to 8u5 levels.

OK, so we need to figure out why 8u5 does not do the same optimization. Grep http://hg.openjdk.java.net/jdk9/jdk9/hotspot for "StringBuilder" to figure out where the VM handles the StringConcat optimization; this will get you into src/share/vm/opto/stringopts.cpp.

Run hg log src/share/vm/opto/stringopts.cpp to figure out the latest changes there. One of the candidates would be:

changeset:   5493:90abdd727e64
user:        iveresov
date:        Wed Oct 16 11:13:15 2013 -0700
summary:     8009303: Tiered: incorrect results in VM tests stringconcat...

Look for the review thread on the OpenJDK mailing lists (easy enough to find by googling the changeset summary): http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-October/012084.html
Spot "String concat optimization optimization collapses the pattern [...] into a single allocation of a string and forming the result directly. All possible deopts that may happen in the optimized code restart this pattern from the beginning (starting from the StringBuffer allocation). That means that the whole pattern must me side-effect free." Eureka?
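A toy model may make the quoted rationale concrete. It is plain Java, not HotSpot code: runWithRestart (a hypothetical helper) evaluates the appended argument, abandons that attempt as if the optimized code had deoptimized, and then re-runs the chain from the StringBuilder allocation. With counter++ inside the chain (as in inlineSideEffect below), the increment runs twice; with the increment hoisted into a local (as in spliceSideEffect), restarting is harmless. Presumably this is why the fixed optimization has to give up on chains whose arguments have side effects.

```java
import java.util.function.IntSupplier;

// Toy model only: HotSpot does not literally work like this.
class RestartModel {
    static int counter;

    // Evaluates the argument for a first attempt, "deoptimizes", and restarts
    // the chain from the StringBuilder allocation, re-evaluating the argument.
    static String runWithRestart(IntSupplier argument) {
        new StringBuilder().append(argument.getAsInt()); // abandoned first attempt
        return new StringBuilder().append(argument.getAsInt()).toString(); // restart
    }

    public static void main(String[] args) {
        // Side effect inside the chain: the restart increments counter again.
        counter = 0;
        String inline = runWithRestart(() -> counter++);
        System.out.println(inline + ", counter = " + counter); // prints "1, counter = 2"

        // Side effect hoisted out: the chain only reads a local, so a restart is harmless.
        counter = 0;
        int cnt = counter++;
        String spliced = runWithRestart(() -> cnt);
        System.out.println(spliced + ", counter = " + counter); // prints "0, counter = 1"
    }
}
```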

Write out the contrasting benchmark:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@Fork(5)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class IntStr {
    private int counter;

    // Side effect (counter++) inside the StringBuilder chain
    @GenerateMicroBenchmark
    public String inlineSideEffect() {
        return new StringBuilder().append(counter++).toString();
    }

    // Side effect hoisted out; the chain only reads a local
    @GenerateMicroBenchmark
    public String spliceSideEffect() {
        int cnt = counter++;
        return new StringBuilder().append(cnt).toString();
    }
}

Measure it on JDK 7u55, seeing the same performance for inlined/spliced side effects:

Benchmark                       Mode   Samples         Mean   Mean error    Units
o.s.IntStr.inlineSideEffect     avgt        25       65.460        1.747    ns/op
o.s.IntStr.spliceSideEffect     avgt        25       64.414        1.323    ns/op

Measure it on JDK 8u5, seeing the performance degradation with the inlined effect:

Benchmark                       Mode   Samples         Mean   Mean error    Units
o.s.IntStr.inlineSideEffect     avgt        25       84.953        2.274    ns/op
o.s.IntStr.spliceSideEffect     avgt        25       65.386        1.194    ns/op

Submit the bug report (https://bugs.openjdk.java.net/browse/JDK-8043677) to discuss this behavior with the VM guys. The rationale for the original fix is rock solid; it is interesting, however, whether we can/should get this optimization back in trivial cases like these.
???
PROFIT.

And yeah, I should post the results for the benchmark that moves the increment out of the StringBuilder chain, doing it before the entire chain. I also switched to average time and ns/op. This is JDK 7u55:

Benchmark                      Mode   Samples         Mean   Mean error    Units
o.s.IntStr.integerToString     avgt        25      153.805        1.093    ns/op
o.s.IntStr.stringBuilder0      avgt        25      128.284        6.797    ns/op
o.s.IntStr.stringBuilder1      avgt        25      131.524        3.116    ns/op
o.s.IntStr.stringBuilder2      avgt        25      254.384        9.204    ns/op
o.s.IntStr.stringFormat        avgt        25     2302.501      103.032    ns/op

And this is 8u5:

Benchmark                      Mode   Samples         Mean   Mean error    Units
o.s.IntStr.integerToString     avgt        25      153.032        3.295    ns/op
o.s.IntStr.stringBuilder0      avgt        25      127.796        1.158    ns/op
o.s.IntStr.stringBuilder1      avgt        25      131.585        1.137    ns/op
o.s.IntStr.stringBuilder2      avgt        25      250.980        2.773    ns/op
o.s.IntStr.stringFormat        avgt        25     2123.706       25.105    ns/op

stringFormat is actually a bit faster in 8u5, and all other tests are the same. This solidifies the hypothesis that the side-effect breakage in SB chains is the major culprit in the original question.

Answer 2

Answer by Alex Suo

I think this has to do with the CompileThreshold flag, which controls when the bytecode is compiled into machine code by the JIT.

The Oracle JDK has a default count of 10,000, as documented at http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html.

For OpenJDK, I couldn't find up-to-date documentation on this flag, but some mail threads suggest a much lower threshold: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2010-November/004239.html

Also, try turning Oracle JDK flags like -XX:+UseCompressedStrings and -XX:+OptimizeStringConcat on and off. I am not sure whether those flags are turned on by default in OpenJDK, though. Could someone please confirm?

One experiment you can do is to first run the program many times, say 30,000 loops, then call System.gc() and look at the performance. I believe both JVMs would then yield the same results.
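A minimal plain-Java sketch of that experiment follows, assuming a single-threaded loop is an acceptable approximation. Hand-rolled timing loops like this are exposed to dead-code elimination, on-stack replacement and similar effects that JMH is designed to control, so its numbers should be treated as rough at best. WarmupThenMeasure, convert and averageNanos are hypothetical names.

```java
// Hypothetical harness for the "warm up, GC, then measure" experiment.
class WarmupThenMeasure {
    static int counter;
    // Publishing the accumulated result makes it harder for the JIT to drop the work.
    static volatile int blackhole;

    // The operation under test, with the side effect inside the chain.
    static String convert() {
        return new StringBuilder().append(counter++).toString();
    }

    // Runs 'warmup' untimed calls, requests a GC, then returns the average
    // nanoseconds per call over 'measured' timed calls.
    static double averageNanos(int warmup, int measured) {
        int sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += convert().length();
        }
        System.gc(); // only a hint; the JVM is free to ignore it
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += convert().length();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / measured;
    }

    public static void main(String[] args) {
        counter = 0;
        System.out.printf("%.1f ns/call%n", averageNanos(30000, 1000000));
    }
}
```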

And I assume your GC settings are the same too. Otherwise, since you are allocating a lot of objects, GC might well be the major part of your run time.
