Efficient string concatenation in Scala
Original source: http://stackoverflow.com/questions/25628257/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by deamon
The JVM optimizes String concatenation with + and replaces it with a StringBuilder. This should be the same in Scala. But what happens if strings are concatenated with ++=?
var x = "x"
x ++= "y"
x ++= "z"
As far as I know, this method treats strings like char sequences, so even if the JVM created a StringBuilder it would lead to many method calls, right? Would it be better to use a StringBuilder instead?
To what type is the String converted implicitly?
Accepted answer by som-snytt
Actually, the inconvenient truth is that StringOps usually remains an allocation:
scala> :pa
// Entering paste mode (ctrl-D to finish)
class Concat {
var x = "x"
x ++= "y"
x ++= "z"
}
// Exiting paste mode, now interpreting.
defined class Concat
scala> :javap -prv Concat
Binary file Concat contains $line3.$read$$iw$$iw$Concat
Size 1211 bytes
MD5 checksum 1900522728cbb0ed0b1d3f8b962667ad
Compiled from "<console>"
public class $line3.$read$$iw$$iw$Concat
SourceFile: "<console>"
[snip]
public $line3.$read$$iw$$iw$Concat();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=6, locals=1, args_size=1
0: aload_0
1: invokespecial #19 // Method java/lang/Object."<init>":()V
4: aload_0
5: ldc #20 // String x
7: putfield #10 // Field x:Ljava/lang/String;
10: aload_0
11: new #22 // class scala/collection/immutable/StringOps
14: dup
15: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
18: aload_0
19: invokevirtual #30 // Method x:()Ljava/lang/String;
22: invokevirtual #34 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
25: invokespecial #36 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
28: new #22 // class scala/collection/immutable/StringOps
31: dup
32: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
35: ldc #38 // String y
37: invokevirtual #34 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
40: invokespecial #36 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
43: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
46: invokevirtual #42 // Method scala/Predef$.StringCanBuildFrom:()Lscala/collection/generic/CanBuildFrom;
49: invokevirtual #46 // Method scala/collection/immutable/StringOps.$plus$plus:(Lscala/collection/GenTraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;
52: checkcast #48 // class java/lang/String
55: invokevirtual #50 // Method x_$eq:(Ljava/lang/String;)V
See more demonstration at this answer.
Edit: To say more, you're building the String on each reassignment, so no, you're not using a single StringBuilder.
However, the optimization is done by javac and not the JIT compiler, so to compare fruits of the same kind:
public class Strcat {
    public String strcat(String s) {
        String t = " hi ";
        String u = " by ";
        return s + t + u; // OK
    }
    public String strcat2(String s) {
        String t = s + " hi ";
        String u = t + " by ";
        return u; // bad
    }
}
whereas
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> :se -Xprint:typer
scala> class K { def f(s: String, t: String, u: String) = s ++ t ++ u }
[[syntax trees at end of typer]] // <console>
def f(s: String, t: String, u: String): String = scala.this.Predef.augmentString(scala.this.Predef.augmentString(s).++[Char, String](scala.this.Predef.augmentString(t))(scala.this.Predef.StringCanBuildFrom)).++[Char, String](scala.this.Predef.augmentString(u))(scala.this.Predef.StringCanBuildFrom)
is bad. Or, worse, to unroll Rex's explanation:
"abc" ++ "def"
augmentString("abc").++[Char, String](augmentString("def"))(StringCanBuildFrom)
collection.mutable.StringBuilder.newBuilder ++= new WrappedString(augmentString("def"))
val b = collection.mutable.StringBuilder.newBuilder
new WrappedString(augmentString("def")) foreach b.+=
As Rex explained, StringBuilder overrides ++=(String) but not Growable.++=(Traversable[Char]).
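To make the overload distinction concrete, here is a minimal sketch. The object and method names are hypothetical, chosen only for illustration. Both methods return the same String; the difference is which ++= overload the compiler selects, and on the 2.11 collections discussed above the second form is the one that goes through the generic, element-by-element path.

```scala
object BuilderOverloads {
  // Appends b through the String-specific overload of ++=.
  def viaStringOverload(a: String, b: String): String = {
    val sb = new StringBuilder(a)
    sb ++= b
    sb.toString
  }

  // Appends b after viewing it as a Seq[Char], so the compiler has to
  // pick the generic collection overload of ++= instead.
  def viaCharCollection(a: String, b: String): String = {
    val sb = new StringBuilder(a)
    val chars: Seq[Char] = b // implicit wrapping of the String
    sb ++= chars
    sb.toString
  }
}
```

If the argument is already a String, passing it directly (or calling append) keeps you on the String overload.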
In case you've ever wondered what unaugmentString is for:
28: invokevirtual #40 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
31: invokevirtual #43 // Method scala/Predef$.unaugmentString:(Ljava/lang/String;)Ljava/lang/String;
34: invokespecial #46 // Method scala/collection/immutable/WrappedString."<init>":(Ljava/lang/String;)V
And just to show that you do finally call unadorned +=(Char), but after boxing and unboxing:
public final scala.collection.mutable.StringBuilder apply(char);
flags: ACC_PUBLIC, ACC_FINAL
Code:
stack=2, locals=2, args_size=2
0: aload_0
1: getfield #19 // Field b:Lscala/collection/mutable/StringBuilder;
4: iload_1
5: invokevirtual #24 // Method scala/collection/mutable/StringBuilder.$plus$eq:(C)Lscala/collection/mutable/StringBuilder;
8: areturn
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 this L$line10/$read$$iw$$iw$$anonfun;
0 9 1 x C
LineNumberTable:
line 9: 0
public final java.lang.Object apply(java.lang.Object);
flags: ACC_PUBLIC, ACC_FINAL, ACC_BRIDGE, ACC_SYNTHETIC
Code:
stack=2, locals=2, args_size=2
0: aload_0
1: aload_1
2: invokestatic #35 // Method scala/runtime/BoxesRunTime.unboxToChar:(Ljava/lang/Object;)C
5: invokevirtual #37 // Method apply:(C)Lscala/collection/mutable/StringBuilder;
8: areturn
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 this L$line10/$read$$iw$$iw$$anonfun;
0 9 1 v1 Ljava/lang/Object;
LineNumberTable:
line 9: 0
A good laugh does get some oxygen into the bloodstream.
Answer by Rex Kerr
There is a huge, HUGE difference in time taken.
If you add strings repeatedly using += you do not optimize away the O(n^2) cost of creating incrementally longer strings. So for adding one or two you won't see a difference, but it doesn't scale; by the time you get to adding 100 (short) strings, using a StringBuilder is over 20x faster. (Precise data: 1.3 us vs. 27.1 us to add the string representations of the numbers 0 to 100; timings should be reproducible to about +/- 5% and of course are for my machine.)
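The two approaches being compared might look roughly like the sketch below. This is not the benchmark code behind the numbers above; the object and method names are hypothetical, and treating the range 0 to n as inclusive is an assumption.

```scala
object NumbersConcat {
  // Repeated += on a var: every iteration builds a new, longer String,
  // which is where the quadratic copying cost comes from.
  def withPlusEq(n: Int): String = {
    var s = ""
    var i = 0
    while (i <= n) {
      s += i.toString
      i += 1
    }
    s
  }

  // One StringBuilder for the whole loop; the String is created once at the end.
  def withBuilder(n: Int): String = {
    val sb = new StringBuilder
    var i = 0
    while (i <= n) {
      sb.append(i)
      i += 1
    }
    sb.toString
  }
}
```

Both methods produce the same text; only the amount of intermediate copying differs, so any timing comparison should be measured on your own machine.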
Using ++= on a var String is far, far worse yet, because you are then instructing Scala to treat a string as a character-by-character collection, which then requires all sorts of wrappers to make the String look like a collection (including boxed character-by-character addition using the generic version of ++!). Now you're 16x slower again on 100 additions! (Precise data: 428.8 us for ++= on a var String instead of +='s 26.7 us.)
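As a small illustration that ++= on a var String buys nothing in terms of the result, the sketch below (hypothetical names) builds the same String both ways. The only difference is the path taken to get there, which is the collection-style path described above when ++= is used.

```scala
object VarStringAppend {
  // Plain String concatenation on a var.
  def viaPlusEq(parts: Seq[String]): String = {
    var s = ""
    for (p <- parts) s += p
    s
  }

  // Same loop, but ++= makes the compiler treat the String as a collection.
  def viaPlusPlusEq(parts: Seq[String]): String = {
    var s = ""
    for (p <- parts) s ++= p
    s
  }
}
```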
If you write a single statement with a bunch of +es then the Scala compiler will use a StringBuilder and end up with an efficient result (Data: 1.8 us on non-constant strings pulled out of an array).
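A single-expression concatenation of non-constant strings pulled out of an array could look like the following sketch. The names are hypothetical, and the assumption that the array holds at least four elements is mine, not part of the measurement above.

```scala
object InlinePlus {
  // All the + operators sit in one expression, so the compiler can
  // handle the whole chain at once instead of one String per statement.
  // Assumes xs has at least four elements.
  def joinFour(xs: Array[String]): String =
    xs(0) + xs(1) + xs(2) + xs(3)
}
```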
So, if you add strings with anything other than + in line, and you care about efficiency, use a StringBuilder. Definitely don't use ++= to add another String to a var String; there just isn't any reason to do it, and there's a big runtime penalty.
(Note: very often you don't care at all how efficient your string additions are! Don't clutter your code with extra StringBuilders unless you have reason to suspect that this particular code path is getting called a lot.)

