Original question: http://stackoverflow.com/questions/47957337/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Why does a 4 billion-iteration Java loop take only 2 ms?
Asked by twimo
I'm running the following Java code on a laptop with a 2.7 GHz Intel Core i7. I intended it to measure how long it takes to finish a loop with 2^32 iterations, which I expected to be roughly 1.48 seconds (4/2.7 = 1.48).
But it actually takes only 2 milliseconds instead of 1.48 s. Is this the result of some JVM optimization under the hood?
public static void main(String[] args)
{
    long start = System.nanoTime();
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++){
    }
    long finish = System.nanoTime();
    long d = (finish - start) / 1000000;
    System.out.println("Used " + d);
}
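The asker's estimate can be reproduced with a short calculation. The sketch below is only illustrative: the class and method names are made up for this example, and it assumes one loop iteration per clock cycle, which is the simplification implied by "4/2.7". Using the exact iteration count rather than a rounded 4 billion gives a slightly larger figure than 1.48 s.

```java
import java.util.Locale;

public class LoopEstimate {
    // Number of iterations of: for (int i = from; i < to; i++)
    // Widened to long so the subtraction cannot overflow.
    static long iterations(int from, int to) {
        return (long) to - (long) from;
    }

    // Naive estimate: assumes exactly one iteration per clock cycle (an assumption).
    static double estimatedSeconds(long iterations, double clockHz) {
        return iterations / clockHz;
    }

    public static void main(String[] args) {
        long n = iterations(Integer.MIN_VALUE, Integer.MAX_VALUE);
        System.out.println(n); // 4294967295, i.e. 2^32 - 1
        System.out.println(String.format(Locale.ROOT, "%.2f s", estimatedSeconds(n, 2.7e9))); // 1.59 s
    }
}
```

Either way, the estimate is in the range of seconds, so a measured 2 ms cannot come from actually executing every iteration.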
Answered by van dench
One of two things is happening here:
The compiler realized that the loop is redundant and does nothing, so it optimized it away.
The JIT (just-in-time compiler) realized that the loop is redundant and does nothing, so it optimized it away.
Modern compilers are very intelligent; they can see when code is useless. Try putting an empty loop into Godbolt and look at the output, then turn on -O2 optimizations; you will see that the output is something along the lines of
main():
xor eax, eax
ret
To clarify: in Java, most optimizations are done by the JIT. In some other languages (like C/C++), most optimizations are done by the first (ahead-of-time) compiler.
Answered by Akavall
It looks like it was optimized away by the JIT compiler. When I turn it off (-Djava.compiler=NONE), the code runs much slower:
$ javac MyClass.java
$ java MyClass
Used 4
$ java -Djava.compiler=NONE MyClass
Used 40409
I put OP's code inside class MyClass.
回答by Eugene
I will just state the obvious: this is a JVM optimization, and the loop is simply removed altogether. Here is a small test that shows what a huge difference the JIT makes when it is fully enabled, enabled only for the C1 compiler, and disabled entirely.
Disclaimer: don't write tests like this. This is just to prove that the actual loop "removal" happens in the C2 compiler:
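import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

// The class name, mode and time unit are inferred from the results (Loop.*, avgt, ms/op).
@BenchmarkMode(Mode.AvgTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class Loop {

    // Default JVM settings: full JIT (C1 and C2).
    @Benchmark
    @Fork(1)
    public void full() {
        long result = 0;
        for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
            ++result;
        }
    }

    // Same as full(), with one iteration fewer.
    @Benchmark
    @Fork(1)
    public void minusOne() {
        long result = 0;
        for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE - 1; i++) {
            ++result;
        }
    }

    // Tiered compilation stops at level 1, so only C1 is used.
    @Benchmark
    @Fork(value = 1, jvmArgsAppend = { "-XX:TieredStopAtLevel=1" })
    public void withoutC2() {
        long result = 0;
        for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE - 1; i++) {
            ++result;
        }
    }

    // Interpreter only: no JIT compilation at all.
    @Benchmark
    @Fork(value = 1, jvmArgsAppend = { "-Xint" })
    public void withoutAll() {
        long result = 0;
        for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE - 1; i++) {
            ++result;
        }
    }
}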
The results show that, depending on which part of the JIT is enabled, the method gets faster (so much faster that it looks like it is doing "nothing": the loop is removed, which seems to happen in the C2 compiler, the maximum compilation level):
Benchmark        Mode  Cnt      Score  Error  Units
Loop.full        avgt    2     ≈ 10??         ms/op
Loop.minusOne    avgt    2     ≈ 10??         ms/op
Loop.withoutAll  avgt    2  51782.751         ms/op
Loop.withoutC2   avgt    2   1699.137         ms/op
Answered by Oleksandr Pyrohov
As already pointed out, the JIT (just-in-time) compiler can optimize an empty loop in order to remove unnecessary iterations. But how?
Actually, there are two JIT compilers: C1 and C2. First, the code is compiled with C1. C1 collects statistics and helps the JVM discover that in 100% of cases our empty loop doesn't change anything and is useless. At that point C2 enters the stage: when the code is called very often, it can be optimized and compiled with C2 using the collected statistics.
As an example, I will test the following code snippet (my JDK is a slowdebug build, 9-internal):
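public class Demo {
    private static void run() {
        for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
        }
        System.out.println("Done!");
    }

    // Entry point added so that the snippet can be run as-is.
    public static void main(String[] args) {
        run();
    }
}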
With the following command line options:
-XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*Demo.run
This produces different versions of my run method, compiled with C1 and C2 respectively. For me, the final variant (C2) looks something like this:
...
; B1: # B3 B2 <- BLOCK HEAD IS JUNK Freq: 1
0x00000000125461b0: mov dword ptr [rsp+0ffffffffffff7000h], eax
0x00000000125461b7: push rbp
0x00000000125461b8: sub rsp, 40h
0x00000000125461bc: mov ebp, dword ptr [rdx]
0x00000000125461be: mov rcx, rdx
0x00000000125461c1: mov r10, 57fbc220h
0x00000000125461cb: call indirect r10 ; *iload_1
0x00000000125461ce: cmp ebp, 7fffffffh ; 7fffffff => 2147483647
0x00000000125461d4: jnl 125461dbh ; jump if not less
; B2: # B3 <- B1 Freq: 0.999999
0x00000000125461d6: mov ebp, 7fffffffh ; *if_icmpge
; B3: # N44 <- B1 B2 Freq: 1
0x00000000125461db: mov edx, 0ffffff5dh
0x0000000012837d60: nop
0x0000000012837d61: nop
0x0000000012837d62: nop
0x0000000012837d63: call 0ae86fa0h
...
It is a little bit messy, but if you look closely, you may notice that there is no long-running loop here. There are 3 blocks: B1, B2 and B3, and the execution path can be B1 -> B2 -> B3 or B1 -> B3. Here, Freq: 1 is the normalized estimated frequency of a block's execution.
Answered by Peter Lawrey
You are measuring the time it takes to detect that the loop doesn't do anything, compile the code in a background thread, and eliminate the code.
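// Class name A is taken from the compilation log below.
public class A {
    public static void main(String[] args) throws InterruptedException {
        for (int t = 0; t < 5; t++) {
            long start = System.nanoTime();
            for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
            }
            long time = System.nanoTime() - start;

            String s = String.format("%d: Took %.6f ms", t, time / 1e6);
            // Pause so the background compiler's log lines do not interleave with ours.
            Thread.sleep(50);
            System.out.println(s);
            Thread.sleep(50);
        }
    }
}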
If you run this with -XX:+PrintCompilation, you can see that the code is compiled in the background at level 3 (the C1 compiler) and, after a few loops, at level 4 (the C2 compiler).
129 34 % 3 A::main @ 15 (93 bytes)
130 35 3 A::main (93 bytes)
130 36 % 4 A::main @ 15 (93 bytes)
131 34 % 3 A::main @ -2 (93 bytes) made not entrant
131 36 % 4 A::main @ -2 (93 bytes) made not entrant
0: Took 2.510408 ms
268 75 % 3 A::main @ 15 (93 bytes)
271 76 % 4 A::main @ 15 (93 bytes)
274 75 % 3 A::main @ -2 (93 bytes) made not entrant
1: Took 5.629456 ms
2: Took 0.000000 ms
3: Took 0.000364 ms
4: Took 0.000365 ms
If you change the loop to use a long counter, it doesn't get optimised as much.
for (long i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
}
Instead you get:
0: Took 1579.267321 ms
1: Took 1674.148662 ms
2: Took 1885.692166 ms
3: Took 1709.870567 ms
4: Took 1754.005112 ms
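To try the int versus long comparison side by side, something like the following sketch could be used. It is not the answerer's code: the class and method names are invented for illustration, and the range is deliberately much smaller than the original 2^32 iterations so that a run stays short even if nothing is optimised away. Timings depend heavily on the JVM version and flags, so no expected output is shown.

```java
public class LoopCounterTypes {
    // Times an empty loop driven by an int counter; returns elapsed nanoseconds.
    static long timeIntLoop(int from, int to) {
        long start = System.nanoTime();
        for (int i = from; i < to; i++) {
        }
        return System.nanoTime() - start;
    }

    // Times the same empty loop driven by a long counter; returns elapsed nanoseconds.
    static long timeLongLoop(long from, long to) {
        long start = System.nanoTime();
        for (long i = from; i < to; i++) {
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Reduced range (an assumption of this sketch), 100 million iterations per loop.
        int from = -50_000_000;
        int to = 50_000_000;
        for (int t = 0; t < 5; t++) {
            double intMs = timeIntLoop(from, to) / 1e6;
            double longMs = timeLongLoop(from, to) / 1e6;
            System.out.println(String.format("%d: int %.6f ms, long %.6f ms", t, intMs, longMs));
        }
    }
}
```

Repeating the measurement several times matters here, because the first rounds include the time the JIT needs to compile the methods.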
Answered by DHARMENDRA SINGH
You measure the start and finish times in nanoseconds and divide by 10^6 to calculate the latency, which gives a value in milliseconds:
long d = (finish - start) / 1000000
For a result in seconds it should be 10^9, because 1 second = 10^9 nanoseconds.
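The unit conversions can be made explicit, which avoids counting zeros by hand. The sketch below uses the standard java.util.concurrent.TimeUnit class; the class name, method names and the sample value are chosen only for illustration.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedUnits {
    // Whole milliseconds, equivalent to nanos / 1_000_000 with truncation.
    static long toMillis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }

    // Fractional seconds: 1 second = 10^9 nanoseconds.
    static double toSeconds(long nanos) {
        return nanos / 1e9;
    }

    public static void main(String[] args) {
        long elapsedNanos = 2_510_408L; // illustrative sample value
        System.out.println("Used " + toMillis(elapsedNanos) + " ms"); // Used 2 ms
        System.out.println("Used " + toSeconds(elapsedNanos) + " s");
    }
}
```

Printing the unit next to the number also makes it clear whether a reading such as "Used 2" means milliseconds or seconds.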