带有流和性能的 Java 8 嵌套循环

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/21968918/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 11:44:23  来源:igfitidea点击:

Java 8 nested loops with streams & performance

javaperformancejava-8nested-loopsjava-stream

提问by Konrad H?ffner

In order to practise the Java 8 streams I tried converting the following nested loop to the Java 8 stream API. It calculates the largest digit sum of a^b (a,b < 100) and takes ~0.135s on my Core i5 760.

为了练习 Java 8 流,我尝试将以下嵌套循环转换为 Java 8 流 API。它计算 a^b (a,b < 100) 的最大数字和,并在我的 Core i5 760 上花费 ~0.135s。

public static int digitSum(BigInteger x)
{
    int sum = 0;
    for(char c: x.toString().toCharArray()) {sum+=Integer.valueOf(c+"");}
    return sum;
}

@Test public void solve()
    {
        int max = 0;
        for(int i=1;i<100;i++)
            for(int j=1;j<100;j++)
                max = Math.max(max,digitSum(BigInteger.valueOf(i).pow(j)));
        System.out.println(max);
    }

My solution, which I expected to be faster because of the paralellism actually took 0.25s (0.19s without the parallel()):

我的解决方案,由于并行性,我预计会更快,实际上需要 0.25 秒(没有 0.19 秒parallel()):

int max =   IntStream.range(1,100).parallel()
            .map(i -> IntStream.range(1, 100)
            .map(j->digitSum(BigInteger.valueOf(i).pow(j)))
            .max().getAsInt()).max().getAsInt();

My questions

我的问题

  • did I do the conversion right or is there a better way to convert nested loops to stream calculations?
  • why is the stream variant so much slower than the old one?
  • why did the parallel() statement actually increased the time from 0.19s to 0.25s?
  • 我做了正确的转换还是有更好的方法将嵌套循环转换为流计算?
  • 为什么流变体比旧变体慢这么多?
  • 为什么parallel() 语句实际上将时间从0.19s 增加到0.25s?

I know that microbenchmarks are fragile and parallelism is only worth it for big problems but for a CPU, even 0.1s is an eternity, right?

我知道微基准测试是脆弱的,并行性只在处理大问题时才值得,但对于 CPU 来说,即使 0.1 秒也是永恒的,对吧?

Update

更新

I measure with the Junit 4 framework in Eclipse Kepler (it shows the time taken for executing a test).

我使用 Eclipse Kepler 中的 Junit 4 框架进行测量(它显示了执行测试所花费的时间)。

My results for a,b<1000 instead of 100:

我对 a,b<1000 而不是 100 的结果:

  • traditional loop 186s
  • sequential stream 193s
  • parallel stream 55s
  • 传统循环 186s
  • 顺序流 193s
  • 并行流 55s

Update 2Replacing sum+=Integer.valueOf(c+"");with sum+= c - '0';(thanks Peter!) shaved off 10 whole seconds of the parallel method, bringing it to 45s. Didn't expect such a big performance impact!

更新 2替换sum+=Integer.valueOf(c+"");sum+= c - '0';(感谢 Peter!)将并行方法的整整 10 秒减少到 45 秒。没想到性能影响这么大!

Also, reducing the parallelism to the number of CPU cores (4 in my case) didn't do much as it reduced the time only to 44.8s (yes, it adds a and b=0 but I think this won't impact the performance much):

此外,将并行度减少到 CPU 内核的数量(在我的情况下为 4)并没有太大作用,因为它只将时间减少到 44.8 秒(是的,它增加了 a 和 b=0,但我认为这不会影响表现多):

int max = IntStream.range(0, 3).parallel().
          .map(m -> IntStream.range(0,250)
          .map(i -> IntStream.range(1, 1000)
          .map(j->.digitSum(BigInteger.valueOf(250*m+i).pow(j)))
          .max().getAsInt()).max().getAsInt()).max().getAsInt();

采纳答案by assylias

I have created a quick and dirty micro benchmark based on your code. The results are:

我根据您的代码创建了一个快速而肮脏的微基准测试。结果是:

loop: 3192
lambda: 3140
lambda parallel: 868

循环:3192
lambda:3140
lambda 并行:868

So the loop and lambda are equivalent and the parallel stream significantly improves the performance. I suspect your results are unreliable due to your benchmarking methodology.

所以循环和 lambda 是等效的,并行流显着提高了性能。由于您的基准测试方法,我怀疑您的结果不可靠。

public static void main(String[] args) {
    int sum = 0;

    //warmup
    for (int i = 0; i < 100; i++) {
        solve();
        solveLambda();
        solveLambdaParallel();
    }

    {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sum += solve();
        }
        long end = System.nanoTime();
        System.out.println("loop: " + (end - start) / 1_000_000);
    }
    {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sum += solveLambda();
        }
        long end = System.nanoTime();
        System.out.println("lambda: " + (end - start) / 1_000_000);
    }
    {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sum += solveLambdaParallel();
        }
        long end = System.nanoTime();
        System.out.println("lambda parallel : " + (end - start) / 1_000_000);
    }
    System.out.println(sum);
}

public static int digitSum(BigInteger x) {
    int sum = 0;
    for (char c : x.toString().toCharArray()) {
        sum += Integer.valueOf(c + "");
    }
    return sum;
}

public static int solve() {
    int max = 0;
    for (int i = 1; i < 100; i++) {
        for (int j = 1; j < 100; j++) {
            max = Math.max(max, digitSum(BigInteger.valueOf(i).pow(j)));
        }
    }
    return max;
}

public static int solveLambda() {
    return  IntStream.range(1, 100)
            .map(i -> IntStream.range(1, 100).map(j -> digitSum(BigInteger.valueOf(i).pow(j))).max().getAsInt())
            .max().getAsInt();
}

public static int solveLambdaParallel() {
    return  IntStream.range(1, 100)
            .parallel()
            .map(i -> IntStream.range(1, 100).map(j -> digitSum(BigInteger.valueOf(i).pow(j))).max().getAsInt())
            .max().getAsInt();
}


I have also run it with jmh which is more reliable than manual tests. The results are consistent with above (micro seconds per call):

我也用 jmh 运行它,它比手动测试更可靠。结果与上面一致(每次调用微秒):

Benchmark                                Mode   Mean        Units
c.a.p.SO21968918.solve                   avgt   32367.592   us/op
c.a.p.SO21968918.solveLambda             avgt   31423.123   us/op
c.a.p.SO21968918.solveLambdaParallel     avgt   8125.600    us/op

回答by Peter Lawrey

The problem you have is you are looking at sub-optimal code. When you have code which might be heavily optimised you are very dependant on whether the JVM is smart enough to optimise your code. Loops have been around much longer and are better understood.

您遇到的问题是您正在查看次优代码。当您的代码可能被大量优化时,您非常依赖于 JVM 是否足够智能来优化您的代码。循环出现的时间更长,也更容易理解。

One big difference in your loop code, is you working set is very small. You are only considering one maximum digit sum at a time. This means the code is cache friendly and you have very short lived objects. In the stream() case you are building up collections for which there more in the working set at any one time, using more cache, with more overhead. I would expect your GC times to be longer and/or more frequent as well.

您的循环代码的一大区别是您的工作集非常小。您一次只考虑一位最大数字总和。这意味着代码是缓存友好的,并且您拥有非常短暂的对象。在 stream() 的情况下,您正在构建集合,在任何时候工作集中都有更多的集合,使用更多的缓存,更多的开销。我希望您的 GC 时间更长和/或更频繁。

why is the stream variant so much slower than the old one?

为什么流变体比旧变体慢这么多?

Loops are fairly well optimised having been around since before Java was developed. They can be mapped very efficiently to hardware. Streams are fairly new and not as heavily optimised.

循环在 Java 开发之前就已经得到了很好的优化。它们可以非常有效地映射到硬件。Streams 是相当新的并且没有经过大量优化。

why did the parallel() statement actually increased the time from 0.19s to 0.25s?

为什么parallel() 语句实际上将时间从0.19s 增加到0.25s?

Most likely you have a bottle neck on a shared resource. You create quite a bit of garbage but this is usually fairly concurrent. Using more threads, only guarantees you will have more overhead, it doesn't ensure you can take advantage of the extra CPU power you have.

您很可能在共享资源上遇到瓶颈。您创建了相当多的垃圾,但这通常是相当并发的。使用更多线程只能保证您将有更多开销,并不能确保您可以利用您拥有的额外 CPU 能力。