Note: This page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/22658322/

Time: 2020-08-13 17:08:37  Source: igfitidea

Java 8: performance of Streams vs Collections

Tags: java, performance, collections, java-8, java-stream

Asked by Mister Smith

I'm new to Java 8. I still don't know the API in depth, but I've made a small informal benchmark to compare the performance of the new Streams API vs the good old Collections.

The test consists of filtering a list of Integer and, for each even number, calculating the square root and storing it in a result List of Double.

Here is the code:

    public static void main(String[] args) {
        //Calculating square root of even numbers from 1 to N       
        int min = 1;
        int max = 1000000;

        List<Integer> sourceList = new ArrayList<>();
        for (int i = min; i < max; i++) {
            sourceList.add(i);
        }

        List<Double> result = new LinkedList<>();


        //Collections approach
        long t0 = System.nanoTime();
        long elapsed = 0;
        for (Integer i : sourceList) {
            if(i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        elapsed = System.nanoTime() - t0;       
        System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


        //Stream approach
        Stream<Integer> stream = sourceList.stream();       
        t0 = System.nanoTime();
        result = stream.filter(i -> i%2 == 0).map(i -> Math.sqrt(i)).collect(Collectors.toList());
        elapsed = System.nanoTime() - t0;       
        System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


        //Parallel stream approach
        stream = sourceList.stream().parallel();        
        t0 = System.nanoTime();
        result = stream.filter(i -> i%2 == 0).map(i -> Math.sqrt(i)).collect(Collectors.toList());
        elapsed = System.nanoTime() - t0;       
        System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));      
    }

And here are the results for a dual core machine:

    Collections: Elapsed time:        94338247 ns   (0,094338 seconds)
    Streams: Elapsed time:           201112924 ns   (0,201113 seconds)
    Parallel streams: Elapsed time:  357243629 ns   (0,357244 seconds)

For this particular test, streams are about twice as slow as collections, and parallelism doesn't help (or am I using it the wrong way?).

Questions:

  • Is this test fair? Have I made any mistake?
  • Are streams slower than collections? Has anyone made a good formal benchmark on this?
  • Which approach should I strive for?

Updated results.

I ran the test 1k times after JVM warmup (1k iterations) as advised by @pveentjer:

    Collections: Average time:      206884437,000000 ns     (0,206884 seconds)
    Streams: Average time:           98366725,000000 ns     (0,098367 seconds)
    Parallel streams: Average time: 167703705,000000 ns     (0,167704 seconds)

In this case streams are more performant. I wonder what would be observed in an app where the filtering function is only called once or twice during runtime.
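The warm-up-then-measure procedure described above can be sketched roughly as follows. This is only an illustration, not the code that produced the numbers above: the class name `WarmupTiming`, the helper `averageNanos`, and the iteration counts are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class WarmupTiming {

    // Calls the task `warmup` times untimed, then returns the mean time in ns over `runs` timed calls.
    // (Hypothetical helper, not part of any library.)
    static double averageNanos(Supplier<List<Double>> task, int warmup, int runs) {
        long sink = 0; // results are consumed so the work is less likely to be optimized away
        for (int i = 0; i < warmup; i++) {
            sink += task.get().size();
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            List<Double> result = task.get();
            total += System.nanoTime() - t0;
            sink += result.size();
        }
        if (sink < 0) {
            System.out.println(sink); // never happens; only keeps `sink` in use
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        List<Integer> sourceList = new ArrayList<>();
        for (int i = 1; i < 1_000_000; i++) {
            sourceList.add(i);
        }

        // Collections approach, wrapped so it can be called repeatedly
        Supplier<List<Double>> loop = () -> {
            List<Double> result = new ArrayList<>();
            for (Integer i : sourceList) {
                if (i % 2 == 0) {
                    result.add(Math.sqrt(i));
                }
            }
            return result;
        };

        // Stream approach
        Supplier<List<Double>> stream = () -> sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.toList());

        // 50 warm-up and 50 timed iterations are arbitrary values chosen to keep the run short
        System.out.printf("Collections: Average time: %f ns%n", averageNanos(loop, 50, 50));
        System.out.printf("Streams: Average time: %f ns%n", averageNanos(stream, 50, 50));
    }
}
```

A hand-rolled harness like this still misses pitfalls that a dedicated benchmarking tool handles, so its numbers should be read as rough indications only.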

Answered by pveentjer

For what you are trying to do, I would not use the regular Java APIs anyway. There is a ton of boxing/unboxing going on, so there is a huge performance overhead.

Personally, I think a lot of the APIs that get designed are crap because they create a lot of object litter.

Try using primitive arrays of double/int, do it single-threaded, and see what the performance is.
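As a rough sketch of that suggestion, the same computation can be written against an `int[]` source and a `double[]` result, so that no `Integer` or `Double` objects are created. The class and method names here are made up for illustration, and the second method uses the primitive stream specializations (`IntStream`, `DoubleStream`) as one possible alternative to the plain loop.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class PrimitiveSqrt {

    // Plain single-threaded loop over primitives: no boxing or unboxing takes place.
    static double[] sqrtOfEvensLoop(int[] source) {
        double[] result = new double[source.length]; // upper bound on the result size
        int count = 0;
        for (int value : source) {
            if (value % 2 == 0) {
                result[count++] = Math.sqrt(value);
            }
        }
        return Arrays.copyOf(result, count); // trim to the number of even values found
    }

    // Same computation using IntStream and DoubleStream, which also avoid boxing.
    static double[] sqrtOfEvensStream(int[] source) {
        return Arrays.stream(source)
                .filter(value -> value % 2 == 0)
                .mapToDouble(Math::sqrt)
                .toArray();
    }

    public static void main(String[] args) {
        int[] source = IntStream.range(1, 1_000_000).toArray();
        double[] viaLoop = sqrtOfEvensLoop(source);
        double[] viaStream = sqrtOfEvensStream(source);
        System.out.println(viaLoop.length + " results, identical: "
                + Arrays.equals(viaLoop, viaStream));
    }
}
```

Timing these two methods would still need a proper harness; the sketch only shows what "primitive arrays" means in code.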

PS: You might want to have a look at JMH to take care of doing the benchmark. It takes care of some of the typical pitfalls like warming up the JVM.

Answered by Sergey Fedorov

1) You see times of less than 1 second with your benchmark. That means side effects can strongly influence your results. So, I increased your task 10 times:

    int max = 10_000_000;

and ran your benchmark. My results:

    Collections: Elapsed time:       8592999350 ns  (8.592999 seconds)
    Streams: Elapsed time:           2068208058 ns  (2.068208 seconds)
    Parallel streams: Elapsed time:  7186967071 ns  (7.186967 seconds)

Without the edit (int max = 1_000_000) the results were:

    Collections: Elapsed time:       113373057 ns   (0.113373 seconds)
    Streams: Elapsed time:           135570440 ns   (0.135570 seconds)
    Parallel streams: Elapsed time:  104091980 ns   (0.104092 seconds)

This is like your results: the stream is slower than the collection. Conclusion: much of the time was spent on stream initialization and on passing values along.

2) After increasing the task size, the stream became faster (that's OK), but the parallel stream remained too slow. What's wrong? Note: you have collect(Collectors.toList()) in your command. Collecting into a single collection essentially introduces a performance bottleneck and overhead in the case of concurrent execution. It is possible to estimate the relative cost of this overhead by replacing

collecting to collection -> counting the element count

For streams it can be done by collect(Collectors.counting()). I got results:

    Collections: Elapsed time:       41856183 ns    (0.041856 seconds)
    Streams: Elapsed time:           546590322 ns   (0.546590 seconds)
    Parallel streams: Elapsed time:  1540051478 ns  (1.540051 seconds)

That's for a big task (int max = 10000000)! Conclusion: collecting items into the collection took the majority of the time. The slowest part is adding to the list. BTW, a simple ArrayList is used for Collectors.toList().
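The replacement described above (counting elements instead of collecting them into a list) might look like the sketch below. The class and method names are invented for the example, and the timing code is left out so that only the difference between the two terminal operations is visible.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CountInsteadOfCollect {

    // Original shape: every computed value is added to a result list.
    static List<Double> collectRoots(List<Integer> source) {
        return source.parallelStream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.toList());
    }

    // Modified shape: the same filter and map steps, but the values are only counted.
    static long countRoots(List<Integer> source) {
        return source.parallelStream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.counting());
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(1, 10_000_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println("collected: " + collectRoots(source).size());
        System.out.println("counted:   " + countRoots(source));
    }
}
```

Timing both methods on the same input gives an estimate of how much of the total cost comes from building the result list rather than from filtering and mapping.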

Answered by leventov

  1. Stop using LinkedList for anything but heavy removal from the middle of the list using an iterator.

  2. Stop writing benchmarking code by hand, use JMH.
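For context on the first point, the one case named there (removing many elements from the middle of a list through an iterator) looks roughly like this. The class and method names are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class IteratorRemoval {

    // Removes odd values in place. With a LinkedList, each it.remove() only unlinks a node;
    // with an ArrayList, each removal shifts the remaining elements.
    static void removeOdd(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> values = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        removeOdd(values);
        System.out.println(values); // prints [2, 4, 6]
    }
}
```

For the append-only usage in the question's benchmark, this advantage does not come into play.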

Proper benchmarks:

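import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)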
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toCollection(
                    () -> new ArrayList<>(sourceList.size() / 2 + 1)));
    }
}

Result:

Benchmark                   Mode   Samples         Mean   Mean error    Units
StreamVsVanilla.stream      avgt        10       17.588        0.230    ns/op
StreamVsVanilla.vanilla     avgt        10       10.796        0.063    ns/op

Just as I expected, the stream implementation is noticeably slower. The JIT is able to inline all the lambda stuff but doesn't produce code as perfectly concise as the vanilla version.

Generally, Java 8 streams are not magic. They can't speed up things that are already well implemented (with, probably, plain iterations or Java 5's for-each statements replaced with Iterable.forEach() and Collection.removeIf() calls). Streams are more about coding convenience and safety. It's a convenience versus speed tradeoff.
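To make the two calls mentioned above concrete, here is a small sketch of the same filter-and-map task written with Collection.removeIf() and Iterable.forEach() instead of a stream. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachRemoveIf {

    // Copies the source and drops the odd values in place with Collection.removeIf().
    static List<Integer> keepEven(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeIf(i -> i % 2 != 0);
        return copy;
    }

    // Visits each value with Iterable.forEach() and stores its square root.
    static List<Double> roots(List<Integer> values) {
        List<Double> result = new ArrayList<>(values.size());
        values.forEach(i -> result.add(Math.sqrt(i)));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 9, 16);
        System.out.println(roots(keepEven(source)));
    }
}
```

Both calls are plain iterations under the hood, which is why they fall on the "already well implemented" side of the comparison rather than the stream side.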

Answered by Mellon

public static void main(String[] args) {
    //Calculating square root of even numbers from 1 to N       
    int min = 1;
    int max = 10000000;

    List<Integer> sourceList = new ArrayList<>();
    for (int i = min; i < max; i++) {
        sourceList.add(i);
    }

    List<Double> result = new LinkedList<>();


    //Collections approach
    long t0 = System.nanoTime();
    long elapsed = 0;
    for (Integer i : sourceList) {
        if(i % 2 == 0){
            result.add( doSomeCalculate(i));
        }
    }
    elapsed = System.nanoTime() - t0;       
    System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


    //Stream approach
    Stream<Integer> stream = sourceList.stream();       
    t0 = System.nanoTime();
    result = stream.filter(i -> i%2 == 0).map(i -> doSomeCalculate(i))
            .collect(Collectors.toList());
    elapsed = System.nanoTime() - t0;       
    System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


    //Parallel stream approach
    stream = sourceList.stream().parallel();        
    t0 = System.nanoTime();
    result = stream.filter(i -> i%2 == 0).map(i ->  doSomeCalculate(i))
            .collect(Collectors.toList());
    elapsed = System.nanoTime() - t0;       
    System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));      
}

static double doSomeCalculate(int input) {
    for(int i=0; i<100000; i++){
        Math.sqrt(i+input);
    }
    return Math.sqrt(input);
}

I changed the code a bit and ran it on my MacBook Pro, which has 8 cores. I got a reasonable result:

Collections: Elapsed time: 1522036826 ns (1.522037 seconds)
Streams: Elapsed time: 4315833719 ns (4.315834 seconds)
Parallel streams: Elapsed time: 261152901 ns (0.261153 seconds)