Java 8 - Best way to transform a list: map or forEach?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28319064/

Date: 2020-08-11 05:59:37 | Source: igfitidea

Java 8 - Best way to transform a list: map or foreach?

Tags: java, java-8, java-stream

Asked by Emilien Brigand

I have a list myListToParse where I want to filter the elements, apply a method to each element, and add the results to another list myFinalList.

With Java 8 I noticed that I can do it in two different ways. I would like to know which of them is more efficient, and to understand why one way is better than the other.

I'm open for any suggestion about a third way.

Method 1:

myFinalList = new ArrayList<>();
myListToParse.stream()
        .filter(elt -> elt != null)
        .forEach(elt -> myFinalList.add(doSomething(elt)));

Method 2:

myFinalList = myListToParse.stream()
        .filter(elt -> elt != null)
        .map(elt -> doSomething(elt))
        .collect(Collectors.toList()); 

Accepted answer by herman

Don't worry about any performance differences; they're normally going to be minimal in a case like this.

Method 2 is preferable because

  1. it doesn't require mutating a collection that exists outside the lambda expression,

  2. it's more readable because the different steps that are performed in the collection pipeline are written sequentially: first a filter operation, then a map operation, then collecting the result (for more info on the benefits of collection pipelines, see Martin Fowler's excellent article),

  3. you can easily change the way values are collected by replacing the Collector that is used. In some cases you may need to write your own Collector, but then the benefit is that you can easily reuse it.

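To illustrate point 3, here is a minimal sketch (not from the original answer; `doSomething` is stood in for by `String::toUpperCase`) showing that swapping the Collector changes the result container without touching the filter/map steps:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CollectorSwap {
    // Hypothetical stand-in for the question's doSomething().
    static String doSomething(String s) {
        return s.toUpperCase();
    }

    // Same filter/map pipeline as Method 2, collected into a joined String.
    static String joined(List<String> source) {
        return source.stream()
                .filter(elt -> elt != null)
                .map(CollectorSwap::doSomething)
                .collect(Collectors.joining(", "));
    }

    // Identical pipeline, collected into a sorted set instead.
    static TreeSet<String> asSortedSet(List<String> source) {
        return source.stream()
                .filter(elt -> elt != null)
                .map(CollectorSwap::doSomething)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("b", null, "a", null, "c");
        System.out.println(joined(source));      // B, A, C
        System.out.println(asSortedSet(source)); // [A, B, C]
    }
}
```

Only the final `collect(...)` call differs between the two methods; the filtering and mapping logic is reused as-is.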

Answer by Eran

I prefer the second way.

When you use the first way, if you decide to use a parallel stream to improve performance, you'll have no control over the order in which the elements will be added to the output list by forEach.

When you use toList, the Streams API will preserve the order even if you use a parallel stream.

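A small sketch of that guarantee (illustrative only, not part of the original answer): the collected result comes back in encounter order on every run, even though the map work runs in parallel:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class EncounterOrder {
    static List<Integer> squares() {
        // collect(toList()) reassembles the per-thread partial results
        // in encounter order, despite the parallel map step.
        return IntStream.rangeClosed(1, 1_000)
                .parallel()
                .map(n -> n * n)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = squares();
        System.out.println(result.get(0));   // 1
        System.out.println(result.get(999)); // 1000000
    }
}
```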

Answer by assylias

I agree with the existing answers that the second form is better because it does not have any side effects and is easier to parallelise (just use a parallel stream).

Performance wise, it appears they are equivalent until you start using parallel streams. In that case, map will perform much better. See the micro benchmark results below:

Benchmark                         Mode  Samples    Score   Error  Units
SO28319064.forEach                avgt      100  187.310 ± 1.768  ms/op
SO28319064.map                    avgt      100  189.180 ± 1.692  ms/op
SO28319064.mapWithParallelStream  avgt      100   55.577 ± 0.782  ms/op

You can't boost the first example in the same manner because forEach is a terminal operation - it returns void - so you are forced to use a stateful lambda. But that is really a bad idea if you are using parallel streams.

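A sketch of the difference (illustrative, not from the original answer): the stateful forEach variant has to synchronize on a shared list and still loses encounter order, while collect needs neither:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatefulLambda {
    // Stateful variant: every worker thread mutates one shared list.
    // synchronizedList keeps the ArrayList from being corrupted, but
    // the insertion order still depends on thread scheduling.
    static List<Integer> viaForEach(int n) {
        List<Integer> result = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(result::add);
        return result;
    }

    // Stateless variant: collect() needs no shared state and always
    // reassembles the partial results in encounter order.
    static List<Integer> viaCollect(int n) {
        return IntStream.range(0, n).parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaForEach(100_000).size()); // 100000 - nothing lost...
        System.out.println(viaCollect(100_000).get(0)); // 0
        // ...but viaForEach is typically NOT in encounter order,
        // while viaCollect always is.
    }
}
```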

Finally, note that your second snippet can be written in a slightly more concise way with method references and static imports:

myFinalList = myListToParse.stream()
    .filter(Objects::nonNull)
    .map(this::doSomething)
    .collect(toList()); 

Answer by M.K.

One of the main benefits of using streams is that it gives the ability to process data in a declarative way, that is, using a functional style of programming. It also gives multi-threading capability for free, meaning there is no need to write any extra multi-threaded code to make your stream concurrent.

Assuming the reason you are exploring this style of programming is that you want to exploit these benefits, your first code sample is potentially not functional, since the forEach method is classed as terminal (meaning that it can produce side effects).

The second way is preferred from a functional programming point of view, since the map function can accept stateless lambda functions. More explicitly, the lambda passed to the map function should be


  1. Non-interfering, meaning that the function should not alter the source of the stream if it is non-concurrent (e.g. ArrayList).
  2. Stateless to avoid unexpected results when doing parallel processing (caused by thread scheduling differences).

Another benefit of the second approach is that if the stream is parallel and the collector is concurrent and unordered, then these characteristics can provide useful hints allowing the reduction operation to do the collecting concurrently.

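A concrete example of such a collector (illustrative, not from the original answer): Collectors.toConcurrentMap is documented as concurrent and unordered, so a parallel stream can write into a single shared map rather than merging per-thread partial results:

```java
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentCollect {
    static ConcurrentMap<Integer, Integer> squaresByKey() {
        // toConcurrentMap reports the CONCURRENT and UNORDERED
        // characteristics, so all threads of a parallel stream insert
        // into one shared ConcurrentHashMap instead of building and
        // merging per-thread partial maps.
        return IntStream.range(0, 1_000)
                .parallel()
                .boxed()
                .collect(Collectors.toConcurrentMap(n -> n, n -> n * n));
    }

    public static void main(String[] args) {
        System.out.println(squaresByKey().get(12)); // 144
        System.out.println(squaresByKey().size());  // 1000
    }
}
```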

Answer by Craig P. Motlin

If you use Eclipse Collections you can use the collectIf() method.

MutableList<Integer> source =
    Lists.mutable.with(1, null, 2, null, 3, null, 4, null, 5);

MutableList<String> result = source.collectIf(Objects::nonNull, String::valueOf);

Assert.assertEquals(Lists.immutable.with("1", "2", "3", "4", "5"), result);

It evaluates eagerly and should be a bit faster than using a Stream.

Note: I am a committer for Eclipse Collections.

Answer by harshtuna

There is a third option - using stream().toArray() - see the comments under why didn't stream have a toList method. It turns out to be slower than forEach() or collect(), and less expressive. It might be optimised in later JDK builds, so adding it here just in case.

assuming List<String>

    myFinalList = Arrays.asList(
            myListToParse.stream()
                    .filter(Objects::nonNull)
                    .map(this::doSomething)
                    .toArray(String[]::new)
    );

with a micro-micro benchmark, 1M entries, 20% nulls, and a simple transform in doSomething()

private LongSummaryStatistics benchmark(final String testName, final Runnable methodToTest, int samples) {
    long[] timing = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.currentTimeMillis();
        methodToTest.run();
        timing[i] = System.currentTimeMillis() - start;
    }
    final LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
    System.out.println(testName + ": " + stats);
    return stats;
}

the results are

parallel:

toArray: LongSummaryStatistics{count=10, sum=3721, min=321, average=372.100000, max=535}
forEach: LongSummaryStatistics{count=10, sum=3502, min=249, average=350.200000, max=389}
collect: LongSummaryStatistics{count=10, sum=3325, min=265, average=332.500000, max=368}

sequential:

toArray: LongSummaryStatistics{count=10, sum=5493, min=517, average=549.300000, max=569}
forEach: LongSummaryStatistics{count=10, sum=5316, min=427, average=531.600000, max=571}
collect: LongSummaryStatistics{count=10, sum=5380, min=444, average=538.000000, max=557}

parallel without nulls and filter (so the stream is SIZED): toArray has the best performance in that case, and .forEach() fails with an IndexOutOfBoundsException on the recipient ArrayList; I had to replace it with .forEachOrdered()

toArray: LongSummaryStatistics{count=100, sum=75566, min=707, average=755.660000, max=1107}
forEach: LongSummaryStatistics{count=100, sum=115802, min=992, average=1158.020000, max=1254}
collect: LongSummaryStatistics{count=100, sum=88415, min=732, average=884.150000, max=1014}

Answer by Kumar Abhishek

Maybe Method 3.

I always prefer to keep logic separate.

Predicate<Long> greaterThan100 = new Predicate<Long>() {
    @Override
    public boolean test(Long currentParameter) {
        return currentParameter > 100;
    }
};

List<Long> sourceLongList = Arrays.asList(1L, 10L, 50L, 80L, 100L, 120L, 133L, 333L);
List<Long> resultList = sourceLongList.parallelStream().filter(greaterThan100).collect(Collectors.toList());

Answer by John McClean

If using 3rd party libraries is OK, cyclops-react defines lazy extended collections with this functionality built in. For example, we could simply write

ListX myListToParse;

ListX myFinalList = myListToParse.filter(elt -> elt != null) .map(elt -> doSomething(elt));

myFinalList is not evaluated until first access (and thereafter the materialized list is cached and reused).

[Disclosure: I am the lead developer of cyclops-react]
