Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/20375176/

Time: 2020-08-13 01:28:32. Source: igfitidea

Should I always use a parallel stream when possible?

Tags: java, parallel-processing, java-8, java-stream

Asked by Matsemann

With Java 8 and lambdas it's easy to iterate over collections as streams, and just as easy to use a parallel stream. Two examples from the docs, the second one using parallelStream:

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

As long as I don't care about the order, would it always be beneficial to use the parallel version? One would think that dividing the work across more cores makes it faster.

Are there other considerations? When should parallel stream be used and when should the non-parallel be used?

(This question is asked to trigger a discussion about how and when to use parallel streams, not because I think always using them is a good idea.)

Accepted answer by JB Nizet

A parallel stream has a much higher overhead than a sequential one. Coordinating the threads takes a significant amount of time. I would use sequential streams by default and only consider parallel ones if:

  • I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)

  • I have a performance problem in the first place

  • I don't already run the process in a multi-thread environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)
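
To make the third point a bit more concrete: in OpenJDK, as far as I know, parallel streams run their tasks on the shared ForkJoinPool.commonPool() by default, so every request in a server competes for the same small set of worker threads. A minimal sketch (the class and method names are made up for illustration) that prints how many workers that pool targets:

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolSketch {
    /** Number of worker threads the JVM-wide common pool targets. */
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        // Both values depend on the machine and JVM settings.
        System.out.println("available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
    }
}
```

If the web container already keeps all cores busy with request threads, these few extra workers are unlikely to buy much.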

In your example, performance will be dominated anyway by the synchronized access to System.out.println(), so making this process parallel will have no effect, or even a negative one.

Moreover, remember that parallel streams don't magically solve all the synchronization problems. If a shared resource is used by the predicates and functions used in the process, you'll have to make sure that everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.

In any case, measure, don't guess! Only a measurement will tell you if the parallelism is worth it or not.
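
As a rough illustration of the side-effect warning, here is a sketch (class and method names are hypothetical) contrasting a pipeline that mutates a shared ArrayList with one that lets a collector assemble the result. The first method is broken on purpose and is never called:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SideEffectSketch {
    // Broken on purpose: ArrayList is not thread-safe, so concurrent add()
    // calls may lose elements or throw. Shown for contrast only.
    static List<Integer> evensViaSideEffect(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .forEach(i -> out.add(i)); // data race on 'out'
        return out;
    }

    // Safe: no shared mutable state; the collector builds partial lists
    // per task and merges them, preserving encounter order.
    static List<Integer> evensViaCollect(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evensViaCollect(1_000).size()); // prints 500
    }
}
```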

Answer by edharned

JB hit the nail on the head. The only thing I can add is that Java 8 doesn't do pure parallel processing; it does "paraquential" processing. Yes, I wrote the article, and I've been doing F/J for thirty years, so I do understand the issue.

Answer by Brian Goetz

The Stream API was designed to make it easy to write computations in a way that was abstracted away from how they would be executed, making switching between sequential and parallel easy.

However, just because it's easy doesn't mean it's always a good idea, and in fact, it is a bad idea to just drop .parallel() all over the place simply because you can.

First, note that parallelism offers no benefits other than the possibility of faster execution when more cores are available. A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to perform dispatching and coordinating of sub-tasks. The hope is that you'll be able to get to the answer faster by breaking up the work across multiple processors; whether this actually happens depends on a lot of things, including the size of your data set, how much computation you are doing on each element, the nature of the computation (specifically, does the processing of one element interact with processing of others?), the number of processors available, and the number of other tasks competing for those processors.

Further, note that parallelism often exposes nondeterminism in the computation that sequential implementations tend to hide; sometimes this doesn't matter, or it can be mitigated by constraining the operations involved (i.e., reduction operators must be stateless and associative).
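
A small sketch of why the associativity constraint matters (class and method names are illustrative only). Addition gives the same total however the range is grouped; subtraction does not, so its parallel result depends on how the work happens to be split:

```java
import java.util.stream.IntStream;

public class ReductionSketch {
    // Addition is associative and 0 is its identity:
    // sequential and parallel runs agree.
    static int sum(int n, boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, n);
        return (parallel ? s.parallel() : s).reduce(0, Integer::sum);
    }

    // Subtraction is not associative (and 0 is not an identity for it), so
    // only the sequential left-to-right result is well defined.
    // Shown for contrast only.
    static int difference(int n, boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, n);
        return (parallel ? s.parallel() : s).reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println(sum(100, false));        // prints 5050
        System.out.println(sum(100, true));         // prints 5050
        System.out.println(difference(100, false)); // prints -5050
        System.out.println(difference(100, true));  // depends on the splits
    }
}
```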

In reality, sometimes parallelism will speed up your computation, sometimes it will not, and sometimes it will even slow it down. It is best to develop first using sequential execution and then apply parallelism where (A) you know that there's actually benefit to increased performance and (B) that it will actually deliver increased performance. (A) is a business problem, not a technical one. If you are a performance expert, you'll usually be able to look at the code and determine (B), but the smart path is to measure. (And, don't even bother until you're convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)

The simplest performance model for parallelism is the "NQ" model, where N is the number of elements, and Q is the computation per element. In general, you need the product NQ to exceed some threshold before you start getting a performance benefit. For a low-Q problem like "add up numbers from 1 to N", you will generally see a breakeven between N=1000 and N=10000. With higher-Q problems, you'll see breakevens at lower thresholds.

But the reality is quite complicated. So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure if parallelism will help.
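
For the low-Q "add up numbers from 1 to N" case, a crude way to look for the breakeven on your own machine might be the following. This is only a sketch: the class name, warm-up count, and repeat count are arbitrary choices of mine, the JIT can distort such hand-rolled timings, and a proper harness such as JMH is the better tool for real decisions.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class MeasureSketch {
    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Crude timing: a few warm-up runs, then the best of several timed runs,
    // in nanoseconds.
    static long bestNanos(LongSupplier task) {
        for (int i = 0; i < 5; i++) {
            task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000, 10_000, 1_000_000}) {
            System.out.printf("N=%d  sequential=%d ns  parallel=%d ns%n",
                    n,
                    bestNanos(() -> sumSequential(n)),
                    bestNanos(() -> sumParallel(n)));
        }
    }
}
```

The numbers will vary from run to run and machine to machine, which is rather the point.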

Answer by Ram Patra

I watched one of the presentations by Brian Goetz (Java Language Architect and specification lead for Lambda Expressions). He explains in detail the following four points to consider before going for parallelization:

  • Splitting / decomposition costs: sometimes splitting is more expensive than just doing the work!

  • Task dispatch / management costs: you can do a lot of work in the time it takes to hand work to another thread.

  • Result combination costs: sometimes combination involves copying lots of data. For example, adding numbers is cheap, whereas merging sets is expensive.

  • Locality: the elephant in the room. This is an important point that is easy to miss. You should consider cache misses: if a CPU waits for data because of cache misses, you gain nothing from parallelization. That's why array-based sources parallelize best: the next indices (near the current index) are already cached, so the CPU is less likely to experience a cache miss.
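
The "result combination" point can be sketched like this (names are hypothetical). Both pipelines are correct in parallel; what differs is how partial results are merged: two partial sums combine with one addition, while two partial sets are combined by copying elements from one into the other.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CombineCostSketch {
    // Cheap combine: partial sums are merged with a single addition.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n).parallel()
                .mapToLong(i -> (long) i * i)
                .sum();
    }

    // Costlier combine: each task fills its own set, and pairs of partial
    // sets are then merged element by element.
    static Set<Integer> distinctRemainders(int n, int modulus) {
        return IntStream.rangeClosed(1, n).parallel()
                .map(i -> i % modulus)
                .boxed()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));                  // prints 385
        System.out.println(distinctRemainders(1_000, 7).size()); // prints 7
    }
}
```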

He also mentions a relatively simple formula to determine a chance of parallel speedup.

NQ Model:

N x Q > 10000

where,
N = number of data items
Q = amount of work per item

Answer by ruhong

Other answers have already covered profiling to avoid premature optimization and overhead cost in parallel processing. This answer explains the ideal choice of data structures for parallel streaming.

As a rule, performance gains from parallelism are best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances; arrays; int ranges; and long ranges. What these data structures have in common is that they can all be accurately and cheaply split into subranges of any desired sizes, which makes it easy to divide work among parallel threads. The abstraction used by the streams library to perform this task is the spliterator, which is returned by the spliterator method on Stream and Iterable.

Another important factor that all of these data structures have in common is that they provide good-to-excellent locality of reference when processed sequentially: sequential element references are stored together in memory. The objects referred to by those references may not be close to one another in memory, which reduces locality-of-reference. Locality-of-reference turns out to be critically important for parallelizing bulk operations: without it, threads spend much of their time idle, waiting for data to be transferred from memory into the processor's cache. The data structures with the best locality of reference are primitive arrays because the data itself is stored contiguously in memory.
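
To see what "accurately and cheaply split" looks like, here is a small sketch (class and method names are mine) that calls trySplit() once on an ArrayList spliterator and, for contrast, checks whether a Stream.iterate source even knows its size:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

public class SplitSketch {
    // Sizes of the two parts after one trySplit() on an ArrayList of n elements.
    static long[] splitArrayList(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if it cannot split
        return new long[] {
            prefix == null ? 0 : prefix.estimateSize(),
            rest.estimateSize()
        };
    }

    // An iterate-based stream has no known size, so it cannot be cut into
    // balanced halves up front.
    static boolean iterateIsSized() {
        return Stream.iterate(0, i -> i + 1)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        long[] sizes = splitArrayList(1_000);
        System.out.println(sizes[0] + " / " + sizes[1]); // prints 500 / 500
        System.out.println(iterateIsSized());            // prints false
    }
}
```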

Source: Item 48, "Use caution when making streams parallel", Effective Java, 3rd edition, by Joshua Bloch

Answer by tkruse

Never parallelize an infinite stream with a limit. Here is what happens:

    import java.util.stream.IntStream;

    public class InfiniteTest {
        public static void main(String[] args) {
            // let's count to 1 in parallel (this is the failing case)
            System.out.println(
                IntStream.iterate(0, i -> i + 1)
                    .parallel()
                    .skip(1)
                    .findFirst()
                    .getAsInt());
        }
    }

Result

    Exception in thread "main" java.lang.OutOfMemoryError
        at ...
        at java.base/java.util.stream.IntPipeline.findFirst(IntPipeline.java:528)
        at InfiniteTest.main(InfiniteTest.java:24)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.stream.SpinedBuffer$OfInt.newArray(SpinedBuffer.java:750)
        at ...

The same happens if you use .limit(...).

Explanation here: Java 8, using .parallel in a stream causes OOM error

Similarly, don't use parallel if the stream is ordered and has many more elements than you want to process, e.g.:

    public static void main(String[] args) {
        // let's skip the first 100 numbers and take the next one (101), in parallel
        System.out.println(
            IntStream.range(1, 1_000_000_000)
                .parallel()
                .skip(100)
                .findFirst()
                .getAsInt());
    }

This may run much longer because the parallel threads may work on plenty of other number ranges instead of the crucial one at the very start (the first 101 elements), so it can take a very long time.
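
For comparison, a sketch of the same two pipelines left sequential (the class and method names are made up). A sequential stream is evaluated lazily, and findFirst() short-circuits, so only the first few elements are ever produced:

```java
import java.util.stream.IntStream;

public class SkipFirstSketch {
    // Infinite source, sequential: stops as soon as the second element is seen.
    static int secondOfInfinite() {
        return IntStream.iterate(0, i -> i + 1)
                .skip(1)
                .findFirst()
                .getAsInt();
    }

    // Huge ordered range, sequential: only the first 101 elements are visited.
    static int afterSkipping100() {
        return IntStream.range(1, 1_000_000_000)
                .skip(100)
                .findFirst()
                .getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(secondOfInfinite()); // prints 1
        System.out.println(afterSkipping100()); // prints 101
    }
}
```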
