Java 8 Stream: difference between limit() and skip()

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/32414088/

Retrieved: 2020-08-11 12:29:55. Source: igfitidea

Tags: java, java-8, limit, java-stream, skip

Asked by Luigi Cortese

Talking about Streams, when I execute this piece of code

import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .limit(3)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));
    }
}

I get this output

A1B1C1
A2B2C2
A3B3C3

because limiting my stream to the first three components forces actions A, B and C to be executed only three times.

Trying to perform an analogous computation on the last three elements by using the skip() method shows a different behaviour: this

import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .skip(6)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));
    }
}

outputs this

A1
A2
A3
A4
A5
A6
A7B7C7
A8B8C8
A9B9C9

Why, in this case, are actions A1 to A6 being executed? It must have something to do with the fact that limit is a short-circuiting stateful intermediate operation, while skip is not, but I don't understand the practical implications of this property. Is it just that "every action before skip is executed while not everyone before limit is"?

Accepted answer by RealSkeptic

What you have here are two stream pipelines.

These stream pipelines each consist of a source, several intermediate operations, and a terminal operation.

But the intermediate operations are lazy. This means that nothing happens unless a downstream operation requires an item. When it does, then the intermediate operation does all it needs to produce the required item, and then again waits until another item is requested, and so on.

The terminal operations are usually "eager". That is, they ask for all the items in the stream that are needed for them to complete.

So you should really think of the pipeline as the forEach asking the stream behind it for the next item, and that stream asking the stream behind it, and so on, all the way to the source.

With that in mind, let's see what we have with your first pipeline:

Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .limit(3)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));

So, the forEach is asking for the first item. That means the "B" peek needs an item, and asks the limit output stream for it, which means limit will need to ask the "A" peek, which goes to the source. An item is given, and goes all the way up to the forEach, and you get your first line:

A1B1C1

The forEach asks for another item, then another. And each time, the request is propagated up the stream, and performed. But when forEach asks for the fourth item and the request gets to the limit, it knows that it has already given all the items it is allowed to give.

Thus, it does not ask the "A" peek for another item. It immediately indicates that its items are exhausted, and thus no more actions are performed and forEach terminates.
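The walk-through of the first pipeline can be checked with a minimal sketch, assuming a sequential stream. It records each action in a list instead of printing it, so the order of events can be inspected afterwards. The class name LimitTrace and the trace list are illustrative names, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative name; not part of the original question.
class LimitTrace {
    // Runs the first pipeline, recording each action instead of printing it.
    static List<String> run() {
        List<String> trace = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
              .peek(x -> trace.add("A" + x))
              .limit(3)
              .peek(x -> trace.add("B" + x))
              .forEach(x -> trace.add("C" + x));
        return trace;
    }

    public static void main(String[] args) {
        // prints [A1, B1, C1, A2, B2, C2, A3, B3, C3]
        System.out.println(run());
    }
}
```

No "A4" entry appears, which matches the description: after the third item, limit never asks the "A" peek again.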

What happens in the second pipeline?

    Stream.of(1,2,3,4,5,6,7,8,9)
    .peek(x->System.out.print("\nA"+x))
    .skip(6)
    .peek(x->System.out.print("B"+x))
    .forEach(x->System.out.print("C"+x));

Again, forEach is asking for the first item. This is propagated back. But when it gets to the skip, it knows it has to ask for 6 items from its upstream before it can pass one downstream. So it makes a request upstream to the "A" peek, consumes the item without passing it downstream, makes another request, and so on. So the "A" peek gets 6 requests for an item and produces 6 prints, but these items are not passed down.

A1
A2
A3
A4
A5
A6

On the 7th request made by skip, the item is passed down to the "B" peek and from it to the forEach, so the full print is done:

A7B7C7

Then it's just like before. The skip will now, whenever it gets a request, ask for an item upstream and pass it downstream, as it "knows" it has already done its skipping job. So the rest of the prints go through the entire pipe, until the source is exhausted.
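The request-driven picture in this answer can be imitated with plain iterators. This is only a sketch of the mental model, not how java.util.stream is actually implemented; the helpers peek, limit and skip below are hypothetical stand-ins written for illustration, and the while loop plays the role of forEach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical iterator-based imitation of the pull model; not the JDK implementation.
class PullModelSketch {
    // Like peek(): runs the action on every element that is pulled through.
    static <T> Iterator<T> peek(Iterator<T> upstream, Consumer<T> action) {
        return new Iterator<T>() {
            @Override public boolean hasNext() { return upstream.hasNext(); }
            @Override public T next() {
                T item = upstream.next();
                action.accept(item);
                return item;
            }
        };
    }

    // Like limit(n): stops asking upstream once n items have been handed out.
    static <T> Iterator<T> limit(Iterator<T> upstream, long n) {
        return new Iterator<T>() {
            long remaining = n;
            @Override public boolean hasNext() { return remaining > 0 && upstream.hasNext(); }
            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                remaining--;
                return upstream.next();
            }
        };
    }

    // Like skip(n): pulls and discards n upstream items before passing any on.
    static <T> Iterator<T> skip(Iterator<T> upstream, long n) {
        return new Iterator<T>() {
            long toSkip = n;
            private void discard() {
                while (toSkip > 0 && upstream.hasNext()) {
                    upstream.next();   // consumed here, never passed downstream
                    toSkip--;
                }
            }
            @Override public boolean hasNext() { discard(); return upstream.hasNext(); }
            @Override public T next() { discard(); return upstream.next(); }
        };
    }

    // Builds A -> (limit 3 | skip 6) -> B -> C and records every action.
    static List<String> trace(boolean useLimit) {
        List<String> trace = new ArrayList<>();
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).iterator();
        Iterator<Integer> a = peek(source, x -> trace.add("A" + x));
        Iterator<Integer> middle = useLimit ? limit(a, 3) : skip(a, 6);
        Iterator<Integer> b = peek(middle, x -> trace.add("B" + x));
        while (b.hasNext()) {
            trace.add("C" + b.next());   // the loop stands in for forEach
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(trace(true));   // limit variant
        System.out.println(trace(false));  // skip variant
    }
}
```

With useLimit set to true the trace stops after A3 B3 C3; with false it contains A1 to A6 on their own before A7 B7 C7, mirroring the two outputs in the question.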

Answer by Lukas Eder

The fluent notation of the streamed pipeline is what's causing this confusion. Think about it this way:

limit(3)

All the pipelined operations are evaluated lazily, except forEach(), which is a terminal operation, which triggers "execution of the pipeline".

When the pipeline is executed, intermediary stream definitions will not make any assumptions about what happens "before" or "after". All they're doing is taking an input stream and transforming it into an output stream:

Stream<Integer> s1 = Stream.of(1,2,3,4,5,6,7,8,9);
Stream<Integer> s2 = s1.peek(x->System.out.print("\nA"+x));
Stream<Integer> s3 = s2.limit(3);
Stream<Integer> s4 = s3.peek(x->System.out.print("B"+x));

s4.forEach(x->System.out.print("C"+x));

  • s1 contains 9 different Integer values.
  • s2 peeks at all values that pass it and prints them.
  • s3 passes the first 3 values to s4 and aborts the pipeline after the third value. No further values are produced by s3. This doesn't mean that no more values are in the pipeline: s2 would still produce (and print) more values, but no one requests those values and thus execution stops.
  • s4 again peeks at all values that pass it and prints them.
  • forEach consumes and prints whatever s4 passes to it.

Think about it this way. The whole stream is completely lazy. Only the terminal operation actively pulls new values from the pipeline. After it has pulled 3 values from s4 <- s3 <- s2 <- s1, s3 will no longer produce new values and it will no longer pull any values from s2 <- s1. While s1 -> s2 would still be able to produce 4-9, those values are just never pulled from the pipeline, and thus never printed by s2.
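A small sketch of this laziness, assuming a sequential stream: the pipeline is assembled first, and the recorded trace stays empty until the terminal operation runs. LazyPipelineDemo and sizes are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative name; shows that intermediate operations do nothing on their own.
class LazyPipelineDemo {
    // Returns {trace size before forEach, trace size after forEach}.
    static int[] sizes() {
        List<String> trace = new ArrayList<>();
        Stream<Integer> s4 = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
                                   .peek(x -> trace.add("A" + x))
                                   .limit(3)
                                   .peek(x -> trace.add("B" + x));
        int before = trace.size();           // nothing has been pulled yet
        s4.forEach(x -> trace.add("C" + x)); // the terminal operation triggers execution
        int after = trace.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] r = sizes();
        System.out.println(r[0] + " -> " + r[1]); // prints 0 -> 9
    }
}
```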

skip(6)

With skip() the same thing happens:

Stream<Integer> s1 = Stream.of(1,2,3,4,5,6,7,8,9);
Stream<Integer> s2 = s1.peek(x->System.out.print("\nA"+x));
Stream<Integer> s3 = s2.skip(6);
Stream<Integer> s4 = s3.peek(x->System.out.print("B"+x));

s4.forEach(x->System.out.print("C"+x));

  • s1 contains 9 different Integer values.
  • s2 peeks at all values that pass it and prints them.
  • s3 consumes the first 6 values, "skipping them", which means the first 6 values aren't passed to s4; only the subsequent values are.
  • s4 again peeks at all values that pass it and prints them.
  • forEach consumes and prints whatever s4 passes to it.

The important thing here is that s2 is not aware of the remaining pipeline skipping any values. s2 peeks at all values independently of what happens afterwards.
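The same point as a minimal sketch, again assuming a sequential stream and recording actions in a list rather than printing them. SkipTrace is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative name; not part of the original question.
class SkipTrace {
    // Runs the skip(6) pipeline, recording each action instead of printing it.
    static List<String> run() {
        List<String> trace = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
              .peek(x -> trace.add("A" + x))   // s2: sees every value
              .skip(6)                         // s3: swallows the first six
              .peek(x -> trace.add("B" + x))   // s4: sees only 7, 8, 9
              .forEach(x -> trace.add("C" + x));
        return trace;
    }

    public static void main(String[] args) {
        // prints [A1, A2, A3, A4, A5, A6, A7, B7, C7, A8, B8, C8, A9, B9, C9]
        System.out.println(run());
    }
}
```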

Another example:

Consider this pipeline, which is listed in this blog post:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .distinct()
         .limit(10)
         .forEach(System.out::println);

When you execute the above, the program will never halt. Why? Because:

IntStream i1 = IntStream.iterate(0, i -> ( i + 1 ) % 2);
IntStream i2 = i1.distinct();
IntStream i3 = i2.limit(10);

i3.forEach(System.out::println);

Which means:

  • i1 generates an infinite amount of alternating values: 0, 1, 0, 1, 0, 1, ...
  • i2 consumes all values that have been encountered before, passing on only "new" values, i.e. there are a total of 2 values coming out of i2.
  • i3 passes on 10 values, then stops.

This algorithm will never stop, because i3 waits for i2 to produce 8 more values after 0 and 1, but those values never appear, while i1 never stops feeding values to i2.

It doesn't matter that at some point in the pipeline, more than 10 values had been produced. All that matters is that i3 has never seen those 10 values.
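For contrast, here is a sketch of a variant that does terminate. It assumes the intent is "the distinct values among the first 10 generated", which is a different question from the one the original pipeline asks; the point is only that limit() placed directly after the infinite source sees its 10 values and can stop. LimitBeforeDistinct is an illustrative name.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative name; a terminating variant with different semantics than the original pipeline.
class LimitBeforeDistinct {
    static int[] firstDistinct() {
        return IntStream.iterate(0, i -> (i + 1) % 2)
                        .limit(10)    // bound the infinite source first
                        .distinct()   // then de-duplicate those 10 values
                        .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstDistinct())); // prints [0, 1]
    }
}
```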

To answer your question:

Is it just that "every action before skip is executed while not everyone before limit is"?

Nope. All operations before either skip() or limit() are executed. In both of your executions, you get A1 - A3. But limit() may short-circuit the pipeline, aborting value consumption once the event of interest (the limit is reached) has occurred.

Answer by Tagir Valeev

All streams are based on spliterators, which have basically two operations: advance (move forward one element, similar to an iterator) and split (divide oneself at an arbitrary position, which is suitable for parallel processing). You can stop taking input elements at any moment you like (which is what limit does), but you cannot just jump to an arbitrary position (there's no such operation in the Spliterator interface). Thus the skip operation needs to actually read the first elements from the source just to ignore them. Note that in some cases you can perform an actual jump:

List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9);

list.stream().skip(3)... // will read 1,2,3, but ignore them
list.subList(3, list.size()).stream()... // will actually jump over the first three elements
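
The difference between the two lines above can be made visible by counting how many elements a peek() placed right after the source observes. This is a sketch for a sequential stream; SkipVsSubList and its method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative name; counts the elements that flow out of the source.
class SkipVsSubList {
    // skip(3): the first three elements are still read, then ignored.
    static int visitedWithSkip(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
            .peek(x -> visited.incrementAndGet())
            .skip(3)
            .forEach(x -> { });
        return visited.get();
    }

    // subList(3, size): the stream starts at the fourth element.
    static int visitedWithSubList(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.subList(3, list.size()).stream()
            .peek(x -> visited.incrementAndGet())
            .forEach(x -> { });
        return visited.get();
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(visitedWithSkip(list));    // prints 9
        System.out.println(visitedWithSubList(list)); // prints 6
    }
}
```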

Answer by Amm Sokun

It is complete blasphemy to look at stream operations individually, because that is not how a stream is evaluated.

Talking about limit(3), it is a short-circuiting operation, which makes sense: whatever operations come before and after the limit, having a limit in a stream stops iteration once n elements have reached the limit operation. But this doesn't mean that only n stream elements would be processed. Take this different stream operation as an example:

import java.util.stream.Stream;

public class App
{
    public static void main(String[] args) {
        Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .filter(x -> x%2==0)
        .limit(3)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));
    }
}

would output

A1
A2B2C2
A3
A4B4C4
A5
A6B6C6

which seems right, because limit is waiting for 3 stream elements to pass through the operation chain, although 6 elements of the stream are processed.
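The same behaviour is what lets limit() end a pipeline over an infinite source. The sketch below assumes a sequential stream built with Stream.iterate and records which elements were examined before the filter and which ones made it past limit(3). FilterLimitDemo and its parameter names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative name; limit() after a filter on an infinite source.
class FilterLimitDemo {
    // Returns the elements examined before the filter; 'kept' receives what passed limit().
    static List<Integer> examined(List<Integer> kept) {
        List<Integer> seen = new ArrayList<>();
        Stream.iterate(1, x -> x + 1)      // infinite source: 1, 2, 3, ...
              .peek(seen::add)
              .filter(x -> x % 2 == 0)
              .limit(3)                    // stops the pipeline after three even numbers
              .forEach(kept::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> kept = new ArrayList<>();
        System.out.println(examined(kept)); // prints [1, 2, 3, 4, 5, 6]
        System.out.println(kept);           // prints [2, 4, 6]
    }
}
```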

Answer by yaccob

Maybe this little diagram helps to get some natural "feeling" for how the stream is processed.

The first line =>8=>=7=...=== depicts the stream. The elements 1..8 are flowing from the left to the right. There are three "windows":

  1. In the first window (peek A) you see everything.
  2. In the second window (skip 6 or limit 3) a kind of filtering is done. Either the first or the last elements are "eliminated", meaning they are not passed on for further processing.
  3. In the third window you see only those items that were passed on.

┌────────────────────────────────────────────────────────────────────────────┐
│                                                                            │
│==================================   ===========   ==========   =========   │
│   8   7   6   5   4   3   2   1                                            │
│================================== ▲ =========== ▲ ========== ▲ =========   │
│                                   │             │            │             │
│                                   │          skip 6          │             │
│                                peek A        limit 3      peek B           │
└────────────────────────────────────────────────────────────────────────────┘

Probably not everything (maybe not even anything) in this explanation is technically completely correct. But when I see it like this it's quite clear to me what items reach which of the concatenated instructions.
