Java 沿元素将列表拆分为子列表

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/29095967/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 07:21:27  来源:igfitidea点击:

Splitting List into sublists along elements

javalistjava-8collectors

提问by Oneiros

I have this list (List<String>):

我有这个列表(List<String>):

["a", "b", null, "c", null, "d", "e"]

And I'd like something like this:

我想要这样的东西:

[["a", "b"], ["c"], ["d", "e"]]

In other words I want to split my list in sublists using the nullvalue as separator, in order to obtain a list of lists (List<List<String>>). I'm looking for a Java 8 solution. I've tried with Collectors.partitioningBybut I'm not sure it is what I'm looking for. Thanks!

换句话说,我想使用该null值作为分隔符将我的列表拆分为子列表,以获得列表列表 ( List<List<String>>)。我正在寻找 Java 8 解决方案。我试过,Collectors.partitioningBy但我不确定这是我要找的。谢谢!

采纳答案by Alexis C.

The only solution I come up with for the moment is by implementing your own custom collector.

我目前想出的唯一解决方案是实现您自己的自定义收集器。

Before reading the solution, I want to add a few notes about this. I took this question more as a programming exercise, I'm not sure if it can be done with a parallel stream.

在阅读解决方案之前,我想为此添加一些注释。我把这个问题更多地作为一个编程练习,我不确定它是否可以用并行流来完成。

So you have to be aware that it'll silently breakif the pipeline is run in parallel.

所以你必须知道,如果管道并行运行,它会悄悄地中断

This is nota desirable behavior and should be avoided. This is why I throw an exception in the combiner part (instead of (l1, l2) -> {l1.addAll(l2); return l1;}), as it's used in parallel when combining the two lists, so that you have an exception instead of a wrong result.

不是一种理想的行为,应该避免。这就是我在组合器部分(而不是(l1, l2) -> {l1.addAll(l2); return l1;})中抛出异常的原因,因为它在组合两个列表时并行使用,因此您会得到异常而不是错误结果。

Also this is not very efficient due to list copying (although it uses a native method to copy the underlying array).

由于列表复制,这也不是很有效(尽管它使用本机方法复制底层数组)。

So here's the collector implementation:

所以这是收集器的实现:

private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
    final List<String> current = new ArrayList<>();
    return Collector.of(() -> new ArrayList<List<String>>(),
        (l, elem) -> {
            if (sep.test(elem)) {
                l.add(new ArrayList<>(current));
                current.clear();
            }
            else {
                current.add(elem);
            }
        },
        (l1, l2) -> {
            throw new RuntimeException("Should not run this in parallel");
        },
        l -> {
            if (current.size() != 0) {
                l.add(current);
                return l;
            }
        );
}

and how to use it:

以及如何使用它:

List<List<String>> ll = list.stream().collect(splitBySeparator(Objects::isNull));

Output:

输出:

[[a, b], [c], [d, e]]



作为 answer of Joop Eggen is outJoop Eggen 的答案出来了,看来它可以并行完成(为此归功于他!)。有了它,它将自定义收集器实现减少到:

private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
    return Collector.of(() -> new ArrayList<List<String>>(Arrays.asList(new ArrayList<>())),
                        (l, elem) -> {if(sep.test(elem)){l.add(new ArrayList<>());} else l.get(l.size()-1).add(elem);},
                        (l1, l2) -> {l1.get(l1.size() - 1).addAll(l2.remove(0)); l1.addAll(l2); return l1;});
}

which let the paragraph about parallelism a bit obsolete, however I let it as it can be a good reminder.

这让关于并行性的段落有点过时了,但是我让它成为一个很好的提醒。



Note that the Stream API is not always a substitute. There are tasks that are easier and more suitable using the streams and there are tasks that are not. In your case, you could also create a utility method for that:

请注意,Stream API 并不总是替代品。有些任务更容易、更适合使用流,有些任务则不然。在您的情况下,您还可以为此创建一个实用程序方法:

private static <T> List<List<T>> splitBySeparator(List<T> list, Predicate<? super T> predicate) {
    final List<List<T>> finalList = new ArrayList<>();
    int fromIndex = 0;
    int toIndex = 0;
    for(T elem : list) {
        if(predicate.test(elem)) {
            finalList.add(list.subList(fromIndex, toIndex));
            fromIndex = toIndex + 1;
        }
        toIndex++;
    }
    if(fromIndex != toIndex) {
        finalList.add(list.subList(fromIndex, toIndex));
    }
    return finalList;
}

and call it like List<List<String>> list = splitBySeparator(originalList, Objects::isNull);.

并称之为List<List<String>> list = splitBySeparator(originalList, Objects::isNull);.

It can be improved for checking edge-cases.

它可以改进以检查边缘情况。

回答by Rohit Jain

Here's another approach, which uses a grouping function, which makes use of list indices for grouping.

这是另一种方法,它使用分组函数,它利用列表索引进行分组。

Here I'm grouping the element by the first index following that element, with value null. So, in your example, "a"and "b"would be mapped to 2. Also, I'm mapping nullvalue to -1index, which should be removed later on.

在这里,我按该元素后面的第一个索引对元素进行分组,值为null。因此,在您的示例中,"a"and"b"将映射到2. 另外,我正在将null值映射到-1索引,稍后应该将其删除。

List<String> list = Arrays.asList("a", "b", null, "c", null, "d", "e");

Function<String, Integer> indexGroupingFunc = (str) -> {
             if (str == null) {
                 return -1;
             }
             int index = list.indexOf(str) + 1;
             while (index < list.size() && list.get(index) != null) {
                 index++;
             }
             return index;
         };

Map<Integer, List<String>> grouped = list.stream()
               .collect(Collectors.groupingBy(indexGroupingFunc));

grouped.remove(-1);  // Remove null elements grouped under -1
System.out.println(grouped.values()); // [[a, b], [c], [d, e]]

You can also avoid getting the first index of nullelement every time, by caching the current min index in an AtomicInteger. The updated Functionwould be like:

您还可以避免null每次都获取元素的第一个索引,方法是将当前最小索引缓存在AtomicInteger. 更新Function后将如下所示:

AtomicInteger currentMinIndex = new AtomicInteger(-1);

Function<String, Integer> indexGroupingFunc = (str) -> {
        if (str == null) {
            return -1;
        }
        int index = names.indexOf(str) + 1;

        if (currentMinIndex.get() > index) {
            return currentMinIndex.get();
        } else {
            while (index < names.size() && names.get(index) != null) {
              index++;
            }
            currentMinIndex.set(index);
            return index;
        }
    };

回答by Arnaud Denoyelle

Please do not vote. I do not have enough place to explain this in comments.

请不要投票。我没有足够的地方在评论中解释这一点

This is a solution with a Streamand a foreachbut this is strictly equivalent to Alexis's solution or a foreachloop (and less clear, and I could not get rid of the copy constructor) :

这是一个带有 aStream和 a的解决方案,foreach但这严格等同于 Alexis 的解决方案或foreach循环(不太清楚,我无法摆脱复制构造函数):

List<List<String>> result = new ArrayList<>();
final List<String> current = new ArrayList<>();
list.stream().forEach(s -> {
      if (s == null) {
        result.add(new ArrayList<>(current));
        current.clear();
      } else {
        current.add(s);
      }
    }
);
result.add(current);

System.out.println(result);

I understand that you want to find a more elegant solution with Java 8 but I truly think that it has not been designed for this case. And as said by Mr spoon, highly prefer the naive way in this case.

我知道您想用 Java 8 找到更优雅的解决方案,但我确实认为它不是为这种情况设计的。正如勺子先生所说,在这种情况下非常喜欢天真的方式。

回答by Joop Eggen

The solution is to use Stream.collect. To create a Collector using its builder pattern is already given as solution. The alternative is the other overloaded collectbeing a tiny bit more primitive.

解决方案是使用Stream.collect. 已经给出了使用其构建器模式创建收集器的解决方案。另一种选择是另一个重载collect更原始一点。

    List<String> strings = Arrays.asList("a", "b", null, "c", null, "d", "e");
    List<List<String>> groups = strings.stream()
            .collect(() -> {
                List<List<String>> list = new ArrayList<>();
                list.add(new ArrayList<>());
                return list;
            },
            (list, s) -> {
                if (s == null) {
                    list.add(new ArrayList<>());
                } else {
                    list.get(list.size() - 1).add(s);
                }
            },
            (list1, list2) -> {
                // Simple merging of partial sublists would
                // introduce a false level-break at the beginning.
                list1.get(list1.size() - 1).addAll(list2.remove(0));
                list1.addAll(list2);
            });

As one sees, I make a list of string lists, where there always is at least one last (empty) string list.

正如人们所见,我制作了一个字符串列表列表,其中总是至少有一个最后的(空)字符串列表。

  • The first function creates a starting list of string lists. It specifies the result (typed) object.
  • The second function is called to process each element. It is an action on the partial result and an element.
  • The third is not really used, it comes into play on parallelising the processing, when partial results must be combined.
  • 第一个函数创建字符串列表的起始列表。它指定结果(类型化)对象。
  • 调用第二个函数来处理每个元素。它是对部分结果和元素的操作。
  • 第三个没有真正使用,当必须合并部分结果时,它在并行处理中起作用。


A solution with an accumulator:

带累加器的解决方案:

As @StuartMarks points out, the combiner does not fullfill the contract for parallelism.

正如@StuartMarks 指出的那样,组合器并没有完成并行性契约。

Due to the comment of @ArnaudDenoyelle a version using reduce.

由于@ArnaudDenoyelle 的评论,一个使用reduce.

    List<List<String>> groups = strings.stream()
            .reduce(new ArrayList<List<String>>(),
                    (list, s) -> {
                        if (list.isEmpty()) {
                            list.add(new ArrayList<>());
                        }
                        if (s == null) {
                            list.add(new ArrayList<>());
                        } else {
                            list.get(list.size() - 1).add(s);
                        }
                        return list;
                    },
                    (list1, list2) -> {
                            list1.addAll(list2);
                            return list1;
                    });
  • The first parameter is the accumulated object.
  • The second function accumulates.
  • The third is the aforementioned combiner.
  • 第一个参数是累积的对象。
  • 第二个函数累积。
  • 第三种是前面提到的合路器。

回答by Flown

This is a very interesting problem. I came up with a one line solution. It might not very performant but it works.

这是一个非常有趣的问题。我想出了一个单行解决方案。它的性能可能不是很好,但它有效。

List<String> list = Arrays.asList("a", "b", null, "c", null, "d", "e");
Collection<List<String>> cl = IntStream.range(0, list.size())
    .filter(i -> list.get(i) != null).boxed()
    .collect(Collectors.groupingBy(
        i -> IntStream.range(0, i).filter(j -> list.get(j) == null).count(),
        Collectors.mapping(i -> list.get(i), Collectors.toList()))
    ).values();

It is a similar idea that @Rohit Jain came up with. I'm grouping the space between the null values. If you really want a List<List<String>>you may append:

这是@Rohit Jain 提出的类似想法。我正在对空值之间的空间进行分组。如果你真的想要一个List<List<String>>你可以附加:

List<List<String>> ll = cl.stream().collect(Collectors.toList());

回答by Stuart Marks

Although there are several answers already, and an accepted answer, there are still a couple points missing from this topic. First, the consensus seems to be that solving this problem using streams is merely an exercise, and that the conventional for-loop approach is preferable. Second, the answers given thus far have overlooked an approach using array or vector-style techniques that I think improves the streams solution considerably.

尽管已经有几个答案,并且是公认的答案,但该主题仍然缺少一些要点。首先,共识似乎是使用流解决这个问题只是一种练习,传统的 for-loop 方法更可取。其次,迄今为止给出的答案忽略了使用数组或矢量样式技术的方法,我认为这种方法大大改进了流解决方案。

First, here's a conventional solution, for purposes of discussion and analysis:

首先,这是一个传统的解决方案,用于讨论和分析:

static List<List<String>> splitConventional(List<String> input) {
    List<List<String>> result = new ArrayList<>();
    int prev = 0;

    for (int cur = 0; cur < input.size(); cur++) {
        if (input.get(cur) == null) {
            result.add(input.subList(prev, cur));
            prev = cur + 1;
        }
    }
    result.add(input.subList(prev, input.size()));

    return result;
}

This is mostly straightforward but there's a bit of subtlety. One point is that a pending sublist from prevto curis always open. When we encounter nullwe close it, add it to the result list, and advance prev. After the loop we close the sublist unconditionally.

这主要是直截了当的,但也有一些微妙之处。一点是来自prevto的待处理子列表cur始终是打开的。当我们遇到null我们关闭它,将它添加到结果列表中,然后前进prev。在循环之后,我们无条件地关闭子列表。

Another observation is that this is a loop over indexes, not over the values themselves, thus we use an arithmetic for-loop instead of the enhanced "for-each" loop. But it suggests that we can stream using the indexes to generate subranges instead of streaming over values and putting the logic into the collector (as was done by Joop Eggen's proposed solution).

另一个观察结果是,这是对索引的循环,而不是对值本身的循环,因此我们使用算术 for 循环而不是增强的“for-each”循环。但它表明我们可以使用索引进行流式传输来生成子范围,而不是流式传输值并将逻辑放入收集器中(正如Joop Eggen 提出的解决方案所做的那样)。

Once we've realized that, we can see that each position of nullin the input is the delimiter for a sublist: it's the right end of the sublist to the left, and it (plus one) is the left end of the sublist to the right. If we can handle the edge cases, it leads to an approach where we find the indexes at which nullelements occur, map them to sublists, and collect the sublists.

一旦我们意识到这一点,我们可以看到null输入中的每个位置都是子列表的分隔符:它是子列表左侧的右端,它(加一)是子列表的左端到对。如果我们可以处理边缘情况,那么它会导致我们找到null元素出现的索引,将它们映射到子列表并收集子列表的方法。

The resulting code is as follows:

结果代码如下:

static List<List<String>> splitStream(List<String> input) {
    int[] indexes = Stream.of(IntStream.of(-1),
                              IntStream.range(0, input.size())
                                       .filter(i -> input.get(i) == null),
                              IntStream.of(input.size()))
                          .flatMapToInt(s -> s)
                          .toArray();

    return IntStream.range(0, indexes.length-1)
                    .mapToObj(i -> input.subList(indexes[i]+1, indexes[i+1]))
                    .collect(toList());
}

Getting the indexes at which nulloccurs is pretty easy. The stumbling block is adding -1at the left and sizeat the right end. I've opted to use Stream.ofto do the appending and then flatMapToIntto flatten them out. (I tried several other approaches but this one seemed like the cleanest.)

获取null发生的索引非常容易。绊脚石是-1在左端和size右端添加。我选择使用Stream.of来进行追加,然后flatMapToInt将它们展平。(我尝试了其他几种方法,但这种方法似乎最干净。)

It's a bit more convenient to use arrays for the indexes here. First, the notation for accessing an array is nicer than for a List: indexes[i]vs. indexes.get(i). Second, using an array avoids boxing.

在这里使用数组作为索引会更方便一些。首先,访问数组的符号比列表更好:indexes[i]vs indexes.get(i). . 其次,使用数组可以避免装箱。

At this point, each index value in the array (except for the last) is one less than the beginning position of a sublist. The index to its immediate right is the end of the sublist. We simply stream over the array and map each pair of indexes into a sublist and collect the output.

此时,数组中的每个索引值(除了最后一个)都比子列表的开始位置少一个。紧邻其右侧的索引是子列表的结尾。我们简单地遍历数组并将每对索引映射到一个子列表中并收集输出。

Discussion

讨论

The streams approach is slightly shorter than the for-loop version, but it's denser. The for-loop version is familiar, because we do this stuff in Java all the time, but if you're not already aware of what this loop is supposed to be doing, it's not obvious. You might have to simulate a few loop executions before you figure out what previs doing and why the open sublist has to be closed after the end of the loop. (I initially forgot to have it, but I caught this in testing.)

流方法比 for 循环版本略短,但更密集。for 循环版本很熟悉,因为我们一直在 Java 中做这些事情,但是如果您还没有意识到这个循环应该做什么,这并不明显。您可能需要模拟一些循环执行,然后才能弄清楚prev正在做什么以及为什么必须在循环结束后关闭打开的子列表。(我最初忘记了它,但我在测试中发现了这一点。)

The streams approach is, I think, easier to conceptualize what's going on: get a list (or an array) that indicates the boundaries between sublists. That's an easy streams two-liner. The difficulty, as I mentioned above, is finding a way to tack the edge values onto the ends. If there were a better syntax for doing this, e.g.,

我认为,流方法更容易概念化正在发生的事情:获取指示子列表之间边界的列表(或数组)。这是一个简单的流两行。正如我上面提到的,困难在于找到一种方法将边缘值添加到末端。如果有更好的语法来执行此操作,例如,

    // Java plus pidgin Scala
    int[] indexes =
        [-1] ++ IntStream.range(0, input.size())
                         .filter(i -> input.get(i) == null) ++ [input.size()];

it would make things a lot less cluttered. (What we really need is array or list comprehension.) Once you have the indexes, it's a simple matter to map them into actual sublists and collect them into the result list.

它会让事情变得不那么混乱。(我们真正需要的是数组或列表理解。)一旦有了索引,将它们映射到实际的子列表并将它们收集到结果列表中就很简单了。

And of course this is safe when run in parallel.

当然,这在并行运行时是安全的。

UPDATE 2016-02-06

更新 2016-02-06

Here's a nicer way to create the array of sublist indexes. It's based on the same principles, but it adjusts the index range and adds some conditions to the filter to avoid having to concatenate and flatmap the indexes.

这是创建子列表索引数组的更好方法。它基于相同的原理,但它调整了索引范围并向过滤器添加了一些条件,以避免必须连接和平面映射索引。

static List<List<String>> splitStream(List<String> input) {
    int sz = input.size();
    int[] indexes =
        IntStream.rangeClosed(-1, sz)
                 .filter(i -> i == -1 || i == sz || input.get(i) == null)
                 .toArray();

    return IntStream.range(0, indexes.length-1)
                    .mapToObj(i -> input.subList(indexes[i]+1, indexes[i+1]))
                    .collect(toList());
}

UPDATE 2016-11-23

更新 2016-11-23

I co-presented a talk with Brian Goetz at Devoxx Antwerp 2016, "Thinking In Parallel" (video) that featured this problem and my solutions. The problem presented there is a slight variation that splits on "#" instead of null, but it's otherwise the same. In the talk, I mentioned that I had a bunch of unit tests for this problem. I've appended them below, as a standalone program, along with my loop and streams implementations. An interesting exercise for readers is to run solutions proposed in other answers against the test cases I've provided here, and to see which ones fail and why. (The other solutions will have to be adapted to split based on a predicate instead of splitting on null.)

我在 Devoxx Antwerp 2016 上与 Brian Goetz 共同发表了一个演讲“并行思考”(视频),其中介绍了这个问题和我的解决方案。出现的问题有一个轻微的变化,在“#”而不是空值上分裂,但在其他方面是相同的。在演讲中,我提到我针对这个问题进行了大量单元测试。我已将它们作为独立程序以及我的循环和流实现附加在下面。对读者来说,一个有趣的练习是针对我在此处提供的测试用例运行其他答案中提出的解决方案,并查看哪些失败以及原因。(其他解决方案必须适应基于谓词的拆分,而不是基于 null 的拆分。)

import java.util.*;
import java.util.function.*;
import java.util.stream.*;

import static java.util.Arrays.asList;

public class ListSplitting {
    static final Map<List<String>, List<List<String>>> TESTCASES = new LinkedHashMap<>();
    static {
        TESTCASES.put(asList(),
                  asList(asList()));
        TESTCASES.put(asList("a", "b", "c"),
                  asList(asList("a", "b", "c")));
        TESTCASES.put(asList("a", "b", "#", "c", "#", "d", "e"),
                  asList(asList("a", "b"), asList("c"), asList("d", "e")));
        TESTCASES.put(asList("#"),
                  asList(asList(), asList()));
        TESTCASES.put(asList("#", "a", "b"),
                  asList(asList(), asList("a", "b")));
        TESTCASES.put(asList("a", "b", "#"),
                  asList(asList("a", "b"), asList()));
        TESTCASES.put(asList("#"),
                  asList(asList(), asList()));
        TESTCASES.put(asList("a", "#", "b"),
                  asList(asList("a"), asList("b")));
        TESTCASES.put(asList("a", "#", "#", "b"),
                  asList(asList("a"), asList(), asList("b")));
        TESTCASES.put(asList("a", "#", "#", "#", "b"),
                  asList(asList("a"), asList(), asList(), asList("b")));
    }

    static final Predicate<String> TESTPRED = "#"::equals;

    static void testAll(BiFunction<List<String>, Predicate<String>, List<List<String>>> f) {
        TESTCASES.forEach((input, expected) -> {
            List<List<String>> actual = f.apply(input, TESTPRED);
            System.out.println(input + " => " + expected);
            if (!expected.equals(actual)) {
                System.out.println("  ERROR: actual was " + actual);
            }
        });
    }

    static <T> List<List<T>> splitStream(List<T> input, Predicate<? super T> pred) {
        int[] edges = IntStream.range(-1, input.size()+1)
                               .filter(i -> i == -1 || i == input.size() ||
                                       pred.test(input.get(i)))
                               .toArray();

        return IntStream.range(0, edges.length-1)
                        .mapToObj(k -> input.subList(edges[k]+1, edges[k+1]))
                        .collect(Collectors.toList());
    }

    static <T> List<List<T>> splitLoop(List<T> input, Predicate<? super T> pred) {
        List<List<T>> result = new ArrayList<>();
        int start = 0;

        for (int cur = 0; cur < input.size(); cur++) {
            if (pred.test(input.get(cur))) {
                result.add(input.subList(start, cur));
                start = cur + 1;
            }
        }
        result.add(input.subList(start, input.size()));

        return result;
    }

    public static void main(String[] args) {
        System.out.println("===== Loop =====");
        testAll(ListSplitting::splitLoop);
        System.out.println("===== Stream =====");
        testAll(ListSplitting::splitStream);
    }
}

回答by Bohemian

Well, after a bit of work U have come up with a one-line stream-based solution. It ultimately uses reduce()to do the grouping, which seemed the natural choice, but it was a bit ugly getting the strings into the List<List<String>>required by reduce:

嗯,经过一些工作,你想出了一个基于单行流的解决方案。它最终用于reduce()进行分组,这似乎是很自然的选择,但是将字符串放入List<List<String>>reduce 所需的内容有点难看:

List<List<String>> result = list.stream()
  .map(Arrays::asList)
  .map(x -> new LinkedList<String>(x))
  .map(Arrays::asList)
  .map(x -> new LinkedList<List<String>>(x))
  .reduce( (a, b) -> {
    if (b.getFirst().get(0) == null) 
      a.add(new LinkedList<String>());
    else
      a.getLast().addAll(b.getFirst());
    return a;}).get();

It ishowever 1 line!

然而它1行!

When run with input from the question,

当使用问题的输入运行时,

System.out.println(result);

Produces:

产生:

[[a, b], [c], [d, e]]

回答by Tagir Valeev

In my StreamExlibrary there's a groupRunsmethod which can help you to solve this:

在我的StreamEx库中,有一种groupRuns方法可以帮助您解决这个问题:

List<String> input = Arrays.asList("a", "b", null, "c", null, "d", "e");
List<List<String>> result = StreamEx.of(input)
        .groupRuns((a, b) -> a != null && b != null)
        .remove(list -> list.get(0) == null).toList();

The groupRunsmethod takes a BiPredicatewhich for the pair of adjacent elements returns true if they should be grouped. After that we remove groups containing nulls and collect the rest to the List.

如果应该对相邻元素进行分组,则该groupRuns方法采用BiPredicatewhich 对相邻元素返回 true 。之后,我们删除包含空值的组并将其余的组收集到列表中。

This solution is parallel-friendly: you may use it for parallel stream as well. Also it works nice with any stream source (not only random access lists like in some other solutions) and it's somewhat better than collector-based solutions as here you can use any terminal operation you want without intermediate memory waste.

此解决方案是并行友好的:您也可以将其用于并行流。此外,它适用于任何流源(不仅仅是一些其他解决方案中的随机访问列表),并且比基于收集器的解决方案要好一些,因为在这里您可以使用任何您想要的终端操作而不会浪费中间内存。

回答by Nick Vanderhoven

Although the answer of Marks Stuartis concise, intuitive and parallel safe (and the best), I want to share another interesting solution that doesn't need the start/end boundaries trick.

尽管Marks Stuart答案简洁、直观且并行安全(并且是最好的),但我想分享另一个不需要开始/结束边界技巧的有趣解决方案。

If we look at the problem domain and think about parallelism, we can easy solve this with a divide-and-conquer strategy. Instead of thinking about the problem as a serial list we have to traverse, we can look at the problem as a composition of the same basic problem: splitting a list at a nullvalue. We can intuitively see quite easily that we can recursively break down the problem with the following recursive strategy:

如果我们查看问题域并考虑并行性,我们可以使用分而治之的策略轻松解决这个问题。与其将问题视为我们必须遍历的串行列表,我们还可以将问题视为相同基本问题的组合:在一个null值处拆分列表。我们可以很容易地直观地看到,我们可以使用以下递归策略递归地分解问题:

split(L) :
  - if (no null value found) -> return just the simple list
  - else -> cut L around 'null' naming the resulting sublists L1 and L2
            return split(L1) + split(L2)

In this case, we first search any nullvalue and the moment find one, we immediately cut the list and invoke a recursive call on the sublists. If we don't find null(the base case), we are finished with this branch and just return the list. Concatenating all the results will return the List we are searching for.

在这种情况下,我们首先搜索任何null值,一旦找到,我们立即剪切列表并对子列表调用递归调用。如果我们没有找到null(基本情况),我们就完成了这个分支,只返回列表。连接所有结果将返回我们正在搜索的列表。

A picture is worth a thousand words:

一张图片胜过千言万语:

enter image description here

在此处输入图片说明

The algorithm is simple and complete: we don't need any special tricks to handle the edge cases of the start/end of the list. We don't need any special tricks to handle edge cases such as empty lists, or lists with only nullvalues. Or lists ending with nullor starting with null.

该算法简单而完整:我们不需要任何特殊技巧来处理列表开始/结束的边缘情况。我们不需要任何特殊技巧来处理边缘情况,例如空列表或只有null值的列表。或以 结尾null或开头的列表null

A simple naive implementation of this strategy looks as follows:

该策略的简单实现如下所示:

public List<List<String>> split(List<String> input) {

    OptionalInt index = IntStream.range(0, input.size())
                                 .filter(i -> input.get(i) == null)
                                 .findAny();

    if (!index.isPresent())
        return asList(input);

    List<String> firstHalf  = input.subList(0, index.getAsInt());
    List<String> secondHalf = input.subList(index.getAsInt()+1, input.size());

    return asList(firstHalf, secondHalf).stream()
                 .map(this::split)
                 .flatMap(List::stream)
                 .collect(toList());

}

We first search for the index of any nullvalue in the list. If we don't find one, we return the list. If we find one, we split the list in 2 sublists, stream over them and recursively call the splitmethod again. The resulting lists of the sub-problem are then extracted and combined for the return value.

我们首先搜索null列表中任何值的索引。如果我们没有找到,我们返回列表。如果找到一个,我们将列表分成 2 个子列表,对它们进行流式处理并split再次递归调用该方法。然后提取子问题的结果列表并将其组合为返回值。

Remark that the 2 streams can easily be made parallel()and the algorithm will still work because of the functional decomposition of the problem.

请注意,可以轻松地将 2 个流设置为 parallel()并且由于问题的功能分解,该算法仍然可以工作。

Although the code is already pretty concise, it can always be adapted in numerous ways. For the sake of an example, instead of checking the optional value in the base case, we could take advantage of the orElsemethod on the OptionalIntto return the end-index of the list, enabling us to re-use the second stream and additionally filter out empty lists:

尽管代码已经非常简洁,但它始终可以通过多种方式进行调整。举个例子,我们可以利用 上的orElse方法OptionalInt返回列表的结束索引,而不是检查基本情况下的可选值,使我们能够重用第二个流并额外过滤掉空列表:

public List<List<String>> split(List<String> input) {

    int index =  IntStream.range(0, input.size())
                          .filter(i -> input.get(i) == null)
                          .findAny().orElse(input.size());

    return asList(input.subList(0, index), input.subList(index+1, input.size())).stream()
                 .map(this::split)
                 .flatMap(List::stream)
                 .filter(list -> !list.isEmpty())
                 .collect(toList());
}

The example is only given to indicate the mere simplicity, adaptability and elegance of a recursive approach. Indeed, this version would introduce a small performance penalty and fail if the input was empty (and as such might need an extra empty check).

给出这个例子只是为了表明递归方法的简单性、适应性和优雅。事实上,如果输入为空(因此可能需要额外的空检查),这个版本会引入一个小的性能损失并失败。

In this case, recursion might probably not be the best solution (Stuart Marksalgorithm to find indexes is only O(N)and mapping/splitting lists has a significant cost), but it expresses the solution with a simple, intuitive parallelizable algorithm without any side effects.

在这种情况下,递归可能不是最好的解决方案(查找索引的Stuart Marks算法仅为O(N)并且映射/拆分列表的成本很高),但它使用简单、直观的可并行化算法表达了解决方案,没有任何副作用。

I won't digg deeper into the complexity and advantages/disadvantages or use cases with stop criteria and/or partial result availability. I just felt the need to share this solution strategy, since the other approaches were merely iterative or using an overly complex solution algorithm that was not parallelizable.

我不会深入研究具有停止标准和/或部分结果可用性的复杂性和优点/缺点或用例。我只是觉得有必要分享这个解决方案策略,因为其他方法只是迭代或使用不可并行化的过于复杂的解决方案算法。

回答by user_3380739

Here is code by AbacusUtil

这是AbacusUtil 的代码

List<String> list = N.asList(null, null, "a", "b", null, "c", null, null, "d", "e");
Stream.of(list).splitIntoList(null, (e, any) -> e == null, null).filter(e -> e.get(0) != null).forEach(N::println);

Declaration: I'm the developer of AbacusUtil.

声明: 我是 AbacusUtil 的开发者。