Java: sorted lists after groupingBy

Question

Asked by ctst

I am wondering whether there is already an implemented feature in streams (or Collectors) that produces sorted Lists as the map values. For example, the following two snippets both produce gender-grouped lists of persons, sorted by age. The first solution has some sorting overhead (and looks a little bit scruffy). The second solution needs to look at every person twice, but does the job in a pretty way.

First sorting then grouping in one stream:

// groupingBy keeps the encounter order, so each group list stays sorted by age
Map<Gender, List<Person>> sortedListsByGender = roster
        .stream()
        .sorted(Person::compareByAge)
        .collect(Collectors.groupingBy(Person::getGender));

First grouping, then sorting every value:

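Map<Gender, List<Person>> sortedListsByGender = roster
        .stream()
        .collect(Collectors.groupingBy(Person::getGender));
// sorts in place; relies on the default toList() downstream returning a mutable
// list (an ArrayList in practice, although Collectors.toList() does not guarantee it)
sortedListsByGender.values()
        .forEach(list -> Collections.sort(list, Person::compareByAge));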
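For anyone who wants to run both variants, here is a minimal self-contained sketch. The Person class, its constructor, the Gender enum and the wrapper class GroupingVariants are hypothetical stand-ins, loosely modeled on the Oracle tutorial's Person, whose static compareByAge(Person, Person) is what makes Person::compareByAge usable as a Comparator. The second variant collects with toCollection(ArrayList::new) so the group lists are guaranteed to be mutable before they are sorted in place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical minimal model, loosely based on the Oracle tutorial's Person class.
enum Gender { MALE, FEMALE }

final class Person {
    private final String name;
    private final Gender gender;
    private final int age;

    Person(String name, Gender gender, int age) {
        this.name = name;
        this.gender = gender;
        this.age = age;
    }

    String getName() { return name; }
    Gender getGender() { return gender; }
    int getAge() { return age; }

    // Static two-argument method, so Person::compareByAge fits Comparator<Person>.
    static int compareByAge(Person a, Person b) {
        return Integer.compare(a.age, b.age);
    }

    @Override
    public String toString() { return name + "(" + age + ")"; }
}

final class GroupingVariants {
    // Variant 1: sort the whole stream, then group; each group inherits the sorted order.
    static Map<Gender, List<Person>> sortThenGroup(List<Person> roster) {
        return roster.stream()
                .sorted(Person::compareByAge)
                .collect(Collectors.groupingBy(Person::getGender));
    }

    // Variant 2: group into explicitly mutable ArrayLists, then sort each group in place.
    static Map<Gender, List<Person>> groupThenSort(List<Person> roster) {
        Map<Gender, List<Person>> byGender = roster.stream()
                .collect(Collectors.groupingBy(Person::getGender,
                        Collectors.toCollection(ArrayList::new)));
        byGender.values().forEach(list -> Collections.sort(list, Person::compareByAge));
        return byGender;
    }
}
```

For the same roster both methods return equal maps: Stream.sorted is stable for ordered streams, Collections.sort is stable, and groupingBy keeps the encounter order within each group.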

I am just wondering whether there is already something implemented that does this in one run, like a groupingBySorted.

Answer 1

Accepted answer by Holger

When you use sorted(comparator) on the stream before the collect operation, the stream has to buffer its entire contents to be able to sort them, and the sort may involve much more data movement within that buffer than sorting the smaller group lists afterwards would. So the performance is not as good as sorting the individual groups, although the implementation will use multiple cores if parallel processing has been enabled.

But note that sortedListsByGender.values().forEach(…) is not a parallelizable operation, and even sortedListsByGender.values().parallelStream().forEach(…) would only allow the groups to be processed in parallel, while each individual sort operation is still sequential.
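To make the two levels of parallelism concrete, here is a sketch. sortGroupsInParallel is the values().parallelStream().forEach(…) approach described above; parallelSortInPlace is an alternative not taken from the answer, which copies one large group into an array and sorts it with Arrays.parallelSort. The class and method names are hypothetical, and whether either pays off depends on the group sizes and the number of cores.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class GroupSorting {
    // Groups are sorted concurrently, but each List.sort call is itself a sequential sort.
    static <K, T> void sortGroupsInParallel(Map<K, List<T>> groups, Comparator<? super T> c) {
        groups.values().parallelStream().forEach(list -> list.sort(c));
    }

    // Hypothetical alternative: parallelize the sort of a single (large) group by copying
    // it into an array, sorting with Arrays.parallelSort, and writing the result back.
    static <T> void parallelSortInPlace(List<T> list, Comparator<? super T> c) {
        @SuppressWarnings("unchecked")
        T[] array = (T[]) list.toArray();
        Arrays.parallelSort(array, c);
        for (int i = 0; i < array.length; i++) {
            list.set(i, array[i]);
        }
    }
}
```

Both methods require the group lists to be mutable, for example by grouping with Collectors.toCollection(ArrayList::new).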

When performing the sort operation within a collector as in

static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    // collect each group into an ArrayList, then sort it once in the finisher
    return Collectors.collectingAndThen(
        Collectors.toCollection(ArrayList::new), l->{ l.sort(c); return l; } );
}

Map<Gender, List<Person>> sortedListsByGender = roster.stream()
    .collect(Collectors.groupingBy(Person::getGender, toSortedList(Person::compareByAge)));

the sort operation behaves the same as sorting the groups afterwards (thanks to Tagir Valeev for correcting me), but you can easily check how a sort-on-insertion strategy performs. Just change the collector implementation to:

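static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    // Caution: a TreeSet keeps only one of several elements that compare as equal
    // (e.g. persons of the same age), so this variant can drop elements.
    return Collectors.collectingAndThen(
        Collectors.toCollection(()->new TreeSet<>(c)), ArrayList::new);
}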
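One caveat is worth checking before benchmarking: a TreeSet stores only one element per set of elements that compare as equal, so with Person::compareByAge, persons of the same age would be lost. The sketch below puts the finisher-sorting collector and the TreeSet variant side by side; the class name SortedListCollectors and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class SortedListCollectors {
    // Sort after collecting: gather into an ArrayList, then sort once in the finisher.
    static <T> Collector<T, ?, List<T>> sortInFinisher(Comparator<? super T> c) {
        return Collectors.collectingAndThen(
                Collectors.toCollection(ArrayList::new), l -> { l.sort(c); return l; });
    }

    // Sort on insertion via a TreeSet, then copy into an ArrayList.
    // Elements that compare as equal under c collapse into a single entry.
    static <T> Collector<T, ?, List<T>> sortOnInsertion(Comparator<? super T> c) {
        return Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(c)), ArrayList::new);
    }
}
```

Sorting the words pear, fig, kiwi, plum and date by length, sortInFinisher keeps all five, while sortOnInsertion keeps only fig and pear. The list-based insertion collectors in the rest of this answer do not have this problem, since a list keeps duplicates.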

For completeness, if you want a collector that inserts into an ArrayList in sorted order in the first place, avoiding the final copy step, you can use a more elaborate collector like this:

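static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    return Collector.of(ArrayList::new,
        // accumulator: binarySearch yields a match index or -(insertionPoint)-1,
        // so ~ix recovers the insertion point and the list stays sorted
        (l,t) -> {
            int ix=Collections.binarySearch(l, t, c);
            l.add(ix<0? ~ix: ix, t);
        },
        // combiner: both lists are sorted; re-sort only if their ranges overlap
        (list1,list2) -> {
            if(list1.isEmpty()) return list2;
            final int s1=list1.size();
            if(!list2.isEmpty()) {
                list1.addAll(list2);
                if(c.compare(list1.get(s1-1), list2.get(0))>0)
                    list1.sort(c);
            }
            return list1;
        });
}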
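The accumulator relies on the documented return value of Collections.binarySearch: the index of a matching element if one exists, otherwise -(insertionPoint) - 1. For a negative ix, ~ix equals -ix - 1, which is exactly the insertion point. A minimal sketch of that insertion step on its own (the class BinaryInsert and its methods are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class BinaryInsert {
    // Inserts t into an already sorted list so that it stays sorted, like the accumulator above.
    static <T> void insertSorted(List<T> list, T t, Comparator<? super T> c) {
        int ix = Collections.binarySearch(list, t, c);
        // ix >= 0: an equal element was found at ix; ix < 0: ~ix is the insertion point
        list.add(ix < 0 ? ~ix : ix, t);
    }

    // Builds a sorted list by inserting the items one at a time.
    static <T> List<T> sortByInsertion(Iterable<? extends T> items, Comparator<? super T> c) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            insertSorted(result, item, c);
        }
        return result;
    }
}
```

For example, searching for 4 in [1, 3, 5] returns -3, and ~(-3) is 2, the index at which 4 must be inserted.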

This binarySearch-based collector is efficient for sequential use, but its merge function is not optimal. The underlying sort algorithm benefits from presorted ranges, but it has to find these ranges first, even though our merge function already knows them. Unfortunately, there's no public API in the JRE that lets us pass this information on efficiently (we could pass subLists to binarySearch, but creating a new sub list for each element of list2 may turn out to be too expensive). If we want to raise the performance of parallel execution further, we have to re-implement the merge part of the sorting algorithm:

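static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    // accumulator: stable insertion via insertPos; combiner: stable merge
    return Collector.of(ArrayList::new,
        (l,t) -> l.add(insertPos(l, 0, l.size(), t, c), t),
        (list1,list2) -> merge(list1, list2, c));
}
// Merges the sorted list2 into the sorted list1. Each search starts at the previous
// insertion point; once an element lands at the end of list1, the remaining
// elements of list2 are simply appended.
static <T> List<T> merge(List<T> list1, List<T> list2, Comparator<? super T> c) {
    if(list1.isEmpty()) return list2;
    for(int ix1=0, ix2=0, num1=list1.size(), num2=list2.size(); ix2<num2; ix2++, num1++) {
        final T element = list2.get(ix2);
        ix1=insertPos(list1, ix1, num1, element, c);
        list1.add(ix1, element);
        if(ix1==num1) {
            while(++ix2<num2) list1.add(list2.get(ix2));
            return list1;
        }
    }
    return list1;
}
// Binary search in list[low, high). If elements equal to t exist, returns the index
// after the last of them, so equal elements keep their order (stability);
// otherwise returns the insertion point.
static <T> int insertPos(
    List<? extends T> list, int low, int high, T t, Comparator<? super T> c) {
    high--; // make the upper bound inclusive
    while(low <= high) {
        int mid = (low+high)>>>1, cmp = c.compare(list.get(mid), t);
        if(cmp < 0) low = mid + 1;
        else if(cmp > 0) high = mid - 1;
        else {
            // found an equal element: skip forward past the run of equal elements
            mid++;
            while(mid<=high && c.compare(list.get(mid), t)==0) mid++;
            return mid;
        }
    }
    return low;
}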
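Since the combiner in practice only runs for parallel streams, a quick check of the merge needs parallelStream(). The sketch below wraps the three methods above in a hypothetical class StableSortedCollector and verifies, on pairs of {key, original index}, that the result is sorted by key and that equal keys keep their original order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collector;

final class StableSortedCollector {
    // The collector from the answer, unchanged apart from this wrapper class.
    static <T> Collector<T, ?, List<T>> toSortedList(Comparator<? super T> c) {
        return Collector.of(ArrayList::new,
                (l, t) -> l.add(insertPos(l, 0, l.size(), t, c), t),
                (list1, list2) -> merge(list1, list2, c));
    }

    // Elements of list2 are placed after equal elements of list1, keeping the merge stable.
    static <T> List<T> merge(List<T> list1, List<T> list2, Comparator<? super T> c) {
        if (list1.isEmpty()) return list2;
        for (int ix1 = 0, ix2 = 0, num1 = list1.size(), num2 = list2.size();
                ix2 < num2; ix2++, num1++) {
            final T element = list2.get(ix2);
            ix1 = insertPos(list1, ix1, num1, element, c);
            list1.add(ix1, element);
            if (ix1 == num1) {
                // everything left in list2 is >= element, so append it as is
                while (++ix2 < num2) list1.add(list2.get(ix2));
                return list1;
            }
        }
        return list1;
    }

    // Upper-bound binary search in list[low, high).
    static <T> int insertPos(List<? extends T> list, int low, int high, T t,
                             Comparator<? super T> c) {
        high--;
        while (low <= high) {
            int mid = (low + high) >>> 1, cmp = c.compare(list.get(mid), t);
            if (cmp < 0) low = mid + 1;
            else if (cmp > 0) high = mid - 1;
            else {
                mid++;
                while (mid <= high && c.compare(list.get(mid), t) == 0) mid++;
                return mid;
            }
        }
        return low;
    }
}
```

The same checks can be run against the binarySearch-based collector to compare sortedness and stability under parallel execution.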

Note that this last solution, unlike the simple binarySearch-based insertion, is a stable sort implementation, i.e. in your case, Persons with the same age and Gender won't change their relative order if the source stream has a defined encounter order.
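The difference is easiest to see with elements that compare as equal. Collections.binarySearch makes no guarantee about which of several equal elements it finds, so the simple collector may insert a new element before earlier equal ones, whereas insertPos always returns the position after the last equal element. A small sketch comparing strings by length (the class StabilityDemo and its methods are hypothetical names; insertPos is copied from the answer):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class StabilityDemo {
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // Insertion as in the simpler collector: lands at the index of *some* equal element.
    static List<String> insertWithBinarySearch(List<String> sorted, String s) {
        List<String> copy = new ArrayList<>(sorted);
        int ix = Collections.binarySearch(copy, s, BY_LENGTH);
        copy.add(ix < 0 ? ~ix : ix, s);
        return copy;
    }

    // Insertion as in the stable collector: always after the last equal element.
    static List<String> insertWithInsertPos(List<String> sorted, String s) {
        List<String> copy = new ArrayList<>(sorted);
        copy.add(insertPos(copy, 0, copy.size(), s, BY_LENGTH), s);
        return copy;
    }

    static <T> int insertPos(List<? extends T> list, int low, int high, T t,
                             Comparator<? super T> c) {
        high--;
        while (low <= high) {
            int mid = (low + high) >>> 1, cmp = c.compare(list.get(mid), t);
            if (cmp < 0) low = mid + 1;
            else if (cmp > 0) high = mid - 1;
            else {
                mid++;
                while (mid <= high && c.compare(list.get(mid), t) == 0) mid++;
                return mid;
            }
        }
        return low;
    }
}
```

With OpenJDK's implementation, binarySearch happens to hit the middle element of [a, b, c], so d ends up at index 1, ahead of the earlier equal elements b and c; insertPos places it at the end.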
