Java: Group and reduce a list of objects

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original source: StackOverflow, http://stackoverflow.com/questions/32700449/

Date: 2020-11-02 20:38:11. Source: igfitidea

Group and Reduce list of objects

Tags: java, java-8

Asked by ryber

I have a list of objects with many duplicates and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 Streams (I know how to do this via old-skool means, but this is an experiment).

This is what I have right now. I don't really like it because the map-building seems extraneous, and the values() collection is a view of the backing map, so you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?

@Test
public void reduce() {
    Collection<Foo> foos = Stream.of("foo", "bar", "baz")
                     .flatMap(this::getfoos)
                     .collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
                         l.ids.addAll(r.ids);
                         return l;
                     })).values();

    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0,10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
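
For reference, here is a minimal self-contained sketch of the approach described above, with the values() view copied into a new ArrayList. The class name ToMapMergeSketch, the helper foosFor, and the method mergeByName are hypothetical names chosen for illustration, and Foo is a simplified stand-in for the class in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ToMapMergeSketch {
    // Simplified stand-in for the question's Foo class (assumption).
    static class Foo {
        final String name;
        final List<Integer> ids = new ArrayList<>();

        Foo(String name, int id) {
            this.name = name;
            ids.add(id);
        }
    }

    // Ten Foo objects sharing one name, with ids 0..9, like getfoos in the question.
    static Stream<Foo> foosFor(String name) {
        return IntStream.range(0, 10).mapToObj(i -> new Foo(name, i));
    }

    static List<Foo> mergeByName(Stream<Foo> input) {
        Map<String, Foo> byName = input.collect(
                Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
                    l.ids.addAll(r.ids); // mutates the first Foo seen for each name
                    return l;
                }));
        // values() is only a view of the map, so copy it into an independent list.
        return new ArrayList<>(byName.values());
    }
}
```

Note that the merge function modifies the first Foo encountered for each name, which is a side effect on the source objects.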

Accepted answer by Brian Kent

If you break the grouping and reducing steps up, you can get something cleaner:

Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);

Map<String, Optional<Foo>> collect = input.collect(Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));

Collection<Optional<Foo>> collected = collect.values();

This assumes a few convenience methods in your Foo class:

public Foo(String n, List<Integer> ids) {
    this.name = n;
    this.ids.addAll(ids);
}

public static Foo merge(Foo src, Foo dest) {
    List<Integer> merged = new ArrayList<>();
    merged.addAll(src.ids);
    merged.addAll(dest.ids);
    return new Foo(src.name, merged);
}
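
The one-argument Collectors.reducing produces an Optional<Foo> per group. If a plain List<Foo> is wanted, one possible follow-up (a sketch, not part of the original answer) is to unwrap the Optionals afterwards. This should be safe here because groupingBy only creates a group once at least one element maps to it. The class name GroupingReducingSketch and the method mergeByName are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingReducingSketch {
    // Simplified Foo with the convenience methods the answer assumes.
    static class Foo {
        final String name;
        final List<Integer> ids = new ArrayList<>();

        Foo(String name, List<Integer> ids) {
            this.name = name;
            this.ids.addAll(ids);
        }

        // Non-mutating merge: builds a new Foo instead of touching the arguments.
        static Foo merge(Foo src, Foo dest) {
            List<Integer> merged = new ArrayList<>(src.ids);
            merged.addAll(dest.ids);
            return new Foo(src.name, merged);
        }
    }

    static List<Foo> mergeByName(Stream<Foo> input) {
        Map<String, Optional<Foo>> grouped = input.collect(
                Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
        // Each group holds at least one element, so every Optional is present.
        return grouped.values().stream()
                .map(Optional::get)
                .collect(Collectors.toList());
    }
}
```

Because merge creates a new Foo for every pair, the input objects stay untouched, at the cost of extra allocations.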

Answer by Evan VanderZee

As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question, which shows that you should implement a hashCode method or distinct may not behave correctly.
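
As a small illustration of that point, here is a sketch of Stream::distinct on a type that defines equals and hashCode consistently. The Tag class and the unique method are hypothetical and not part of the question.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctSketch {
    // Hypothetical value type, used only for this illustration.
    static final class Tag {
        final String name;

        Tag(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && ((Tag) o).name.equals(name);
        }

        // Kept consistent with equals so that equal tags are recognized as duplicates.
        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    // Drops duplicates; for an ordered stream the first occurrence is kept.
    static List<Tag> unique(Stream<Tag> tags) {
        return tags.distinct().collect(Collectors.toList());
    }
}
```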

In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.

It is easy enough to use streams to process the values of the map and turn it back into an ArrayList, though. I show that in this answer, as well as providing a way to avoid the appearance of an Optional<Foo>, which shows up in one of the other answers.

public void reduce() {
    ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
            .collect(Collectors.collectingAndThen(Collectors.groupingBy(f -> f.name,
            Collectors.reducing(Foo.identity(), Foo::merge)),
            map -> map.values().stream().
                collect(Collectors.toCollection(ArrayList::new))));

    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();

    private static final Foo BASE_FOO = new Foo("", 0);

    public static Foo identity() {
        return BASE_FOO;
    }

    // use only if side effects to the argument objects are okay
    public static Foo merge(Foo fooOne, Foo fooTwo) {
        if (fooOne == BASE_FOO) {
            return fooTwo;
        } else if (fooTwo == BASE_FOO) {
            return fooOne;
        }
        fooOne.ids.addAll(fooTwo.ids);
        return fooOne;
    }

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}

Answer by Tagir Valeev

If the input elements are supplied in random order, then having an intermediate map is probably the best solution. However, if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
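
The compare-with-previous idea can be sketched with a plain loop before turning to streams. This is only an illustration under the adjacency assumption; the class name AdjacentMergeSketch and the method mergeAdjacent are hypothetical, and Foo is a simplified stand-in for the class in the question.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacentMergeSketch {
    // Simplified stand-in for the question's Foo class (assumption).
    static class Foo {
        final String name;
        final List<Integer> ids = new ArrayList<>();

        Foo(String name, int id) {
            this.name = name;
            ids.add(id);
        }
    }

    // Merges runs of neighbouring elements that share a name.
    // Elements with the same name that are NOT adjacent stay separate.
    static List<Foo> mergeAdjacent(List<Foo> input) {
        List<Foo> result = new ArrayList<>();
        Foo previous = null;
        for (Foo current : input) {
            if (previous != null && previous.name.equals(current.name)) {
                // Same run: fold the ids into the element already in the result.
                previous.ids.addAll(current.ids);
            } else {
                // New run: this element becomes the merge target.
                result.add(current);
                previous = current;
            }
        }
        return result;
    }
}
```

Like the merge function in the question, this mutates the first element of each run.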

Unfortunately, there's no Stream API method which would allow you to do such a thing easily and effectively. One possible solution is to write a custom collector like this:

public static List<Foo> withCollector(Stream<Foo> stream) {
    return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
             (list, t) -> {
                 Foo f;
                 if(list.isEmpty() || !(f = list.get(list.size()-1)).name.equals(t.name))
                     list.add(t);
                 else
                     f.ids.addAll(t.ids);
             },
             (l1, l2) -> {
                 if(l1.isEmpty())
                     return l2;
                 if(l2.isEmpty())
                     return l1;
                 if(l1.get(l1.size()-1).name.equals(l2.get(0).name)) {
                     l1.get(l1.size()-1).ids.addAll(l2.get(0).ids);
                     l1.addAll(l2.subList(1, l2.size()));
                 } else {
                     l1.addAll(l2);
                 }
                 return l1;
             }));
}

My tests show that this collector is always faster than collecting to a map (up to 2x, depending on the average number of duplicate names), in both sequential and parallel mode.

Another approach is to use my StreamEx library, which provides a bunch of "partial reduction" methods including collapse:

public static List<Foo> withStreamEx(Stream<Foo> stream) {
    return StreamEx.of(stream)
            .collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
                l.ids.addAll(r.ids);
                return l;
            }).toList();
}

This method accepts two arguments: a BiPredicate, which is applied to two adjacent elements and should return true if they should be merged, and a BinaryOperator, which performs the merging. This solution is a little slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than the toMap solution, and it's simpler and somewhat more flexible because collapse is an intermediate operation, so you can collect in another way.

Again, both of these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce performance, making it slower than the toMap solution.

Answer by Holger

As already pointed out by others, an intermediate Map is unavoidable, as that's the way of finding the objects to merge. Further, you should not modify source data during reduction.

Nevertheless, you can achieve both without creating multiple Foo instances:

List<Foo> foos = Stream.of("foo", "bar", "baz")
                 .flatMap(n->IntStream.range(0,10).mapToObj(i -> new Foo(n, i)))

                 .collect(collectingAndThen(groupingBy(f -> f.name),
                    m->m.entrySet().stream().map(e->new Foo(e.getKey(),
                       e.getValue().stream().flatMap(f->f.ids.stream()).collect(toList())))
                    .collect(toList())));

This assumes that you add a constructor

    public Foo(String n, List<Integer> l) {
        name = n;
        ids=l;
    }

to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type which serves as a single item as well as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.

If the source items had a single id, using something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the entries of (String, List<Integer>) to the merged items, would be sufficient.
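
A minimal sketch of that single-id variant follows. The Item and Merged types and the merge method are hypothetical; they only illustrate separating the source item from the merged container.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleIdSketch {
    // Hypothetical source item carrying exactly one id.
    static class Item {
        final String name;
        final int id;

        Item(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    // Hypothetical merged result carrying all ids for one name.
    static class Merged {
        final String name;
        final List<Integer> ids;

        Merged(String name, List<Integer> ids) {
            this.name = name;
            this.ids = ids;
        }
    }

    static List<Merged> merge(Stream<Item> items) {
        // First step: collect the ids per name.
        Map<String, List<Integer>> idsByName = items.collect(
                Collectors.groupingBy(i -> i.name,
                        Collectors.mapping(i -> i.id, Collectors.toList())));
        // Second step: turn each (name, ids) entry into a merged item.
        return idsByName.entrySet().stream()
                .map(e -> new Merged(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }
}
```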

Since this is not the case and Java 8 lacks the flatMapping collector, the flatmapping step is moved to the second step, making it look much more complicated.
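
For comparison, Collectors.flatMapping was added in Java 9, so on a newer JDK the flattening can happen inside the grouping step. The following is a sketch assuming Java 9 or later; the class name FlatMappingSketch and the method mergeByName are hypothetical, and Foo uses the list-taking constructor described above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMappingSketch {
    // Simplified Foo with the list-taking constructor (assumption).
    static class Foo {
        final String name;
        final List<Integer> ids;

        Foo(String name, List<Integer> ids) {
            this.name = name;
            this.ids = ids;
        }
    }

    // Requires Java 9+ because of Collectors.flatMapping.
    static List<Foo> mergeByName(Stream<Foo> input) {
        // First step: flatten every Foo's ids into one list per name.
        Map<String, List<Integer>> idsByName = input.collect(
                Collectors.groupingBy(f -> f.name,
                        Collectors.flatMapping(f -> f.ids.stream(), Collectors.toList())));
        // Second step: create one new Foo per name; the sources are not modified.
        return idsByName.entrySet().stream()
                .map(e -> new Foo(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }
}
```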

But in both cases, the second step is not obsolete as it is where the result items are actually created and converting the map to the desired list type comes for free.
