Java: group and reduce a list of objects
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/32700449/
Group and Reduce list of objects
Asked by ryber
I have a list of objects with many duplicates and some fields that need to be merged. I want to reduce this down to a list of unique objects using only Java 8 Streams (I know how to do this via old-skool means, but this is an experiment).
This is what I have right now. I don't really like this because the map-building seems extraneous, and the values() collection is a view of the backing map, so you need to wrap it in a new ArrayList<>(...) to get a more specific collection. Is there a better approach, perhaps using the more general reduction operations?
@Test
public void reduce() {
    Collection<Foo> foos = Stream.of("foo", "bar", "baz")
            .flatMap(this::getfoos)
            .collect(Collectors.toMap(f -> f.name, f -> f, (l, r) -> {
                l.ids.addAll(r.ids);
                return l;
            })).values();

    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
Accepted answer by Brian Kent
If you break the grouping and reducing steps up, you can get something cleaner:
Stream<Foo> input = Stream.of("foo", "bar", "baz").flatMap(this::getfoos);
Map<String, Optional<Foo>> collect = input.collect(Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
Collection<Optional<Foo>> collected = collect.values();
This assumes a few convenience methods in your Foo class:
public Foo(String n, List<Integer> ids) {
    this.name = n;
    this.ids.addAll(ids);
}

public static Foo merge(Foo src, Foo dest) {
    List<Integer> merged = new ArrayList<>();
    merged.addAll(src.ids);
    merged.addAll(dest.ids);
    return new Foo(src.name, merged);
}
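The grouping collector above leaves you with a Map<String, Optional<Foo>>. The following is a minimal sketch of one way to unwrap those values into a plain List<Foo>. It assumes a stripped-down Foo with the merge method described above; the names OptionalUnwrapDemo and mergeByName are illustrative and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class OptionalUnwrapDemo {
    // Stripped-down stand-in for the Foo class from the question.
    static class Foo {
        final String name;
        final List<Integer> ids = new ArrayList<>();

        Foo(String name, int id) {
            this.name = name;
            this.ids.add(id);
        }

        Foo(String name, List<Integer> ids) {
            this.name = name;
            this.ids.addAll(ids);
        }

        // Non-mutating merge: builds a new Foo holding both id lists.
        static Foo merge(Foo a, Foo b) {
            List<Integer> merged = new ArrayList<>(a.ids);
            merged.addAll(b.ids);
            return new Foo(a.name, merged);
        }
    }

    // Hypothetical helper: group by name, reduce each group, then unwrap the Optionals.
    static List<Foo> mergeByName(Stream<Foo> input) {
        Map<String, Optional<Foo>> grouped = input.collect(
                Collectors.groupingBy(f -> f.name, Collectors.reducing(Foo::merge)));
        return grouped.values().stream()
                .filter(Optional::isPresent) // each group holds at least one element, so nothing is dropped
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<Foo> input = Stream.of("foo", "bar", "baz")
                .flatMap(n -> IntStream.range(0, 10).mapToObj(i -> new Foo(n, i)));
        List<Foo> foos = mergeByName(input);
        foos.forEach(f -> System.out.println(f.name + " -> " + f.ids.size() + " ids"));
    }
}
```

Because groupingBy only creates a group when it sees an element, the Optional is never empty here; the filter step is only there to keep the unwrapping explicit.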
Answer by Evan VanderZee
As already pointed out in the comments, a map is a very natural thing to use when you want to identify unique objects. If all you needed to do was find the unique objects, you could use the Stream::distinct method. This method hides the fact that there is a map involved, but apparently it does use a map internally, as hinted by this question, which shows that you should implement a hashCode method or distinct may not behave correctly.
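To illustrate the point about hashCode, here is a small sketch of distinct() applied to a value class that overrides both equals and hashCode. The Tag class and the unique helper are hypothetical and exist only for this example; they are not part of the question's code.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctDemo {
    // Hypothetical value class: equality is defined by name only.
    static final class Tag {
        final String name;

        Tag(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && ((Tag) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    static List<Tag> unique(Stream<Tag> tags) {
        // distinct() compares elements with equals(); hashCode() must be consistent with it.
        // Without the overrides above, every Tag instance would count as unique.
        return tags.distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tag> tags = unique(Stream.of("foo", "bar", "foo", "baz", "bar").map(Tag::new));
        tags.forEach(t -> System.out.println(t.name));
    }
}
```

Note that distinct() only drops duplicates; it offers no hook for merging the fields of the dropped elements, which is why it does not solve the question on its own.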
In the case of the distinct method, where no merging is necessary, it is possible to return some of the results before all of the input has been processed. In your case, unless you can make additional assumptions about the input that haven't been mentioned in the question, you do need to finish processing all of the input before you return any results. Thus this answer does use a map.
It is easy enough to use streams to process the values of the map and turn them back into an ArrayList, though. I show that in this answer, as well as a way to avoid the Optional<Foo> that shows up in one of the other answers.
public void reduce() {
    ArrayList<Foo> foos = Stream.of("foo", "bar", "baz").flatMap(this::getfoos)
            .collect(Collectors.collectingAndThen(
                    Collectors.groupingBy(f -> f.name,
                            Collectors.reducing(Foo.identity(), Foo::merge)),
                    map -> map.values().stream()
                            .collect(Collectors.toCollection(ArrayList::new))));

    assertEquals(3, foos.size());
    foos.forEach(f -> assertEquals(10, f.ids.size()));
}

private Stream<Foo> getfoos(String n) {
    return IntStream.range(0, 10).mapToObj(i -> new Foo(n, i));
}

public static class Foo {
    private String name;
    private List<Integer> ids = new ArrayList<>();
    private static final Foo BASE_FOO = new Foo("", 0);

    public static Foo identity() {
        return BASE_FOO;
    }

    // use only if side effects to the argument objects are okay
    public static Foo merge(Foo fooOne, Foo fooTwo) {
        if (fooOne == BASE_FOO) {
            return fooTwo;
        } else if (fooTwo == BASE_FOO) {
            return fooOne;
        }
        fooOne.ids.addAll(fooTwo.ids);
        return fooOne;
    }

    public Foo(String n, int i) {
        name = n;
        ids.add(i);
    }
}
Answer by Tagir Valeev
If the input elements are supplied in random order, then having an intermediate map is probably the best solution. However, if you know in advance that all the foos with the same name are adjacent (this condition is actually met in your test), the algorithm can be greatly simplified: you just need to compare the current element with the previous one and merge them if the name is the same.
Unfortunately, there's no Stream API method that would allow you to do such a thing easily and efficiently. One possible solution is to write a custom collector like this:
public static List<Foo> withCollector(Stream<Foo> stream) {
    return stream.collect(Collector.<Foo, List<Foo>>of(ArrayList::new,
            (list, t) -> {
                Foo f;
                if (list.isEmpty() || !(f = list.get(list.size() - 1)).name.equals(t.name))
                    list.add(t);
                else
                    f.ids.addAll(t.ids);
            },
            (l1, l2) -> {
                if (l1.isEmpty())
                    return l2;
                if (l2.isEmpty())
                    return l1;
                if (l1.get(l1.size() - 1).name.equals(l2.get(0).name)) {
                    l1.get(l1.size() - 1).ids.addAll(l2.get(0).ids);
                    l1.addAll(l2.subList(1, l2.size()));
                } else {
                    l1.addAll(l2);
                }
                return l1;
            }));
}
My tests show that this collector is always faster than collecting to a map (up to 2x, depending on the average number of duplicate names), in both sequential and parallel mode.
Another approach is to use my StreamEx library, which provides a bunch of "partial reduction" methods, including collapse:
public static List<Foo> withStreamEx(Stream<Foo> stream) {
    return StreamEx.of(stream)
            .collapse((l, r) -> l.name.equals(r.name), (l, r) -> {
                l.ids.addAll(r.ids);
                return l;
            }).toList();
}
This method accepts two arguments: a BiPredicate, which is applied to two adjacent elements and should return true if they should be merged, and a BinaryOperator, which performs the merging. This solution is a little slower in sequential mode than the custom collector (in parallel the results are very similar), but it's still significantly faster than the toMap solution, and it's simpler and somewhat more flexible because collapse is an intermediate operation, so you can collect the result in another way.
Again, both of these solutions work only if foos with the same name are known to be adjacent. It's a bad idea to sort the input stream by foo name and then use these solutions, because the sorting will drastically reduce performance, making it slower than the toMap solution.
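The adjacency requirement can be seen with a small sketch. The code below is a plain-loop stand-in for the "merge with the previous element" idea, not the collector or the StreamEx call shown above; AdjacentMergeDemo and mergeAdjacent are illustrative names, and Foo is a stripped-down version of the class from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdjacentMergeDemo {
    // Stripped-down stand-in for the Foo class from the question.
    static class Foo {
        final String name;
        final List<Integer> ids = new ArrayList<>();

        Foo(String name, int id) {
            this.name = name;
            this.ids.add(id);
        }
    }

    // Merges an element into the previous result only when the names match.
    static List<Foo> mergeAdjacent(List<Foo> input) {
        List<Foo> result = new ArrayList<>();
        for (Foo f : input) {
            Foo last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && last.name.equals(f.name)) {
                last.ids.addAll(f.ids); // mutates the earlier element, like the solutions above
            } else {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Equal names are adjacent: "foo" is merged into a single element.
        List<Foo> grouped = mergeAdjacent(Arrays.asList(
                new Foo("foo", 0), new Foo("foo", 1), new Foo("bar", 2)));
        System.out.println(grouped.size()); // 2

        // Equal names are interleaved: "foo" shows up twice in the result.
        List<Foo> interleaved = mergeAdjacent(Arrays.asList(
                new Foo("foo", 0), new Foo("bar", 1), new Foo("foo", 2)));
        System.out.println(interleaved.size()); // 3
    }
}
```

When the input cannot be guaranteed to arrive grouped, the map-based approaches remain the safer choice.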
Answer by Holger
As already pointed out by others, an intermediate Map is unavoidable, as that's the way of finding the objects to merge. Further, you should not modify source data during reduction.
Nevertheless, you can achieve both without creating multiple Foo instances:
// assumes: import static java.util.stream.Collectors.*;
List<Foo> foos = Stream.of("foo", "bar", "baz")
        .flatMap(n -> IntStream.range(0, 10).mapToObj(i -> new Foo(n, i)))
        .collect(collectingAndThen(groupingBy(f -> f.name),
                m -> m.entrySet().stream().map(e -> new Foo(e.getKey(),
                        e.getValue().stream().flatMap(f -> f.ids.stream()).collect(toList())))
                        .collect(toList())));
This assumes that you add a constructor
public Foo(String n, List<Integer> l) {
    name = n;
    ids = l;
}
to your Foo class, as it should have if Foo is really supposed to be capable of holding a list of IDs. As a side note, having a type that serves as a single item as well as a container for merged results seems unnatural to me. This is exactly why the code turns out to be so complicated.
If the source items had a single id, using something like groupingBy(f -> f.name, mapping(f -> f.id, toList())), followed by mapping the entries of (String, List<Integer>) to the merged items, would be sufficient.
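As a rough sketch of that single-id variant: the Item and Merged classes below are hypothetical and stand in for "a source item with one id" and "the merged result"; they do not appear in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleIdGroupingDemo {
    // Hypothetical source item carrying exactly one id.
    static class Item {
        final String name;
        final int id;

        Item(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    // Hypothetical merged result holding all ids seen for one name.
    static class Merged {
        final String name;
        final List<Integer> ids;

        Merged(String name, List<Integer> ids) {
            this.name = name;
            this.ids = ids;
        }
    }

    static List<Merged> merge(Stream<Item> items) {
        // Step 1: group by name and collect the single ids of each group into a list.
        Map<String, List<Integer>> idsByName = items.collect(
                Collectors.groupingBy(i -> i.name,
                        Collectors.mapping(i -> i.id, Collectors.toList())));
        // Step 2: turn every (name, ids) entry into a merged result object.
        return idsByName.entrySet().stream()
                .map(e -> new Merged(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Merged> merged = merge(Stream.of(
                new Item("foo", 1), new Item("bar", 2), new Item("foo", 3)));
        merged.forEach(m -> System.out.println(m.name + " -> " + m.ids));
    }
}
```

Separating the item type from the merged type keeps the source objects untouched, which matches the advice not to modify source data during reduction.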
Since this is not the case and Java 8 lacks the flatMapping collector, the flat-mapping step is moved to the second step, making it look much more complicated.
But in both cases, the second step is not obsolete as it is where the result items are actually created and converting the map to the desired list type comes for free.
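For readers on a newer JDK: Collectors.flatMapping was added in Java 9, so the flat-mapping step can sit inside the grouping collector there. The following is a sketch under that assumption (it does not compile on Java 8); FlatMappingDemo and idsByName are illustrative names, and Foo is a stripped-down version of the class from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FlatMappingDemo {
    // Stripped-down stand-in for the Foo class from the question.
    static class Foo {
        final String name;
        final List<Integer> ids = new ArrayList<>();

        Foo(String name, int id) {
            this.name = name;
            this.ids.add(id);
        }
    }

    // Requires Java 9 or newer: flattens each Foo's id list while grouping by name.
    static Map<String, List<Integer>> idsByName(Stream<Foo> foos) {
        return foos.collect(Collectors.groupingBy(f -> f.name,
                Collectors.flatMapping(f -> f.ids.stream(), Collectors.toList())));
    }

    public static void main(String[] args) {
        Stream<Foo> input = Stream.of("foo", "bar", "baz")
                .flatMap(n -> IntStream.range(0, 10).mapToObj(i -> new Foo(n, i)));
        idsByName(input).forEach((name, ids) -> System.out.println(name + " -> " + ids));
    }
}
```

A second step that maps each (name, ids) entry to a result object is still needed to end up with a list of Foo instances, as the answer notes.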