Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/27870136/
Java Lambda Stream Distinct() on arbitrary key?
Asked by tmn
I frequently run into a problem with Java lambda expressions: I want to distinct() a stream on an arbitrary property or method of an object, but keep the object itself rather than map it to that property or method. I started creating containers as discussed here, but I was doing it often enough that it became annoying and produced a lot of boilerplate classes.
I threw together this Pairing class, which holds two objects of two types and lets you specify keying off the left, right, or both objects. My question is: is there really no built-in lambda stream function to distinct() on a key supplier of some sort? That would really surprise me. If not, will this class fulfill that function reliably?
Here is how it would be called:
BigDecimal totalShare = orders.stream()
    .map(c -> Pairing.keyLeft(c.getCompany().getId(), c.getShare()))
    .distinct()
    .map(Pairing::getRightItem)
    .reduce(BigDecimal.ZERO, (x, y) -> x.add(y));
Here is the Pairing class:
public final class Pairing<X,Y> {
    private final X item1;
    private final Y item2;
    private final KeySetup keySetup;
    private static enum KeySetup {LEFT,RIGHT,BOTH};
    private Pairing(X item1, Y item2, KeySetup keySetup) {
        this.item1 = item1;
        this.item2 = item2;
        this.keySetup = keySetup;
    }
    public X getLeftItem() {
        return item1;
    }
    public Y getRightItem() {
        return item2;
    }
    public static <X,Y> Pairing<X,Y> keyLeft(X item1, Y item2) {
        return new Pairing<X,Y>(item1, item2, KeySetup.LEFT);
    }
    public static <X,Y> Pairing<X,Y> keyRight(X item1, Y item2) {
        return new Pairing<X,Y>(item1, item2, KeySetup.RIGHT);
    }
    public static <X,Y> Pairing<X,Y> keyBoth(X item1, Y item2) {
        return new Pairing<X,Y>(item1, item2, KeySetup.BOTH);
    }
    public static <X,Y> Pairing<X,Y> forItems(X item1, Y item2) {
        return keyBoth(item1, item2);
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
            result = prime * result + ((item1 == null) ? 0 : item1.hashCode());
        }
        if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
            result = prime * result + ((item2 == null) ? 0 : item2.hashCode());
        }
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Pairing<?,?> other = (Pairing<?,?>) obj;
        if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
            if (item1 == null) {
                if (other.item1 != null)
                    return false;
            } else if (!item1.equals(other.item1))
                return false;
        }
        if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
            if (item2 == null) {
                if (other.item2 != null)
                    return false;
            } else if (!item2.equals(other.item2))
                return false;
        }
        return true;
    }
}
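On the reliability question, one point that may be worth checking is that equals only consults the keySetup of the object it is called on. As long as every element in a stream comes from the same factory method (as in the call shown earlier) that should not matter, but mixing key setups appears to break the symmetry that the equals contract asks for. The sketch below uses a condensed copy of the class (the name PairingSymmetryDemo and the package-private constructor are only for illustration) to show the effect:

```java
import java.util.Objects;

class PairingSymmetryDemo {
    enum KeySetup { LEFT, RIGHT, BOTH }

    // Condensed restatement of the Pairing class above; the equals/hashCode
    // logic is the same, written with java.util.Objects helpers.
    static final class Pairing<X, Y> {
        final X item1;
        final Y item2;
        final KeySetup keySetup;

        Pairing(X item1, Y item2, KeySetup keySetup) {
            this.item1 = item1;
            this.item2 = item2;
            this.keySetup = keySetup;
        }

        @Override
        public int hashCode() {
            int result = 1;
            if (keySetup != KeySetup.RIGHT) result = 31 * result + Objects.hashCode(item1);
            if (keySetup != KeySetup.LEFT) result = 31 * result + Objects.hashCode(item2);
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Pairing)) return false;
            Pairing<?, ?> other = (Pairing<?, ?>) obj;
            // Only this object's keySetup is consulted, never other.keySetup.
            if (keySetup != KeySetup.RIGHT && !Objects.equals(item1, other.item1)) return false;
            if (keySetup != KeySetup.LEFT && !Objects.equals(item2, other.item2)) return false;
            return true;
        }
    }

    public static void main(String[] args) {
        Pairing<Integer, String> left = new Pairing<>(1, "a", KeySetup.LEFT);
        Pairing<Integer, String> both = new Pairing<>(1, "b", KeySetup.BOTH);
        System.out.println(left.equals(both)); // true: only item1 is compared
        System.out.println(both.equals(left)); // false: item2 is compared too and differs
    }
}
```

If that asymmetry is a concern, one option would be to have equals return false whenever the two key setups differ.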
UPDATE:
I tested Stuart's function below and it seems to work great. The operation below distincts on the first letter of each string. The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream.
import com.google.common.collect.ImmutableList; // Guava
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
public class DistinctByKey {
    public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
        Map<Object,Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
    public static void main(String[] args) {
        final ImmutableList<String> arpts = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI");
        arpts.stream().filter(distinctByKey(f -> f.substring(0,1))).forEach(s -> System.out.println(s));
    }
}
Output is...
ABQ
CHI
PHX
BWI
Accepted answer by Stuart Marks
The distinct operation is a stateful pipeline operation; in this case it's a stateful filter. It's a bit inconvenient to create these yourself, as there's nothing built-in, but a small helper class should do the trick:
/**
 * Stateful filter. T is type of stream element, K is type of extracted key.
 */
static class DistinctByKey<T,K> {
    Map<K,Boolean> seen = new ConcurrentHashMap<>();
    Function<T,K> keyExtractor;
    public DistinctByKey(Function<T,K> ke) {
        this.keyExtractor = ke;
    }
    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}
I don't know your domain classes, but I think that, with this helper class, you could do what you want like this:
BigDecimal totalShare = orders.stream()
    .filter(new DistinctByKey<Order,CompanyId>(o -> o.getCompany().getId())::filter)
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);
Unfortunately the type inference couldn't get far enough inside the expression, so I had to explicitly specify the type arguments for the DistinctByKey class.
This involves more setup than the collectors approach described by Louis Wasserman, but this has the advantage that distinct items pass through immediately instead of being buffered up until the collection completes. Space should be the same, as (unavoidably) both approaches end up accumulating all distinct keys extracted from the stream elements.
UPDATE
It's possible to get rid of the K type parameter, since it's not actually used for anything other than being stored in a map. So Object is sufficient.
/**
 * Stateful filter. T is type of stream element.
 */
static class DistinctByKey<T> {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    Function<T,Object> keyExtractor;
    public DistinctByKey(Function<T,Object> ke) {
        this.keyExtractor = ke;
    }
    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}
BigDecimal totalShare = orders.stream()
    .filter(new DistinctByKey<Order>(o -> o.getCompany().getId())::filter)
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);
This simplifies things a bit, but I still had to specify the type argument to the constructor. Trying to use diamond or a static factory method doesn't seem to improve things. I think the difficulty is that the compiler can't infer generic type parameters -- for a constructor or a static method call -- when either is in the instance expression of a method reference. Oh well.
(Another variation on this that would probably simplify it is to make DistinctByKey<T> implement Predicate<T> and rename the method to test, which is the name of the interface's abstract method. This would remove the need to use a method reference and would probably improve type inference. However, it's unlikely to be as nice as the solution below.)
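A rough sketch of that Predicate-implementing variation might look like the following. The class name DistinctByKeyPredicate and the helper firstPerInitial are made up for illustration, and the abstract method of java.util.function.Predicate is named test, so that is the name overridden here. An explicit type argument is still written at the call site, so the sketch makes no claim about how far inference would get with the diamond.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stateful filter that can be passed to Stream.filter directly.
class DistinctByKeyPredicate<T> implements Predicate<T> {
    private final Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    private final Function<? super T, ?> keyExtractor;

    DistinctByKeyPredicate(Function<? super T, ?> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    @Override
    public boolean test(T t) {
        // putIfAbsent returns null only the first time a key is stored.
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Hypothetical helper: keeps the first code seen for each initial letter.
    static List<String> firstPerInitial(List<String> codes) {
        return codes.stream()
                .filter(new DistinctByKeyPredicate<String>(s -> s.substring(0, 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPerInitial(
                Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI")));
    }
}
```

Each instance carries its own map, so a fresh predicate is needed for every stream it filters.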
UPDATE 2
Can't stop thinking about this. Instead of a helper class, use a higher-order function. We can use captured locals to maintain state, so we don't even need a separate class! Bonus, things are simplified so type inference works!
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
BigDecimal totalShare = orders.stream()
    .filter(distinctByKey(o -> o.getCompany().getId()))
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);
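To try the higher-order function end to end, here is a self-contained sketch. The Order class below is a hypothetical stand-in (a String company id plus a share), since the real domain classes are not shown in the question; distinctByKey itself is unchanged.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

class DistinctByKeyDemo {
    // Hypothetical stand-in for the question's domain class.
    static final class Order {
        private final String companyId;
        private final BigDecimal share;

        Order(String companyId, BigDecimal share) {
            this.companyId = companyId;
            this.share = share;
        }

        String getCompanyId() { return companyId; }
        BigDecimal getShare() { return share; }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Sums the share of the first order encountered for each company id.
    static BigDecimal totalShare(List<Order> orders) {
        return orders.stream()
                .filter(distinctByKey(Order::getCompanyId))
                .map(Order::getShare)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("A", new BigDecimal("10")),
                new Order("A", new BigDecimal("99")), // same company, filtered out
                new Order("B", new BigDecimal("5")));
        System.out.println(totalShare(orders)); // prints 15
    }
}
```

One caveat with this approach: ConcurrentHashMap does not accept null keys, so the key extractor must not return null for any element.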
Answer by Louis Wasserman
You more or less have to do something like:
elements.stream()
    .collect(Collectors.toMap(
        obj -> extractKey(obj),
        obj -> obj,
        (first, second) -> first
        // pick the first if multiple values have the same key
    )).values().stream();
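The three-argument Collectors.toMap makes no guarantee about the type of map it returns, so the surviving elements may not come back in encounter order. If order matters, a sketch along these lines passes LinkedHashMap::new as the map factory; the method name distinctBy and the class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDistinctDemo {
    // Keeps the first element seen for each key, in encounter order.
    static <T, K> List<T> distinctBy(List<T> elements, Function<? super T, ? extends K> extractKey) {
        Map<K, T> firstPerKey = elements.stream()
                .collect(Collectors.toMap(
                        extractKey,
                        Function.identity(),
                        (first, second) -> first, // pick the first on a key collision
                        LinkedHashMap::new));     // insertion-ordered map
        return new ArrayList<>(firstPerKey.values());
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
        System.out.println(distinctBy(codes, s -> s.substring(0, 1)));
    }
}
```

As with the original snippet, nothing is emitted until the whole source has been collected.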
Answer by frhack
We can also use RxJava (a very powerful reactive extension library):
Observable.from(persons).distinct(Person::getName)
or
Observable.from(persons).distinct(p -> p.getName())
Answer by Jamish
To answer your question in your second update:
The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream:
public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
In your code sample, distinctByKey is only invoked one time, so the ConcurrentHashMap is created just once. Here's an explanation:
The distinctByKey function is just a plain old function that returns an object, and that object happens to be a Predicate. Keep in mind that a predicate is basically a piece of code that can be evaluated later. To manually evaluate a predicate, you must call a method in the Predicate interface such as test. So, the predicate
t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null
is merely a declaration that is not actually evaluated inside distinctByKey.
The predicate is passed around just like any other object. It is returned and passed into the filter operation, which basically evaluates the predicate repeatedly against each element of the stream by calling test.
I'm sure filter is more complicated than I made it out to be, but the point is, the predicate is evaluated many times outside of distinctByKey. There's nothing special* about distinctByKey; it's just a function that you've called one time, so the ConcurrentHashMap is only created one time.
*Apart from being well made, @stuart-marks :)
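One way to see this for yourself is to count how often each piece of code runs. The sketch below adds two counters (hypothetical instrumentation, not part of the original function): one incremented in the body of distinctByKey and one inside the returned lambda.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class CallCountDemo {
    // Hypothetical counters, added only to observe the call pattern.
    static final AtomicInteger factoryCalls = new AtomicInteger();
    static final AtomicInteger predicateCalls = new AtomicInteger();

    static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        factoryCalls.incrementAndGet();       // runs once per call to distinctByKey
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> {
            predicateCalls.incrementAndGet(); // runs once per element tested
            return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
        };
    }

    // Resets the counters, runs the pipeline, and returns {factoryCalls, predicateCalls}.
    static int[] countCalls(List<String> codes) {
        factoryCalls.set(0);
        predicateCalls.set(0);
        codes.stream()
                .filter(distinctByKey(s -> s.substring(0, 1)))
                .collect(Collectors.toList());
        return new int[] { factoryCalls.get(), predicateCalls.get() };
    }

    public static void main(String[] args) {
        int[] counts = countCalls(Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI"));
        System.out.println("distinctByKey calls: " + counts[0]); // 1
        System.out.println("predicate calls: " + counts[1]);     // 7
    }
}
```

The map is created in the body of distinctByKey and captured by the lambda, so all seven evaluations share the same instance.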
Answer by rognlien
A variation on Stuart Marks' second update, using a Set:
public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Set<Object> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
    return t -> seen.add(keyExtractor.apply(t));
}
Answer by Craig P. Motlin
You can use the distinct(HashingStrategy) method in Eclipse Collections.
List<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
ListIterate.distinct(list, HashingStrategies.fromFunction(s -> s.substring(0, 1)))
    .each(System.out::println);
If you can refactor list to implement an Eclipse Collections interface, you can call the method directly on the list.
MutableList<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
list.distinct(HashingStrategies.fromFunction(s -> s.substring(0, 1)))
    .each(System.out::println);
HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashCode.
public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}
Note: I am a committer for Eclipse Collections.
Answer by Fahad
It can be done with something like:
Set<String> distinctCompany = orders.stream()
    .map(Order::getCompany)
    .collect(Collectors.toSet());
Answer by saka1029
Set.add(element) returns true if the set did not already contain element, otherwise false. So you can do it like this:
Set<String> set = new HashSet<>();
BigDecimal totalShare = orders.stream()
    .filter(c -> set.add(c.getCompany().getId()))
    .map(c -> c.getShare())
    .reduce(BigDecimal.ZERO, BigDecimal::add);
If you want to do this in parallel, you must use a concurrent map.
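For the parallel case, a minimal sketch could swap the HashSet for a set backed by ConcurrentHashMap, obtained here through ConcurrentHashMap.newKeySet(). The class and method names are made up for illustration, and the example only counts distinct keys, because that result does not depend on which thread gets there first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ParallelDistinctDemo {
    // Counts how many distinct initial letters occur, using a parallel stream.
    static long countDistinctInitials(List<String> codes) {
        Set<String> seen = ConcurrentHashMap.newKeySet(); // thread-safe set
        return codes.parallelStream()
                .filter(s -> seen.add(s.substring(0, 1))) // true only for the first add of a key
                .count();
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
        System.out.println(countDistinctInitials(codes)); // prints 4
    }
}
```

Note that in a parallel stream the element that survives for a given key is whichever one a thread adds first, which is not necessarily the first in encounter order. A sum of shares computed this way could therefore vary between runs when orders for the same company carry different shares.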
Answer by Arshed
Another way of finding distinct elements:
List<String> uniqueObjects = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI")
    .stream()
    .collect(Collectors.groupingBy((p)->p.substring(0,1))) //expression
    .values()
    .stream()
    .flatMap(e->e.stream().limit(1))
    .collect(Collectors.toList());
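The single-argument groupingBy does not guarantee the type of map it produces, so the groups (and therefore the result list) may not follow encounter order. A variation that assumes order matters could pass LinkedHashMap::new as the map factory; the class and method names below are illustrative only, and a plain List stands in for the Guava ImmutableList.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDistinctDemo {
    // Groups codes by initial letter and keeps the first code of each group.
    static List<String> firstPerInitial(List<String> codes) {
        Map<String, List<String>> groups = codes.stream()
                .collect(Collectors.groupingBy(
                        s -> s.substring(0, 1),
                        LinkedHashMap::new,    // keeps keys in first-seen order
                        Collectors.toList()));
        return groups.values().stream()
                .map(group -> group.get(0))    // groups are never empty
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPerInitial(
                Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI")));
    }
}
```

Compared with a stateful filter, this buffers every element in its group rather than only the keys.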