Java Lambda Stream Distinct() on an arbitrary key?

Question

Asked by tmn

I frequently run into a problem with Java lambda expressions: I want to distinct() a stream on an arbitrary property or method of an object, but keep the object itself rather than map it to that property or method. I started creating containers as discussed here, but I did it often enough that it became annoying and produced a lot of boilerplate classes.

I threw together this Pairing class, which holds two objects of two types and allows you to specify keying off the left, right, or both objects. My question is... is there really no built-in lambda stream function to distinct() on a key supplier of some sorts? That would really surprise me. If not, will this class fulfill that function reliably?

Here is how it would be called:

BigDecimal totalShare = orders.stream()
    .map(c -> Pairing.keyLeft(c.getCompany().getId(), c.getShare()))
    .distinct()
    .map(Pairing::getRightItem)
    .reduce(BigDecimal.ZERO, (x,y) -> x.add(y));

Here is the Pairing class

    public final class Pairing<X,Y>  {
           private final X item1;
           private final Y item2;
           private final KeySetup keySetup;

           private static enum KeySetup {LEFT,RIGHT,BOTH};

           private Pairing(X item1, Y item2, KeySetup keySetup) {
                  this.item1 = item1;
                  this.item2 = item2;
                  this.keySetup = keySetup;
           }
           public X getLeftItem() { 
                  return item1;
           }
           public Y getRightItem() { 
                  return item2;
           }

           public static <X,Y> Pairing<X,Y> keyLeft(X item1, Y item2) { 
                  return new Pairing<X,Y>(item1, item2, KeySetup.LEFT);
           }

           public static <X,Y> Pairing<X,Y> keyRight(X item1, Y item2) { 
                  return new Pairing<X,Y>(item1, item2, KeySetup.RIGHT);
           }
           public static <X,Y> Pairing<X,Y> keyBoth(X item1, Y item2) { 
                  return new Pairing<X,Y>(item1, item2, KeySetup.BOTH);
           }
           public static <X,Y> Pairing<X,Y> forItems(X item1, Y item2) { 
                  return keyBoth(item1, item2);
           }

           @Override
           public int hashCode() {
                  final int prime = 31;
                  int result = 1;
                  if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
                         result = prime * result + ((item1 == null) ? 0 : item1.hashCode());
                  }
                  if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
                         result = prime * result + ((item2 == null) ? 0 : item2.hashCode());
                  }
                  return result;
           }

           @Override
           public boolean equals(Object obj) {
                  if (this == obj)
                         return true;
                  if (obj == null)
                         return false;
                  if (getClass() != obj.getClass())
                         return false;
                  Pairing<?,?> other = (Pairing<?,?>) obj;
                  if (keySetup.equals(KeySetup.LEFT) || keySetup.equals(KeySetup.BOTH)) {
                         if (item1 == null) {
                               if (other.item1 != null)
                                      return false;
                         } else if (!item1.equals(other.item1))
                               return false;
                  }
                  if (keySetup.equals(KeySetup.RIGHT) || keySetup.equals(KeySetup.BOTH)) {
                         if (item2 == null) {
                               if (other.item2 != null)
                                      return false;
                         } else if (!item2.equals(other.item2))
                               return false;
                  }
                  return true;
           }

    }

UPDATE:

Tested Stuart's function below and it seems to work great. The operation below distincts on the first letter of each string. The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream.

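import com.google.common.collect.ImmutableList; // Guava
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

public class DistinctByKey {

    public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
        Map<Object,Boolean> seen = new ConcurrentHashMap<>();
        // putIfAbsent returns null only the first time a key is stored
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {

        final ImmutableList<String> arpts = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI");

        arpts.stream().filter(distinctByKey(f -> f.substring(0,1))).forEach(s -> System.out.println(s));
    }
}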

Output is...

ABQ
CHI
PHX
BWI

Answer 1

Accepted answer by Stuart Marks

The distinct operation is a stateful pipeline operation; in this case it's a stateful filter. It's a bit inconvenient to create these yourself, as there's nothing built-in, but a small helper class should do the trick:

/**
 * Stateful filter. T is type of stream element, K is type of extracted key.
 */
static class DistinctByKey<T,K> {
    Map<K,Boolean> seen = new ConcurrentHashMap<>();
    Function<T,K> keyExtractor;
    public DistinctByKey(Function<T,K> ke) {
        this.keyExtractor = ke;
    }
    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}

I don't know your domain classes, but I think that, with this helper class, you could do what you want like this:

BigDecimal totalShare = orders.stream()
    .filter(new DistinctByKey<Order,CompanyId>(o -> o.getCompany().getId())::filter)
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);

Unfortunately the type inference couldn't get far enough inside the expression, so I had to explicitly specify the type arguments for the DistinctByKey class.

This involves more setup than the collectors approach described by Louis Wasserman, but this has the advantage that distinct items pass through immediately instead of being buffered up until the collection completes. Space should be the same, as (unavoidably) both approaches end up accumulating all distinct keys extracted from the stream elements.

UPDATE

It's possible to get rid of the K type parameter since it's not actually used for anything other than being stored in a map. So Object is sufficient.

/**
 * Stateful filter. T is type of stream element.
 */
static class DistinctByKey<T> {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    Function<T,Object> keyExtractor;
    public DistinctByKey(Function<T,Object> ke) {
        this.keyExtractor = ke;
    }
    public boolean filter(T t) {
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }
}

BigDecimal totalShare = orders.stream()
    .filter(new DistinctByKey<Order>(o -> o.getCompany().getId())::filter)
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);

This simplifies things a bit, but I still had to specify the type argument to the constructor. Trying to use diamond or a static factory method doesn't seem to improve things. I think the difficulty is that the compiler can't infer generic type parameters -- for a constructor or a static method call -- when either is in the instance expression of a method reference. Oh well.

(Another variation on this that would probably simplify it is to make DistinctByKey<T> implement Predicate<T> and rename the method to test. This would remove the need to use a method reference and would probably improve type inference. However, it's unlikely to be as nice as the solution below.)
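
As a rough sketch of that Predicate-implementing variation (not code from the original answer): the class name DistinctByKeyPredicate and the helper firstPerInitial are made up for illustration, and the explicit <String> type argument is kept so the example does not depend on how far inference gets.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Stateful filter that is itself a Predicate. T is type of stream element. */
class DistinctByKeyPredicate<T> implements Predicate<T> {
    private final Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    private final Function<? super T, ?> keyExtractor;

    DistinctByKeyPredicate(Function<? super T, ?> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    @Override
    public boolean test(T t) {
        // putIfAbsent returns null only the first time a key is stored
        return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    /** Hypothetical helper for the demo: keeps the first string per initial letter. */
    static List<String> firstPerInitial(List<String> codes) {
        return codes.stream()
                // the instance is passed directly, no ::filter method reference needed
                .filter(new DistinctByKeyPredicate<String>(s -> s.substring(0, 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPerInitial(
                Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI")));
        // prints [ABQ, CHI, PHX, BWI]
    }
}
```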

UPDATE 2

Can't stop thinking about this. Instead of a helper class, use a higher-order function. We can use captured locals to maintain state, so we don't even need a separate class! Bonus, things are simplified so type inference works!

public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

BigDecimal totalShare = orders.stream()
    .filter(distinctByKey(o -> o.getCompany().getId()))
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);

Answer 2

Answered by Louis Wasserman

You more or less have to do something like:

elements.stream()
    .collect(Collectors.toMap(
        obj -> extractKey(obj),
        obj -> obj,
        // pick the first if multiple values have the same key
        (first, second) -> first
    )).values().stream();
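
A self-contained sketch of the same idea, under a few assumptions: the class name ToMapDistinct and the method distinctByKey are invented for this example, and a LinkedHashMap is passed as the map supplier (the snippet above does not do this) so that the surviving elements come out in encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDistinct {
    /** Keeps the first element seen for each extracted key. */
    static <T, K> List<T> distinctByKey(Collection<T> elements,
                                        Function<? super T, ? extends K> extractKey) {
        Map<K, T> firstByKey = elements.stream()
                .collect(Collectors.toMap(
                        extractKey,
                        Function.identity(),
                        (first, second) -> first,   // on a key clash, keep the earlier element
                        LinkedHashMap::new));       // assumption: preserve encounter order
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
        System.out.println(distinctByKey(codes, s -> s.substring(0, 1)));
        // prints [ABQ, CHI, PHX, BWI]
    }
}
```

Note that nothing is emitted until the whole input has been collected, which is the buffering trade-off mentioned in the accepted answer.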

Answer 3

Answered by frhack

We can also use RxJava (a very powerful reactive extension library):

Observable.from(persons).distinct(Person::getName)

or

Observable.from(persons).distinct(p -> p.getName())

Answer 4

Answered by Jamish

To answer your question in your second update:

The only part I'm trying to figure out is how the ConcurrentHashMap maintains only one instance for the entire stream:

public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

In your code sample, distinctByKey is only invoked one time, so the ConcurrentHashMap is created just once. Here's an explanation:

The distinctByKey function is just a plain-old function that returns an object, and that object happens to be a Predicate. Keep in mind that a predicate is basically a piece of code that can be evaluated later. To manually evaluate a predicate, you must call a method in the Predicate interface such as test. So, the predicate

t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null

is merely a declaration that is not actually evaluated inside distinctByKey.

The predicate is passed around just like any other object. It is returned and passed into the filter operation, which basically evaluates the predicate repeatedly against each element of the stream by calling test.

I'm sure filter is more complicated than I made it out to be, but the point is, the predicate is evaluated many times outside of distinctByKey. There's nothing special* about distinctByKey; it's just a function that you've called one time, so the ConcurrentHashMap is only created one time.

*Apart from being well made, @stuart-marks :)
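
To make the "one call, one map" point concrete, here is a small sketch that evaluates the predicate by hand. The class name PredicateStateDemo is invented for the example, and the key extractor is typed as Function<? super T, ?> rather than with Object, which is a minor variation on the code above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

class PredicateStateDemo {
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        // One map per call to distinctByKey; the returned lambda captures it.
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        Predicate<String> p = distinctByKey(s -> s.substring(0, 1));
        System.out.println(p.test("ABQ")); // true: key "A" is new to p's map
        System.out.println(p.test("ALB")); // false: p's map already holds "A"

        // A second call builds a second, independent map.
        Predicate<String> q = distinctByKey(s -> s.substring(0, 1));
        System.out.println(q.test("ALB")); // true: q has not seen "A" yet
    }
}
```

One consequence worth keeping in mind: a predicate instance remembers its keys, so reusing the same instance on a second stream would filter against keys from the first. Calling distinctByKey again for each stream avoids that.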

Answer 5

Answered by rognlien

A variation on Stuart Marks's second update, using a Set.

public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Set<Object> seen = Collections.newSetFromMap(new ConcurrentHashMap<>());
    return t -> seen.add(keyExtractor.apply(t));
}

Answer 6

Answered by Craig P. Motlin

You can use the distinct(HashingStrategy) method in Eclipse Collections.

List<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
ListIterate.distinct(list, HashingStrategies.fromFunction(s -> s.substring(0, 1)))
    .each(System.out::println);

If you can refactor list to implement an Eclipse Collections interface, you can call the method directly on the list.

MutableList<String> list = Lists.mutable.with("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI");
list.distinct(HashingStrategies.fromFunction(s -> s.substring(0, 1)))
    .each(System.out::println);

HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Note: I am a committer for Eclipse Collections.

Answer 7

Answered by Fahad

It can be done with something like:

Set<String> distinctCompany = orders.stream()
        .map(Order::getCompany)
        .collect(Collectors.toSet());

Answer 8

Answered by saka1029

Set.add(element) returns true if the set did not already contain element, otherwise false. So you can do it like this:

Set<String> set = new HashSet<>();
BigDecimal totalShare = orders.stream()
    .filter(c -> set.add(c.getCompany().getId()))
    .map(c -> c.getShare())
    .reduce(BigDecimal.ZERO, BigDecimal::add);

If you want to do this in parallel, you must use a thread-safe set, for example one backed by a concurrent map.
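
For the parallel case, one possible sketch uses ConcurrentHashMap.newKeySet() as the thread-safe set. The Order class below is a hypothetical stand-in (the question never shows the real one), with the company id flattened to a String field. The sketch also assumes every order for a given company carries the same share, because in a parallel stream it is not deterministic which of the duplicates gets through the filter.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ParallelDistinctShare {
    /** Hypothetical stand-in for the Order class from the question. */
    static final class Order {
        final String companyId;
        final BigDecimal share;

        Order(String companyId, BigDecimal share) {
            this.companyId = companyId;
            this.share = share;
        }
    }

    static BigDecimal totalShare(List<Order> orders) {
        // Thread-safe set backed by a ConcurrentHashMap
        Set<String> seen = ConcurrentHashMap.newKeySet();
        return orders.parallelStream()
                .filter(o -> seen.add(o.companyId)) // true only for the first add of each id
                .map(o -> o.share)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("acme", new BigDecimal("0.25")),
                new Order("acme", new BigDecimal("0.25")),
                new Order("globex", new BigDecimal("0.50")));
        System.out.println(totalShare(orders)); // prints 0.75
    }
}
```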

Answer 9

Answered by Arshed

Another way of finding distinct elements:

List<String> uniqueObjects = ImmutableList.of("ABQ","ALB","CHI","CUN","PHX","PUJ","BWI")
            .stream()
            .collect(Collectors.groupingBy((p)->p.substring(0,1))) //expression 
            .values()
            .stream()
            .flatMap(e->e.stream().limit(1))
            .collect(Collectors.toList());
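
One thing to be aware of with the grouping approach: the single-argument groupingBy makes no promise about the order of its map, so the groups may come back in a different order than the input. A possible variation, sketched here with plain java.util collections instead of Guava and with an invented class name, passes LinkedHashMap::new as the map factory to keep encounter order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDistinct {
    /** Keeps the first string of each group, with groups in encounter order. */
    static List<String> firstPerInitial(List<String> codes) {
        Map<String, List<String>> groups = codes.stream()
                .collect(Collectors.groupingBy(
                        s -> s.substring(0, 1),
                        LinkedHashMap::new,      // assumption: keep key insertion order
                        Collectors.toList()));
        return groups.values().stream()
                .map(group -> group.get(0))      // each group has at least one element
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPerInitial(
                Arrays.asList("ABQ", "ALB", "CHI", "CUN", "PHX", "PUJ", "BWI")));
        // prints [ABQ, CHI, PHX, BWI]
    }
}
```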
