java - Finding duplicate entries in Collection

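Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/10755632/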

Date: 2020-10-31 02:24:43  Source: igfitidea

Tags: java, collections, duplicates, equality

Question by

Is there a tool or library to find duplicate entries in a Collection according to specific criteria that I can implement?

To make myself clear: I want to compare the entries to each other according to specific criteria. So I think a Predicate returning just true or false isn't enough.

I can't use equals.

Accepted answer by dasblinkenlight

I've created a new interface akin to the IEqualityComparer<T> interface in .NET.

I then pass such an EqualsComparator<T> to the following method, which detects duplicates.

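import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// The interface's exact shape is not shown in the answer; this one is
// inferred from the equalsComparator.equals(object1, object2) call below.
interface EqualsComparator<T> {
    boolean equals(T first, T second);
}

class Duplicates {
    // Pairwise O(n^2) check: true if two elements are the same reference
    // or the comparator considers them equal.
    public static <T> boolean hasDuplicates(Collection<T> collection,
            EqualsComparator<T> equalsComparator) {
        List<T> list = new ArrayList<>(collection);
        for (int i = 0; i < list.size(); i++) {
            T object1 = list.get(i);
            for (int j = i + 1; j < list.size(); j++) {
                T object2 = list.get(j);
                if (object1 == object2
                        || equalsComparator.equals(object1, object2)) {
                    return true;
                }
            }
        }
        return false;
    }
}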

This way I can customise the comparison to my needs.

Answer by Samuel Rossille

It depends on the semantics of the criterion:

If your criterion is always the same for a given class and is inherent to the underlying concept, you should just implement equals and hashCode and use a set.

If your criterion depends on the context, org.apache.commons.collections.CollectionUtils.select(java.util.Collection, org.apache.commons.collections.Predicate) might be the right solution for you.
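For the first case, a minimal sketch, assuming a hypothetical NamedPerson class whose identity is its first name alone. NamedPerson and SetDuplicates are illustrative names, not from the answer. Because equals and hashCode encode the criterion, a plain HashSet collapses duplicates, and a smaller set than input reveals them.

```java
import java.util.Collection;
import java.util.HashSet;

// Hypothetical class whose identity is defined by the first name alone.
final class NamedPerson {
    final String first;
    final String last;

    NamedPerson(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NamedPerson)) return false;
        return first.equals(((NamedPerson) o).first);
    }

    @Override
    public int hashCode() {
        return first.hashCode(); // must be consistent with equals()
    }
}

final class SetDuplicates {
    // A HashSet keeps one element per equivalence class of equals()/hashCode(),
    // so a set smaller than the input means at least one duplicate was dropped.
    static <T> boolean hasDuplicates(Collection<T> items) {
        return new HashSet<>(items).size() < items.size();
    }
}
```

This only fits when the criterion never changes for the class; a context-dependent criterion would need a different equals for each context.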

Answer by Andy Thomas

If you want to find duplicates, rather than just removing them, one approach would be to copy the Collection into an array, sort the array via a Comparator that implements your criteria, then walk linearly through the array, looking for adjacent duplicates.

Here's a sketch (not tested):

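   MyComparator myComparator = new MyComparator();
   // toArray() without an argument returns Object[]; pass a typed array instead
   MyType[] myArray = myList.toArray( new MyType[0] );
   Arrays.sort( myArray, myComparator );
   for ( int i = 1; i < myArray.length; ++i ) {
      // After sorting, elements the comparator ranks as equal are adjacent
      if ( 0 == myComparator.compare( myArray[i - 1], myArray[i] )) {
         // Found a duplicate!
      }
   }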
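A runnable version of the sort-then-scan idea, as a sketch: it sorts a copy in an ArrayList rather than an array and collects the duplicates instead of leaving a placeholder comment. SortedScan and findDuplicates are illustrative names, and String.CASE_INSENSITIVE_ORDER is used only as an example criterion.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class SortedScan {
    // Sorts a copy of the input with the caller's comparator (a stable sort),
    // then reports every element that compares as 0 to its predecessor.
    static <T> List<T> findDuplicates(Collection<T> items, Comparator<? super T> criteria) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(criteria);
        List<T> duplicates = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            if (criteria.compare(sorted.get(i - 1), sorted.get(i)) == 0) {
                duplicates.add(sorted.get(i));
            }
        }
        return duplicates;
    }
}
```

For example, findDuplicates(Arrays.asList("apple", "Banana", "APPLE", "cherry", "banana"), String.CASE_INSENSITIVE_ORDER) returns [APPLE, banana]. An element that appears k times contributes k - 1 entries, and the sort makes the check O(n log n) rather than pairwise O(n^2).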

Edit: From your comment, you just want to know whether there are duplicates. The approach above works for this too, but you could simply create a java.util.SortedSet with a custom Comparator instead. Here's a sketch:

   MyComparator myComparator = new MyComparator();
   // Elements the comparator ranks as equal (compare == 0) are stored only once
   TreeSet<MyType> treeSet = new TreeSet<>( myComparator );
   treeSet.addAll( myCollection );
   boolean containsDuplicates = (treeSet.size() != myCollection.size());

Answer by dasblinkenlight

You can adapt a Java set to search for duplicates among objects of an arbitrary type: wrap your target class in a private wrapper that evaluates equality based on your criteria, and construct a set of wrappers.

Here is a somewhat lengthy example that illustrates the technique. It considers two people with the same first name to be equal, so it detects three duplicates in the list of five people.

import java.util.*;

class Main {
    static class Person {
        private String first;
        private String last;
        public String getFirst() {return first;}
        public String getLast() {return last;}
        public Person(String f, String l) {
            first = f;
            last = l;
        }
        public String toString() {
            return first+" "+last;
        }
    }
    public static void main (String[] args) throws java.lang.Exception {
        List<Person> people = new ArrayList<Person>();
        people.add(new Person("John", "Smith"));
        people.add(new Person("John", "Scott"));
        people.add(new Person("Hyman", "First"));
        people.add(new Person("John", "Walker"));
        people.add(new Person("Hyman", "Black"));
        Set<Object> seen = new HashSet<Object>();
        for (Person p : people) {
            final Person thisPerson = p;
            // Local wrapper: equality and hashing use the first name only
            class Wrap {
                @Override
                public int hashCode() { return thisPerson.getFirst().hashCode(); }
                @Override
                public boolean equals(Object o) {
                    if (!(o instanceof Wrap)) return false;
                    Wrap other = (Wrap) o;
                    return other.wrapped().getFirst().equals(thisPerson.getFirst());
                }
                public Person wrapped() { return thisPerson; }
            }
            Wrap wrap = new Wrap();
            if (seen.add(wrap)) {
                System.out.println(p + " is new");
            } else {
                System.out.println(p + " is a duplicate");
            }
        }
    }
}

You can play with this example on ideone [link].

Answer by Thomas

You could use a map: while iterating over the collection, put each element into the map, using the values your criteria compare as the key. If the map already has an entry for that key, you've found a duplicate.
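A minimal sketch of the map idea, assuming the criterion can be reduced to a key with sensible equals and hashCode (for example a String or Integer). The keyOf parameter, MapDuplicates, and duplicateGroups are illustrative names. This variant collects whole groups of duplicates rather than stopping at the first hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MapDuplicates {
    // Groups elements by the key that the criterion extracts; every key that
    // maps to more than one element identifies a group of duplicates.
    static <T, K> Map<K, List<T>> duplicateGroups(Iterable<T> items,
            Function<? super T, ? extends K> keyOf) {
        Map<K, List<T>> groups = new HashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
        // Drop keys seen only once; what remains are the duplicate groups.
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }
}
```

For example, grouping ["ant", "bee", "wasp", "moth", "beetle"] by String::length yields {3=[ant, bee], 4=[wasp, moth]}.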

For more information see here: Finding duplicates in a collection

Answer by Tadhg

TreeSet allows you to do this easily:

Set uniqueItems = new TreeSet<>(yourComparator);
List<?> duplicates = objects.stream().filter(o -> !uniqueItems.add(o)).collect(Collectors.toList());

yourComparator is used when calling uniqueItems.add(o), which adds the item to the set and returns true if the item is unique. If the comparator considers the item a duplicate, add(o) returns false.

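Note that, per the TreeSet documentation, a TreeSet performs all element comparisons with yourComparator, not with the items' equals method. If the comparator is inconsistent with equals, this trick still works, but the set no longer obeys the general contract of the Set interface.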

Answer by Nagendra

Iterate over the ArrayList that contains duplicates and add each element to a HashSet. When the HashSet's add method returns false, the element is a duplicate, so just log it to the console.
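A minimal sketch of that loop; HashSetLogger and findAndLogDuplicates are illustrative names. It relies on the elements' own equals and hashCode, so it only matches the question when those already encode the criterion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HashSetLogger {
    // HashSet.add returns false when an equal element is already present,
    // which marks the current element as a duplicate.
    static <T> List<T> findAndLogDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        List<T> duplicates = new ArrayList<>();
        for (T item : items) {
            if (!seen.add(item)) {
                System.out.println("Duplicate: " + item);
                duplicates.add(item);
            }
        }
        return duplicates;
    }
}
```

Every repeat after the first occurrence is logged, so ["a", "b", "a", "c", "b", "a"] produces [a, b, a].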
