Java: Detect duplicates in ArrayList?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/562894/

Time: 2020-08-11 16:16:03  Source: igfitidea

Tags: java, arrays, arraylist, duplicates

Asked by

How could I go about detecting (returning true/false) whether an ArrayList contains more than one of the same element in Java?

Many thanks, Terry

Edit: Forgot to mention that I am not looking to compare "Blocks" with each other but their integer values. Each "block" has an int and this is what makes them different. I find the int of a particular Block by calling a method named "getNum" (e.g. table1[0][2].getNum();).

Accepted answer by Paul Tomblin

Simplest: dump the whole collection into a Set (using a Set implementation's Collection constructor, such as HashSet(Collection), or Set.addAll), then see if the Set has the same size as the ArrayList.

List<Integer> list = ...;
Set<Integer> set = new HashSet<Integer>(list);

if(set.size() < list.size()){
    /* There are duplicates */
}
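
Since the question's edit says the blocks should be compared by their integer value, the same size comparison can be applied to the values returned by getNum. A minimal sketch, assuming a hypothetical Block class that only has the getNum() method described in the question (the class body and the helper name hasDuplicateNums are illustrative, not the asker's code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the asker's Block class; only getNum() comes from the question.
class Block {
    private final int num;

    Block(int num) {
        this.num = num;
    }

    int getNum() {
        return num;
    }
}

class BlockDuplicates {
    // Returns true if at least two blocks in the list share the same getNum() value.
    static boolean hasDuplicateNums(List<Block> blocks) {
        Set<Integer> nums = new HashSet<Integer>();
        for (Block block : blocks) {
            nums.add(block.getNum());
        }
        // Fewer distinct values than blocks means some value occurred more than once.
        return nums.size() < blocks.size();
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(new Block(1), new Block(2), new Block(1));
        System.out.println(hasDuplicateNums(blocks)); // true
    }
}
```

Collecting the int values sidesteps the need for Block to implement equals and hashCode, because Integer already does.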

Update: If I'm understanding your question correctly, you have a 2d array of Block, as in

Block table[][];

and you want to detect if any row of them has duplicates?

In that case, I could do the following, assuming that Block implements "equals" and "hashCode" correctly:

for (Block[] row : table) {
   Set<Block> set = new HashSet<Block>();
   for (Block cell : row) {
      set.add(cell);
   }
   if (set.size() < row.length) { // row has a duplicate
   }
}

I'm not 100% sure of the syntax of that for-each loop, so it might be safer to write it as

for (int i = 0; i < 6; i++) {
   Set<Block> set = new HashSet<Block>();
   for (int j = 0; j < 6; j++)
      set.add(table[i][j]);
 ...

Set.add returns a boolean false if the item being added is already in the set, so you could even short-circuit and bail out on any add that returns false if all you want to know is whether there are any duplicates.
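
The short-circuit idea might look like the sketch below. It assumes, for brevity, that the table holds plain int values (for a Block[][] table you would add cell.getNum() instead); the class and method names are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

class TableRows {
    // Returns true as soon as some row is found to contain the same value twice.
    static boolean anyRowHasDuplicate(int[][] table) {
        for (int[] row : table) {
            Set<Integer> seen = new HashSet<Integer>(); // fresh set per row
            for (int value : row) {
                // add returns false when the value is already present in this row
                if (!seen.add(value)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] table = { {1, 2, 3}, {4, 5, 4} };
        System.out.println(anyRowHasDuplicate(table)); // true
    }
}
```

Because a new set is created for each row, a value that appears in two different rows is not reported.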

Answer by matt b

If you are looking to avoid having duplicates at all, then you should just cut out the middle process of detecting duplicates and use a Set.
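
As a rough illustration of that suggestion, the sketch below stores values in a LinkedHashSet, which keeps insertion order and simply ignores repeats. The class and method names are hypothetical, and whether a Set fits depends on whether you need list features such as index access:

```java
import java.util.LinkedHashSet;
import java.util.Set;

class UniqueNumbers {
    // Collects the values into a set; repeated values are silently dropped.
    static Set<Integer> collectUnique(int... values) {
        Set<Integer> unique = new LinkedHashSet<Integer>();
        for (int value : values) {
            unique.add(value); // returns false and changes nothing for a repeat
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(collectUnique(4, 7, 4)); // [4, 7]
    }
}
```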

Answer by Varkhan

If your elements are somehow Comparable (whether the order has any real meaning is irrelevant; it just needs to be consistent with your definition of equality), the fastest duplicate-removal solution is to sort the list (O(n log(n))) and then do a single pass looking for repeated elements, that is, equal elements that follow each other (this is O(n)).

The overall complexity is going to be O(n log(n)), which is roughly the same as what you would get with a Set (n times log(n)), but with a much smaller constant. This is because the constant in sort/dedup results from the cost of comparing elements, whereas the cost from the set is most likely to result from a hash computation, plus one (possibly several) hash comparisons. That is if you are using a hash-based Set implementation, because a tree-based one is going to give you O(n log2(n)), which is even worse.

As I understand it, however, you do not need to remove duplicates, but merely test for their existence. So you should hand-code a merge or heap sort algorithm on your array that simply exits returning true (i.e. "there is a dup") if your comparator returns 0, and otherwise completes the sort and traverses the sorted array testing for repeats. In a merge or heap sort, indeed, when the sort is completed, you will have compared every duplicate pair unless both elements were already in their final positions (which is unlikely). Thus, a tweaked sort algorithm should yield a huge performance improvement (I would have to prove that, but I guess the tweaked algorithm should be in the O(log(n)) on uniformly random data).
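
One possible reading of the tweaked sort is sketched below: a hand-written top-down merge sort over int values that returns true the moment two equal elements are compared. The class and method names are illustrative, the choice of int[] is an assumption made to keep the example short, and the sketch makes no claim about the running time guessed at above:

```java
import java.util.Arrays;

class EarlyExitSort {
    // Returns true if values contains the same number twice. Works on a copy.
    static boolean hasDuplicate(int[] values) {
        int[] a = Arrays.copyOf(values, values.length);
        return sortFindsDuplicate(a, new int[a.length], 0, a.length);
    }

    // Merge-sorts a[lo, hi) and returns true as soon as two equal elements meet.
    private static boolean sortFindsDuplicate(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) {
            return false; // zero or one element: nothing to compare
        }
        int mid = (lo + hi) >>> 1;
        if (sortFindsDuplicate(a, tmp, lo, mid)) return true;
        if (sortFindsDuplicate(a, tmp, mid, hi)) return true;

        // Merge the two sorted, duplicate-free halves.
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            if (a[i] == a[j]) {
                return true; // the "comparator returned 0" case: stop immediately
            }
            tmp[k++] = (a[i] < a[j]) ? a[i++] : a[j++];
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo);
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(new int[] {3, 1, 3})); // true
        System.out.println(hasDuplicate(new int[] {3, 1, 2})); // false
    }
}
```

In this variant no final pass over the sorted array is needed: each half is duplicate-free when the merge starts, so an element present in both halves is necessarily compared with its twin during the merge.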

Answer by Antonio

Simply put: 1) make sure all items are comparable, 2) sort the array, 3) iterate over the array and find duplicates.
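
The steps above can be sketched as follows, assuming the elements implement Comparable and that none of them is null. The class name is made up for the example, and the list is copied first so the caller's ordering is left alone:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortAndScan {
    // Sorts a copy of the list, then looks for two neighbours that compare as equal.
    static <T extends Comparable<? super T>> boolean hasDuplicate(List<T> items) {
        List<T> sorted = new ArrayList<T>(items); // copy so the caller's list keeps its order
        Collections.sort(sorted);
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i - 1).compareTo(sorted.get(i)) == 0) {
                return true; // equal elements end up next to each other after sorting
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(Arrays.asList("b", "a", "b"))); // true
        System.out.println(hasDuplicate(Arrays.asList("a", "b", "c"))); // false
    }
}
```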

Answer by akuhn

Improved code, using the return value of Set#add instead of comparing the size of list and set.

public static <T> boolean hasDuplicate(Iterable<T> all) {
    Set<T> set = new HashSet<T>();
    // Set#add returns false if the set does not change, which
    // indicates that a duplicate element has been added.
    for (T each: all) if (!set.add(each)) return true;
    return false;
}
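
On Java 8 or later, the same Set#add trick can probably be written with a stream. This is only a sketch under that assumption: the class name is hypothetical, and the stateful predicate is meant for a sequential stream only, not a parallel one:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StreamDuplicates {
    // Returns true if some element occurs more than once in the list.
    static <T> boolean hasDuplicate(List<T> items) {
        Set<T> seen = new HashSet<T>();
        // allMatch stops at the first element for which seen.add returns false.
        return !items.stream().allMatch(seen::add);
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(Arrays.asList(1, 2, 3, 2))); // true
        System.out.println(hasDuplicate(Arrays.asList(1, 2, 3)));    // false
    }
}
```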

Answer by user60062

Improved code to return the duplicate elements

  • Can find duplicates in a Collection
  • Returns the list of duplicates
  • Unique elements can be obtained from the Set

public static <T> List<T> getDuplicate(Collection<T> list) {

    final List<T> duplicatedObjects = new ArrayList<T>();
    Set<T> set = new HashSet<T>() {
        @Override
        public boolean add(T e) {
            // record every element that is already in the set
            if (contains(e)) {
                duplicatedObjects.add(e);
            }
            return super.add(e);
        }
    };
    for (T t : list) {
        set.add(t);
    }
    return duplicatedObjects;
}


public static <T> boolean hasDuplicate(Collection<T> list) {
    return !getDuplicate(list).isEmpty();
}

Answer by Rakesh Sabbani

To find the duplicates in a List, use the following code. It will give you the set which contains the duplicates.

public Set<?> findDuplicatesInList(List<?> beanList) {
    System.out.println("findDuplicatesInList::" + beanList);
    Set<Object> duplicateRowSet = new LinkedHashSet<Object>();
    for (int i = 0; i < beanList.size(); i++) {
        Object superString = beanList.get(i);
        System.out.println("findDuplicatesInList::superString::" + superString);
        for (int j = 0; j < beanList.size(); j++) {
            if (i != j) {
                Object subString = beanList.get(j);
                System.out.println("findDuplicatesInList::subString::" + subString);
                if (superString.equals(subString)) {
                    duplicateRowSet.add(beanList.get(j));
                }
            }
        }
    }
    System.out.println("findDuplicatesInList::duplicationSet::" + duplicateRowSet);
    return duplicateRowSet;
}

Answer by Amitesh Jha

    String tempVal = null;
    for (int i = 0; i < l.size(); i++) {
        tempVal = l.get(i); //take the ith object out of list
        while (l.contains(tempVal)) {
            l.remove(tempVal); //remove all matching entries
        }
        l.add(i, tempVal); //put one entry back at position i, so that no element is skipped
    }

Note: this will have a major performance hit, though, as items are removed from the start of the list. To address this, we have two options: 1) iterate in reverse order and remove elements; 2) use LinkedList instead of ArrayList. Interviewers often ask, somewhat artificially, to remove duplicates from a List without using any other collection, and the example above is the answer to that. In the real world, though, if I had to achieve this, I would put the elements from the List into a Set. Simple!
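
Option 1 might look like the sketch below, which walks the list from the end and drops any element that also occurs earlier. The class and method names are illustrative. It is still quadratic, but each removal near the tail shifts fewer elements than a removal near the head:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InPlaceDedup {
    // Removes later duplicates in place, keeping the first occurrence of each element.
    static <T> void removeDuplicates(List<T> list) {
        for (int i = list.size() - 1; i > 0; i--) {
            // indexOf finds the first equal element; if it sits before i, element i is a repeat.
            if (list.indexOf(list.get(i)) < i) {
                list.remove(i); // int argument, so this removes by index
            }
        }
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<String>(Arrays.asList("a", "b", "c", "b"));
        removeDuplicates(l);
        System.out.println(l); // [a, b, c]
    }
}
```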

Answer by faizal

/**
     * Method to detect presence of duplicates in a generic list. 
     * Depends on the equals method of the concrete type. make sure to override it as required.
     */
    public static <T> boolean hasDuplicates(List<T> list){
        int count = list.size();
        T t1,t2;

        for(int i=0;i<count;i++){
            t1 = list.get(i);
            for(int j=i+1;j<count;j++){
                t2 = list.get(j);
                if(t2.equals(t1)){
                    return true;
                }
            }
        }
        return false;
    }

An example of a concrete class that has overridden equals():

public class Reminder{
    private long id;
    private int hour;
    private int minute;

    public Reminder(long id, int hour, int minute){
        this.id = id;
        this.hour = hour;
        this.minute = minute;
    }

    @Override
    public boolean equals(Object other){
        if(other == null) return false;
        if(this.getClass() != other.getClass()) return false;
        Reminder otherReminder = (Reminder) other;
        if(this.hour != otherReminder.hour) return false;
        if(this.minute != otherReminder.minute) return false;

        return true;
    }
}
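
The nested-loop check only calls equals, but the HashSet-based answers on this page also rely on hashCode being consistent with equals. A minimal sketch of a variant that overrides both, assuming (as in the class above) that two reminders are equal when hour and minute match; the class name HashedReminder and its hashCode are additions for illustration:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustrative variant of the Reminder class; the hashCode method is an addition.
class HashedReminder {
    private final long id;
    private final int hour;
    private final int minute;

    HashedReminder(long id, int hour, int minute) {
        this.id = id;
        this.hour = hour;
        this.minute = minute;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        HashedReminder that = (HashedReminder) other;
        return hour == that.hour && minute == that.minute;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields compared in equals, so equal reminders hash alike.
        return Objects.hash(hour, minute);
    }

    public static void main(String[] args) {
        Set<HashedReminder> set = new HashSet<HashedReminder>();
        System.out.println(set.add(new HashedReminder(1L, 9, 30))); // true
        System.out.println(set.add(new HashedReminder(2L, 9, 30))); // false: same hour and minute
    }
}
```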

Answer by Saurabh

If you want the set of duplicate values:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FindDuplicateInArrayList {

    public static void main(String[] args) {

        Set<String> uniqueSet = new HashSet<String>();
        List<String> dupesList = new ArrayList<String>();
        for (String a : args) {
            if (uniqueSet.contains(a))
                dupesList.add(a);
            else
                uniqueSet.add(a);
        }
        System.out.println(uniqueSet.size() + " distinct words: " + uniqueSet);
        System.out.println(dupesList.size() + " dupesList words: " + dupesList);
    }
}

And probably also think about trimming values or using lowercase ... depending on your case.
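
Normalizing before the lookup could be sketched like this, assuming that words differing only in surrounding whitespace or letter case should count as duplicates. The class and method names are hypothetical, and Locale.ROOT is one possible choice for case folding:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NormalizedDuplicates {
    // Returns the words whose trimmed, lower-cased form was already seen earlier in the input.
    static List<String> findDuplicates(List<String> words) {
        Set<String> seen = new HashSet<String>();
        List<String> dupes = new ArrayList<String>();
        for (String word : words) {
            String key = word.trim().toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                dupes.add(word); // keep the original spelling in the result
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(Arrays.asList("Apple", "pear", "APPLE"))); // [APPLE]
    }
}
```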
