Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14040331/

Date: 2020-10-31 14:51:30  Source: igfitidea

remove duplicate strings in a List in Java

Tags: java, collections, set

Asked by user121196

Update: I guess HashSet.add(Object obj) does not call contains. Is there a way to implement what I want (remove duplicate strings, ignoring case, using a Set)?

Original question: I am trying to remove duplicates from a list of Strings in Java. However, in the following code CaseInsensitiveSet.contains(Object obj) is not getting called. Why?

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new CaseInsensitiveSet():new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new Vector<String>(set);
    return res;
}


public class CaseInsensitiveSet  extends LinkedHashSet<String>{

    @Override
    public boolean contains(Object obj){
        //this not getting called.
        if(obj instanceof String){

            return super.contains(((String)obj).toLowerCase());
        }
        return super.contains(obj);
    }

}

Answered by Evgeniy Dorofeev

Try

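    // A TreeSet with a case-insensitive comparator treats "A" and "a" as the same element.
    Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    set.addAll(list);
    return new ArrayList<String>(set);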
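
As a rough illustration of how this snippet behaves, here is a minimal, self-contained sketch. The class name TreeSetDedupDemo, the helper dedupIgnoreCase, and the sample input are made up for this example and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedupDemo {
    // Hypothetical helper wrapping the TreeSet-based snippet.
    static List<String> dedupIgnoreCase(List<String> list) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "Apple", "PEAR", "apple");
        // The first spelling of each word survives, but the result is sorted
        // case-insensitively instead of keeping the input order.
        System.out.println(dedupIgnoreCase(input)); // [Apple, pear]
    }
}
```

Note that the result comes back in the comparator's order, not in the order of the input list.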

UPDATE: As Tom Anderson mentioned, this does not preserve the initial order. If that is really an issue, try

    Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    Iterator<String> i = list.iterator();
    while (i.hasNext()) {
        String s = i.next();
        if (set.contains(s)) {
            i.remove();
        }
        else {
            set.add(s);
        }
    }

prints

[2, 1]

Answered by Peter Lawrey

contains() is not called because LinkedHashSet is not implemented that way.

If you want add() to call contains(), you will need to override add() as well.

The reason it is not implemented this way is that calling contains() first would mean performing two lookups instead of one, which would be slower.
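
One way to read this advice is sketched below. It is only an illustration under stated assumptions: the class name ContainsCheckingSet and the lowerKeys field are hypothetical, Locale.ROOT is my choice for lowercasing, strings are assumed to be non-null, and remove() is deliberately not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant of the question's CaseInsensitiveSet (not the original code).
class ContainsCheckingSet extends LinkedHashSet<String> {
    // Lowercased copies of every stored string, used only for lookups.
    private final Set<String> lowerKeys = new HashSet<String>();

    @Override
    public boolean contains(Object obj) {
        if (obj instanceof String) {
            return lowerKeys.contains(((String) obj).toLowerCase(Locale.ROOT));
        }
        return false;
    }

    @Override
    public boolean add(String s) {
        // Explicitly consult contains(); the inherited add() would not do so.
        if (contains(s)) {
            return false;
        }
        lowerKeys.add(s.toLowerCase(Locale.ROOT));
        return super.add(s); // keeps the original spelling and insertion order
    }

    public static void main(String[] args) {
        Set<String> set = new ContainsCheckingSet();
        // addAll() is inherited from AbstractCollection and calls add() per element.
        set.addAll(Arrays.asList("b", "A", "a", "B"));
        System.out.println(set); // [b, A]
    }
}
```

As the answer points out, this costs two lookups per insertion, so it trades some speed for routing every insertion through contains().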

Answered by Rahul

The add() method of LinkedHashSet does not call contains() internally; otherwise your method would have been called as well.

Instead of a LinkedHashSet, why not use a SortedSet with a case-insensitive comparator such as String.CASE_INSENSITIVE_ORDER?

Your code is reduced to

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new TreeSet<String>(String.CASE_INSENSITIVE_ORDER):new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new ArrayList<String>(set);
    return res;
}

If you wish to preserve the order, as @tom anderson pointed out in his comment, you can use an auxiliary LinkedHashSet for the order.

Try adding each element to the TreeSet; if add() returns true, also add it to the LinkedHashSet, otherwise skip it. (The code below uses an ArrayList for the ordered result, which works just as well.)

public static List<String> removeDupList(List<String> list) {
    Set<String> sortedSet = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    List<String> orderedList = new ArrayList<String>();
    for (String str : list) {
        // add returns true if the element is not already present, else false
        if (sortedSet.add(str)) {
            orderedList.add(str);
        }
    }
    return orderedList;
}

Answered by Yogesh Patil

Try

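    @Override
    public boolean addAll(Collection<? extends String> c) {
        boolean changed = false;
        for (String s : c) {
            // Route every insertion through the overridden contains().
            if (!this.contains(s)) {
                changed |= this.add(s);
            }
        }
        // Do not call super.addAll(c) here: it would re-add the elements
        // that contains() just filtered out.
        return changed;
    }

    @Override
    public boolean contains(Object o) {
        // Do your checking here
        return super.contains(o);
    }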

This makes sure the contains method is called whenever elements are added through addAll, if you want the code to go through there.

Answered by Tom Anderson

Here's another approach, using a HashSet of the strings for deduplication, but building the result list directly:

public static List<String> removeDupList(List<String> list, boolean ignoreCase) {
    HashSet<String> seen = new HashSet<String>();
    ArrayList<String> deduplicatedList = new ArrayList<String>();
    for (String string : list) {
        if (seen.add(ignoreCase ? string.toLowerCase() : string)) {
            deduplicatedList.add(string);
        }
    }
    return deduplicatedList;
}

This is fairly simple, makes only one pass over the elements, and performs only a lowercase conversion, a hash lookup, and a list append for each element.
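
For a quick check of what this approach returns, here is a small self-contained sketch. The class name SeenSetDedupDemo and the sample input are made up for illustration, and passing Locale.ROOT to toLowerCase is my assumption (the answer uses the no-argument form, which depends on the default locale).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SeenSetDedupDemo {
    static List<String> removeDupList(List<String> list, boolean ignoreCase) {
        Set<String> seen = new HashSet<String>();
        List<String> deduplicatedList = new ArrayList<String>();
        for (String string : list) {
            // Assumption: Locale.ROOT keeps lowercasing independent of the default locale.
            String key = ignoreCase ? string.toLowerCase(Locale.ROOT) : string;
            if (seen.add(key)) {
                // First occurrence of this key: keep the original spelling.
                deduplicatedList.add(string);
            }
        }
        return deduplicatedList;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "A", "a", "B");
        System.out.println(removeDupList(input, true));  // [b, A]
        System.out.println(removeDupList(input, false)); // [b, A, a, B]
    }
}
```

The result keeps the input order and the first spelling seen for each string.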
