Java HashSet contains duplicates if contained element is modified

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13114043/

Date: 2020-10-31 11:32:21  Source: igfitidea

Tags: java, duplicates, hashset

Question by PB_MLT

Let's say you have a class and you create a HashSet which can store instances of this class. If you try to add instances which are equal, only one instance is kept in the collection, and that is fine.

However if you have two different instances in the HashSet, and you take one and make it an exact copy of the other (by copying the fields), the HashSet will then contain two duplicate instances.

Here is the code which demonstrates this:

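import java.util.HashSet;
import java.util.Objects;

// The enclosing class name is illustrative; the original snippet omitted it.
public class HashSetDuplicates
{
    public static void main(String[] args)
    {
        HashSet<GraphEdge> set = new HashSet<>();
        GraphEdge edge1 = new GraphEdge(1, "a");
        GraphEdge edge2 = new GraphEdge(2, "b");
        GraphEdge edge3 = new GraphEdge(3, "c");

        set.add(edge1);
        set.add(edge2);
        set.add(edge3);

        // Mutate edge2 while it is in the set so that it becomes equal to edge1
        edge2.setId(1);
        edge2.setName("a");

        for (GraphEdge edge : set)
        {
            System.out.println(edge.toString());
        }

        if (edge2.equals(edge1))
        {
            System.out.println("Equals");
        }
        else
        {
            System.out.println("Not Equals");
        }
    }

    // Declared static so that it can be instantiated from the static main method
    public static class GraphEdge
    {
        private int id;
        private String name;

        public GraphEdge(int id, String name)
        {
            this.id = id;
            this.name = name;
        }

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        @Override
        public String toString()
        {
            return id + " " + name;
        }

        @Override
        public int hashCode()
        {
            int hash = 7;
            hash = 47 * hash + this.id;
            hash = 47 * hash + Objects.hashCode(this.name);
            return hash;
        }

        @Override
        public boolean equals(Object o)
        {
            if (o == this)
            {
                return true;
            }

            if (o instanceof GraphEdge)
            {
                GraphEdge anotherGraphEdge = (GraphEdge) o;
                // Objects.equals is null-safe, unlike getName().equals(...)
                return anotherGraphEdge.getId() == this.id
                        && Objects.equals(anotherGraphEdge.getName(), this.name);
            }

            return false;
        }
    }
}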

The output from the above code:

1 a
1 a
3 c
Equals

Is there a way to force the HashSet to validate its contents so that possible duplicate entries created as in the above scenario get removed?

A possible solution would be to create a new HashSet and copy the contents from one set to the other, so that the new set won't contain duplicates; however, I don't like this solution.

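The copy-into-a-new-set idea can also be applied in place, so that code holding a reference to the existing Set sees the repaired contents. The following is a minimal sketch, assuming the mutated elements are now in their final state; Edge (a stand-in for GraphEdge) and the rehash helper are hypothetical names. Which of two now-equal instances survives depends on iteration order, which HashSet does not specify.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class RehashDemo {
    // Hypothetical stand-in for the question's GraphEdge: mutable, value-based equality.
    static final class Edge {
        int id;
        String name;

        Edge(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge other = (Edge) o;
            return other.id == id && Objects.equals(other.name, name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Hypothetical helper: re-inserts every element so that each one is stored
    // under its current hash code. Elements that have become equal collapse
    // into one; which instance survives depends on the set's iteration order.
    static <T> void rehash(Set<T> set) {
        List<T> snapshot = new ArrayList<>(set);
        set.clear();
        set.addAll(snapshot);
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge a = new Edge(1, "a");
        Edge b = new Edge(2, "b");
        set.add(a);
        set.add(b);

        // Mutate b while it is in the set so that it equals a.
        b.id = 1;
        b.name = "a";
        System.out.println(set.size()); // prints 2: the duplicate is still stored

        rehash(set);
        System.out.println(set.size()); // prints 1
    }
}
```

Because the same Set object is reused, any other references to it see the deduplicated contents without being updated.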

Answer by user207421

The situation you describe is invalid. See the Javadoc: "The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set."

Answer by Stephen C

To add to @EJP's answer, what will happen in practice if you mutate objects in a HashSet to make them duplicates (in the sense of the equals / hashCode contract) is that the hash table data structure will break.

  • Depending on the exact details of the mutation, and the state of the hash table, one or both of the instances will become invisible to lookup (e.g. contains and other operations). Either it is on the wrong hash chain, or the other instance appears before it on the same hash chain. And it is hard to predict which instance will be visible ... and whether it will remain visible.

  • If you iterate the set, both instances will still be present ... in violation of the Set contract.

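Both symptoms can be reproduced with a small sketch. It assumes a hypothetical mutable Edge class with value-based equals/hashCode, in the spirit of the question's GraphEdge. After the mutation, iteration finds two equal elements, and a lookup for the old value fails because the mutated instance is still filed under the hash of (2, "b") but no longer equals it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class BrokenSetDemo {
    // Hypothetical mutable element with value-based equals/hashCode.
    static final class Edge {
        int id;
        String name;

        Edge(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge other = (Edge) o;
            return other.id == id && Objects.equals(other.name, name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Counts elements equal to the probe by walking the set,
    // i.e. without relying on hash lookups.
    static int countEqual(Set<Edge> set, Edge probe) {
        int n = 0;
        for (Edge e : set) {
            if (e.equals(probe)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge e1 = new Edge(1, "a");
        Edge e2 = new Edge(2, "b");
        set.add(e1);
        set.add(e2);

        // Mutate e2 in place so that it equals e1.
        e2.id = 1;
        e2.name = "a";

        System.out.println(countEqual(set, new Edge(1, "a"))); // prints 2: two "equal" members
        System.out.println(set.contains(new Edge(2, "b")));    // prints false
        System.out.println(set.size());                        // prints 2
    }
}
```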

Of course, this is very broken from the application perspective.



You can avoid this problem by either:

  • using an immutable type for your set elements,
  • making a copy of the objects as you put them into the set and / or pull them out of the set,
  • writing your code so that it "knows" not to change the objects for the duration ...

From the perspective of correctness and robustness, the first option is clearly best.

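Where immutability is not practical, one common way to follow the third option is to take an element out of the set before changing it and put it back afterwards, so it is always stored under its current hash code. The following is a minimal sketch; updateInSet and Edge are hypothetical names, and it only helps if every mutation of a set member goes through the helper.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;

public class SafeUpdate {
    // Hypothetical stand-in for GraphEdge: mutable, value-based equality.
    static final class Edge {
        int id;
        String name;

        Edge(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge other = (Edge) o;
            return other.id == id && Objects.equals(other.name, name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Hypothetical helper: removes the element while its hash code is still the
    // one it was stored under, applies the change, then re-adds it.
    // Returns false if the element was not in the set, or if the updated element
    // now duplicates another member (in which case it is left out of the set).
    static <T> boolean updateInSet(Set<T> set, T element, Consumer<? super T> change) {
        if (!set.remove(element)) {
            return false;
        }
        change.accept(element);
        return set.add(element);
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge e1 = new Edge(1, "a");
        Edge e2 = new Edge(2, "b");
        set.add(e1);
        set.add(e2);

        System.out.println(updateInSet(set, e2, e -> e.name = "c"));         // prints true
        System.out.println(set.contains(new Edge(2, "c")));                 // prints true
        System.out.println(updateInSet(set, e2, e -> { e.id = 1; e.name = "a"; })); // prints false
        System.out.println(set.size());                                     // prints 1
    }
}
```

The helper decides that a mutation which creates a duplicate drops the mutated element; other policies are possible, but some choice has to be made.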



Incidentally, it would be really difficult to "fix" this in a general way. There is no pervasive mechanism in Java for knowing ... or being notified ... that some element has changed. You can implement such a mechanism on a class by class basis, but it has to be coded explicitly (and it won't be cheap). Even if you did have such a mechanism, what would you do? Clearly one of the objects should now be removed from the set ... but which one?

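To make the class-by-class idea concrete, here is one possible shape for it: a hypothetical TrackedEdge that remembers its owning set and re-files itself whenever a field used by equals/hashCode changes. It still has to answer the "which one?" question; this sketch arbitrarily drops the instance that was mutated. It assumes a single owning set that is only modified through addTo and the setters, and it is not thread-safe.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class SelfRehashingDemo {
    // Hypothetical element that knows its owning set and re-files itself
    // whenever a field that takes part in equals/hashCode changes.
    static final class TrackedEdge {
        private int id;
        private String name;
        private Set<TrackedEdge> owner; // at most one owning set in this sketch

        TrackedEdge(int id, String name) {
            this.id = id;
            this.name = name;
        }

        // Remembers the set only if the add succeeds, so a rejected duplicate
        // never later removes the member it collided with.
        boolean addTo(Set<TrackedEdge> set) {
            if (!set.add(this)) {
                return false;
            }
            owner = set;
            return true;
        }

        void setId(int id) {
            update(() -> this.id = id);
        }

        void setName(String name) {
            update(() -> this.name = name);
        }

        // Removes this element while it still has the hash code it was stored
        // under, applies the change, then re-adds it. Policy choice: if the new
        // state duplicates another member, the mutated instance drops out.
        private void update(Runnable change) {
            boolean wasMember = owner != null && owner.remove(this);
            change.run();
            if (wasMember && !owner.add(this)) {
                owner = null;
            }
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TrackedEdge)) return false;
            TrackedEdge other = (TrackedEdge) o;
            return other.id == id && Objects.equals(other.name, name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    public static void main(String[] args) {
        Set<TrackedEdge> set = new HashSet<>();
        TrackedEdge e1 = new TrackedEdge(1, "a");
        TrackedEdge e2 = new TrackedEdge(2, "b");
        e1.addTo(set);
        e2.addTo(set);

        e2.setName("c"); // re-filed as (2, "c")
        System.out.println(set.contains(new TrackedEdge(2, "c"))); // prints true

        e2.setId(1);     // (1, "c"): still distinct
        e2.setName("a"); // (1, "a") would duplicate e1, so e2 drops out
        System.out.println(set.size()); // prints 1
    }
}
```

Every mutable field and every owning collection needs this bookkeeping, which is why such a mechanism is not cheap.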

Answer by Martin Serrano

You are correct, and I don't think there is any way to protect against the case you describe. All collections that use hashing and equals are subject to this problem: the collection is not notified when an object changes after it has been added. I think the solution you outline is good.

If you are so concerned with this issue, perhaps you need to rethink your data structures. You could use immutable objects for instance. With immutable objects you would not have this problem.

Answer by Mike Valenty

HashSet is not aware of changes to its members' properties after they have been added. If this is a problem for you, then you may want to consider making GraphEdge immutable. For example:

GraphEdge edge4 = edge2.changeName("new_name");

If GraphEdge is immutable, changing a value returns a new instance rather than changing the existing one.

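A sketch of what an immutable version might look like, assuming the same id/name fields as the question's GraphEdge. changeName mirrors the example above; changeId is a hypothetical counterpart, and the class is called ImmutableGraphEdge here only to keep it distinct from the mutable class in the question.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public final class ImmutableGraphEdge {
    // final fields: the state cannot change after construction
    private final int id;
    private final String name;

    public ImmutableGraphEdge(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() { return id; }
    public String getName() { return name; }

    // "Changing" a value returns a new instance; this one is left untouched.
    public ImmutableGraphEdge changeId(int newId) {
        return new ImmutableGraphEdge(newId, name);
    }

    public ImmutableGraphEdge changeName(String newName) {
        return new ImmutableGraphEdge(id, newName);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof ImmutableGraphEdge)) return false;
        ImmutableGraphEdge other = (ImmutableGraphEdge) o;
        return other.id == id && Objects.equals(other.name, name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public String toString() {
        return id + " " + name;
    }

    public static void main(String[] args) {
        Set<ImmutableGraphEdge> set = new HashSet<>();
        ImmutableGraphEdge edge1 = new ImmutableGraphEdge(1, "a");
        ImmutableGraphEdge edge2 = new ImmutableGraphEdge(2, "b");
        set.add(edge1);
        set.add(edge2);

        ImmutableGraphEdge edge4 = edge2.changeName("new_name");
        System.out.println(edge2);               // prints "2 b": the original is unchanged
        System.out.println(edge4);               // prints "2 new_name"
        System.out.println(set.contains(edge2)); // prints "true"
        // A new value equal to edge1 is simply rejected by the set.
        System.out.println(set.add(edge2.changeId(1).changeName("a"))); // prints "false"
    }
}
```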

Answer by slipperyseal

You will need to do the uniqueness detection at the time you iterate your list. Making a new HashSet might not seem like the right way to go, but why not try this... And maybe don't use a HashSet to begin with...

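import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class TestIterator {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();

        list.add("1");
        list.add("1");
        list.add("2");
        list.add("3");

        // Each distinct value is printed once
        for (String s : new UniqueIterator<String>(list)) {
            System.out.println(s);
        }
    }
}

// Package-private so that both classes can live in TestIterator.java
class UniqueIterator<T> implements Iterable<T> {
    private Set<T> hashSet = new HashSet<T>();

    // Copies the elements into a fresh HashSet, hashing each one with its
    // current state, so equal elements are stored only once
    public UniqueIterator(Iterable<T> iterable) {
        for (T t : iterable) {
            hashSet.add(t);
        }
    }

    @Override
    public Iterator<T> iterator() {
        return hashSet.iterator();
    }
}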
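Applied to the question's scenario, the same idea collapses the mutated duplicate, because the fresh HashSet inside UniqueIterator hashes every element with its current field values. A sketch: Edge is a hypothetical stand-in for GraphEdge, countUnique is a hypothetical helper, and UniqueIterator is repeated as a nested class so the block compiles on its own. The original set itself is left unchanged.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Objects;
import java.util.Set;

public class UniqueEdgesDemo {
    // Hypothetical stand-in for GraphEdge: mutable, value-based equality.
    static final class Edge {
        int id;
        String name;

        Edge(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge other = (Edge) o;
            return other.id == id && Objects.equals(other.name, name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }

        @Override
        public String toString() {
            return id + " " + name;
        }
    }

    // Same approach as UniqueIterator in the answer: copy into a fresh HashSet.
    static final class UniqueIterator<T> implements Iterable<T> {
        private final Set<T> hashSet = new HashSet<>();

        UniqueIterator(Iterable<T> iterable) {
            for (T t : iterable) {
                hashSet.add(t);
            }
        }

        @Override
        public Iterator<T> iterator() {
            return hashSet.iterator();
        }
    }

    // Counts how many elements the unique view yields.
    static <T> int countUnique(Iterable<T> iterable) {
        int n = 0;
        for (T ignored : new UniqueIterator<>(iterable)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge e1 = new Edge(1, "a");
        Edge e2 = new Edge(2, "b");
        Edge e3 = new Edge(3, "c");
        set.add(e1);
        set.add(e2);
        set.add(e3);

        // e2 now equals e1
        e2.id = 1;
        e2.name = "a";

        for (Edge e : new UniqueIterator<>(set)) {
            System.out.println(e); // "1 a" and "3 c", in unspecified order
        }
        System.out.println(set.size());       // prints 3: the original set is unchanged
        System.out.println(countUnique(set)); // prints 2
    }
}
```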

Answer by Savvas Dalkitsis

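Objects.hashCode is meant to be used to generate a hash code from several parameter objects (that is Guava's Objects.hashCode(Object...); in the JDK's java.util.Objects the varargs method is Objects.hash, while Objects.hashCode takes a single object). You are using it as just one part of the hash code calculation.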

Try replacing your implementation of hashCode with the following:

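@Override
public int hashCode()
{
    // java.util.Objects.hash(Object...) hashes several values at once;
    // Guava's com.google.common.base.Objects.hashCode(Object...) is equivalent.
    return Objects.hash(this.id, this.name);
}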