Java HashSet contains duplicates if contained element is modified
原文地址: http://stackoverflow.com/questions/13114043/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by PB_MLT
Let's say you have a class and you create a HashSet that stores instances of this class. If you try to add instances which are equal, only one instance is kept in the collection, and that is fine.
However, if you have two different instances in the HashSet, and you take one and make it an exact copy of the other (by copying the fields), the HashSet will then contain two duplicate instances.
Here is the code which demonstrates this:
import java.util.HashSet;

public class Main
{
    public static void main(String[] args)
    {
        HashSet<GraphEdge> set = new HashSet<>();
        GraphEdge edge1 = new GraphEdge(1, "a");
        GraphEdge edge2 = new GraphEdge(2, "b");
        GraphEdge edge3 = new GraphEdge(3, "c");
        set.add(edge1);
        set.add(edge2);
        set.add(edge3);
        // Mutate edge2 while it is inside the set so that it becomes equal to edge1
        edge2.setId(1);
        edge2.setName("a");
        for (GraphEdge edge : set)
        {
            System.out.println(edge.toString());
        }
        if (edge2.equals(edge1))
        {
            System.out.println("Equals");
        }
        else
        {
            System.out.println("Not Equals");
        }
    }
}
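import java.util.Objects;

public class GraphEdge
{
    private int id;
    private String name;

    public GraphEdge(int id, String name)
    {
        this.id = id;
        this.name = name;
    }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    @Override
    public int hashCode()
    {
        int hash = 7;
        hash = 47 * hash + this.id;
        hash = 47 * hash + Objects.hashCode(this.name); // 0 if name is null
        return hash;
    }

    @Override
    public boolean equals(Object o)
    {
        if (o == this)
        {
            return true;
        }
        if (o instanceof GraphEdge)
        {
            GraphEdge anotherGraphEdge = (GraphEdge) o;
            // Objects.equals is null-safe, unlike getName().equals(...)
            return anotherGraphEdge.getId() == this.id
                    && Objects.equals(anotherGraphEdge.getName(), this.name);
        }
        return false;
    }

    // Produces lines such as "1 a" in the output below
    @Override
    public String toString()
    {
        return id + " " + name;
    }
}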
The output from the above code:
1 a
1 a
3 c
Equals
Is there a way to force the HashSet to validate its contents so that possible duplicate entries created as in the above scenario get removed?
A possible solution is to create a new HashSet and copy the contents of the old one into it, so that the new HashSet will not contain duplicates; however, I don't like this solution.
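The copy-based workaround from the question can at least be done in place, so that every existing reference to the same Set sees the cleaned-up contents. A minimal sketch, assuming a hypothetical mutable Edge class standing in for GraphEdge and a hypothetical rehash helper: it copies the elements out, clears the set, and re-adds them, so HashSet evaluates each element's current hashCode and equals.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RehashDemo {
    // Hypothetical helper: re-inserts every element using its *current* hashCode/equals.
    // Elements that have become equal collapse into one; which one survives depends
    // on iteration order, so callers should not rely on it.
    static <T> void rehash(Set<T> set) {
        List<T> snapshot = new ArrayList<>(set); // copy out before clearing
        set.clear();
        set.addAll(snapshot);
    }

    // Hypothetical mutable element, standing in for GraphEdge.
    static final class Edge {
        int id;
        String name;
        Edge(int id, String name) { this.id = id; this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).id == id && ((Edge) o).name.equals(name);
        }
        @Override public int hashCode() { return 31 * id + name.hashCode(); }
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge e1 = new Edge(1, "a");
        Edge e2 = new Edge(2, "b");
        set.add(e1);
        set.add(e2);
        e2.id = 1;      // e2 now equals e1, but the set still holds both
        e2.name = "a";
        System.out.println(set.size()); // 2
        rehash(set);
        System.out.println(set.size()); // 1
    }
}
```

This is still an O(n) copy, so it only helps if the mutation points are known and infrequent.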
Answer by user207421
Answer by Stephen C
To add to @EJP's answer, what will happen in practice if you mutate objects in a HashSet to make them duplicates (in the sense of the equals / hashCode contract) is that the hash table data structure will break.
Depending on the exact details of the mutation and the state of the hash table, one or both of the instances will become invisible to lookup (e.g. contains and other operations). Either it is on the wrong hash chain, or the other instance appears before it on the same hash chain. It is hard to predict which instance will be visible ... and whether it will remain visible. If you iterate the set, both instances will still be present ... in violation of the Set contract.
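The effect can be reproduced with a single element. The sketch below uses a hypothetical MutablePoint class whose hashCode is just its mutable field x. The Set documentation leaves the behavior unspecified when an element's equality changes while it is in the set; the commented results are what OpenJDK's HashMap-backed HashSet does, because it stores each entry's hash at insertion time and then searches under the element's new hash.

```java
import java.util.HashSet;
import java.util.Set;

public class LostElementDemo {
    // Hypothetical mutable element whose hashCode depends on a mutable field.
    static final class MutablePoint {
        int x;
        MutablePoint(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            return o instanceof MutablePoint && ((MutablePoint) o).x == x;
        }
        @Override public int hashCode() { return x; }
    }

    public static void main(String[] args) {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1);
        set.add(p);

        p.x = 2; // p's hashCode changes from 1 to 2 while it is inside the set

        System.out.println(set.contains(p));              // false: searched under the new hash
        System.out.println(set.remove(p));                // false: remove cannot find it either
        System.out.println(set.iterator().next() == p);   // true: iteration still reaches it
        System.out.println(set.add(new MutablePoint(2))); // true: an "equal" element is accepted
        System.out.println(set.size());                   // 2: two equal elements in one Set
    }
}
```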
Of course, this is very broken from the application's perspective.
You can avoid this problem by either:
- using an immutable type for your set elements,
- making a copy of the objects as you put them into the set and / or pull them out of the set,
- writing your code so that it "knows" not to change the objects for the duration ...
From the perspective of correctness and robustness, the first option is clearly best.
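A rough sketch of the second option, assuming a hypothetical Edge class with a copy constructor and a hypothetical EdgeSet wrapper: the wrapper stores only its own copies and hands out copies, so no caller ever holds a reference to an object that lives inside the HashSet.

```java
import java.util.HashSet;
import java.util.Set;

public class DefensiveCopyDemo {
    // Hypothetical mutable element with a copy constructor.
    static final class Edge {
        int id;
        String name;
        Edge(int id, String name) { this.id = id; this.name = name; }
        Edge(Edge other) { this(other.id, other.name); } // copy constructor
        @Override public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).id == id && ((Edge) o).name.equals(name);
        }
        @Override public int hashCode() { return 31 * id + name.hashCode(); }
    }

    // Hypothetical wrapper: the inner set only ever holds private copies.
    static final class EdgeSet {
        private final Set<Edge> edges = new HashSet<>();

        boolean add(Edge e) { return edges.add(new Edge(e)); } // copy in
        boolean contains(Edge e) { return edges.contains(e); }
        int size() { return edges.size(); }

        Set<Edge> snapshot() { // copy out
            Set<Edge> copies = new HashSet<>();
            for (Edge e : edges) {
                copies.add(new Edge(e));
            }
            return copies;
        }
    }

    public static void main(String[] args) {
        EdgeSet set = new EdgeSet();
        Edge e1 = new Edge(1, "a");
        Edge e2 = new Edge(2, "b");
        set.add(e1);
        set.add(e2);
        e2.id = 1;     // only the caller's object changes
        e2.name = "a";
        System.out.println(set.size());                     // 2
        System.out.println(set.contains(new Edge(2, "b"))); // true: stored copy is untouched
        System.out.println(set.add(e2));                    // false: (1, "a") is already present
    }
}
```

The price is an allocation per add and per snapshot, which is one reason the immutable-type option is usually preferable.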
Incidentally, it would be really difficult to "fix" this in a general way. There is no pervasive mechanism in Java for knowing ... or being notified ... that some element has changed. You can implement such a mechanism on a class by class basis, but it has to be coded explicitly (and it won't be cheap). Even if you did have such a mechanism, what would you do? Clearly one of the objects should now be removed from the set ... but which one?
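Short of a notification mechanism, the usual manual discipline is to take an element out of the set before mutating it and put it back afterwards. A minimal sketch with a hypothetical mutateInSet helper and a hypothetical mutable Edge class; when the mutated element turns out to equal one already in the set, this sketch answers the "which one?" question by keeping the existing element, which is simply what Set.add does when it returns false.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

public class MutateInSetDemo {
    // Hypothetical helper: remove while the element's hash is still valid,
    // mutate it, then re-add it under its new hash. Returns false if the
    // mutated element equals one already present (that one is kept).
    static <T> boolean mutateInSet(Set<T> set, T element, Consumer<? super T> mutation) {
        if (!set.remove(element)) {
            throw new IllegalArgumentException("element is not in the set");
        }
        mutation.accept(element);
        return set.add(element);
    }

    // Hypothetical mutable element, standing in for GraphEdge.
    static final class Edge {
        int id;
        String name;
        Edge(int id, String name) { this.id = id; this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).id == id && ((Edge) o).name.equals(name);
        }
        @Override public int hashCode() { return 31 * id + name.hashCode(); }
    }

    public static void main(String[] args) {
        Set<Edge> set = new HashSet<>();
        Edge e1 = new Edge(1, "a");
        Edge e2 = new Edge(2, "b");
        set.add(e1);
        set.add(e2);
        boolean reAdded = mutateInSet(set, e2, e -> { e.id = 1; e.name = "a"; });
        System.out.println(reAdded);    // false: e2 now equals e1, so it was not re-added
        System.out.println(set.size()); // 1
    }
}
```

This only works if every mutation goes through such a helper, which is exactly the explicit, per-class coding effort described above.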
Answer by Martin Serrano
You are correct, and I don't think there is any way to protect against the case you discuss. All collections that use hashing and equals are subject to this problem: the collection is not notified when an object changes after it has been added. I think the solution you outline is good.
If you are so concerned with this issue, perhaps you need to rethink your data structures. You could use immutable objects for instance. With immutable objects you would not have this problem.
Answer by Mike Valenty
HashSet is not aware of its members' properties changing after the objects have been added. If this is a problem for you, then you may want to consider making GraphEdge immutable. For example:
GraphEdge edge4 = edge2.changeName("new_name");
When GraphEdge is immutable, changing a value returns a new instance rather than changing the existing instance.
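A sketch of what an immutable version of GraphEdge with the changeName method from the example above might look like. The class name ImmutableGraphEdge is an assumption made here to keep it separate from the mutable GraphEdge in the question; all fields are final, there are no setters, and changeName returns a new instance.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical immutable counterpart of GraphEdge: final fields, no setters.
public final class ImmutableGraphEdge {
    private final int id;
    private final String name;

    public ImmutableGraphEdge(int id, String name) {
        this.id = id;
        this.name = Objects.requireNonNull(name);
    }

    public int getId() { return id; }
    public String getName() { return name; }

    // "Changing" the name produces a new object; this instance never changes,
    // so its hashCode stays valid for as long as it sits in a HashSet.
    public ImmutableGraphEdge changeName(String newName) {
        return new ImmutableGraphEdge(id, newName);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof ImmutableGraphEdge)) return false;
        ImmutableGraphEdge other = (ImmutableGraphEdge) o;
        return id == other.id && name.equals(other.name);
    }

    @Override
    public int hashCode() { return Objects.hash(id, name); }

    @Override
    public String toString() { return id + " " + name; }

    public static void main(String[] args) {
        Set<ImmutableGraphEdge> set = new HashSet<>();
        ImmutableGraphEdge edge2 = new ImmutableGraphEdge(2, "b");
        set.add(edge2);
        ImmutableGraphEdge edge4 = edge2.changeName("new_name");
        System.out.println(set.contains(edge2)); // true: edge2 itself never changed
        System.out.println(set.contains(edge4)); // false: edge4 is a separate object
    }
}
```

To actually replace an edge in the set, the caller removes the old instance and adds the new one explicitly.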
Answer by slipperyseal
You will need to do the uniqueness detection at the time you iterate your list. Making a new HashSet might not seem the right way to go, but why not try this... And maybe don't use a HashSet to start with...
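// TestIterator.java
import java.util.ArrayList;
import java.util.List;

public class TestIterator {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("1");
        list.add("1");
        list.add("2");
        list.add("3");
        for (String s : new UniqueIterator<String>(list)) {
            System.out.println(s);
        }
    }
}

// UniqueIterator.java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Despite its name, this is an Iterable: it copies the elements into a fresh
// HashSet (using their current equals/hashCode) and iterates over that copy.
public class UniqueIterator<T> implements Iterable<T> {
    private final Set<T> hashSet = new HashSet<T>();

    public UniqueIterator(Iterable<T> iterable) {
        for (T t : iterable) {
            hashSet.add(t);
        }
    }

    @Override
    public Iterator<T> iterator() {
        return hashSet.iterator();
    }
}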
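The same deduplicate-while-iterating idea can also be expressed with the standard Stream API, which additionally preserves the list's encounter order. A minimal sketch; the unique helper name is hypothetical, and Stream.distinct() relies on the elements' equals and hashCode as they are at the moment the stream runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctDemo {
    // Hypothetical helper: keeps the first occurrence of each equal element,
    // in the original list order (distinct() is stable for ordered streams).
    static <T> List<T> unique(List<T> list) {
        return list.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("1", "1", "2", "3");
        for (String s : unique(list)) {
            System.out.println(s); // prints 1, 2, 3 on separate lines
        }
    }
}
```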
Answer by Savvas Dalkitsis
Objects.hashCode is meant to be used to generate a hash code from several parameter objects. You are using it as only one part of the hash code calculation.
Try replacing your implementation of hashCode with the following:
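@Override
public int hashCode()
{
    // Two-argument Objects.hashCode(Object...) is Guava's com.google.common.base.Objects;
    // with java.util.Objects the equivalent call is Objects.hash(this.id, this.name).
    return Objects.hashCode(this.id, this.name);
}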
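The two-argument Objects.hashCode(this.id, this.name) call matches Guava's com.google.common.base.Objects.hashCode(Object...). The JDK's java.util.Objects has only the single-argument Objects.hashCode(Object), which returns 0 for null, plus the varargs Objects.hash(Object...). A small sketch of the JDK methods; edgeHash is a hypothetical helper name.

```java
import java.util.Objects;

public class ObjectsHashDemo {
    // Hypothetical helper: JDK varargs hash, computed like Arrays.hashCode(new Object[] {id, name}).
    static int edgeHash(int id, String name) {
        return Objects.hash(id, name);
    }

    public static void main(String[] args) {
        // Objects.hashCode(Object) takes exactly one argument and maps null to 0.
        System.out.println(Objects.hashCode(null)); // 0
        System.out.println(Objects.hashCode("a"));  // 97, same as "a".hashCode()
        // Objects.hash(Object...) folds its arguments as 31 * result + element hash, starting from 1.
        System.out.println(edgeHash(1, "a"));       // 1089 = 31 * (31 * 1 + 1) + 97
    }
}
```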