Java HashMap 如何处理具有相同哈希码的不同对象?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6493605/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How does a Java HashMap handle different objects with the same hash code?
提问by akshay
As per my understanding I think:
根据我的理解,我认为:
- It is perfectly legal for two objects to have the same hashcode.
- If two objects are equal (using the equals() method) then they have the same hashcode.
- If two objects are not equal then they cannot have the same hashcode
- 两个对象具有相同的哈希码是完全合法的。
- 如果两个对象相等(使用 equals() 方法),则它们具有相同的哈希码。
- 如果两个对象不相等,则它们不能具有相同的哈希码
Am I correct?
我对么?
Now if am correct, I have the following question:
The HashMap
internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMap
track which key it uses?
现在,如果正确,我有以下问题:HashMap
内部使用对象的哈希码。因此,如果两个对象可以具有相同的哈希码,那么如何HashMap
跟踪它使用的键呢?
Can someone explain how the HashMap
internally uses the hashcode of the object?
有人可以解释HashMap
内部如何使用对象的哈希码吗?
采纳答案by Jesper
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
hashmap 的工作原理是这样的(这有点简化,但它说明了基本机制):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
它有许多用于存储键值对的“存储桶”。每个存储桶都有一个唯一的编号——这就是标识存储桶的原因。当您将键值对放入映射中时,哈希映射会查看键的哈希码,并将该对存储在标识符为键的哈希码的桶中。例如:key的hash码是235->这对存储在桶号235中。(注意一个桶可以存储多个key-value对)。
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals()
.
当你在 hashmap 中查找一个值时,通过给它一个键,它会首先查看你给的键的哈希码。然后哈希映射会查看相应的存储桶,然后将您提供的密钥与存储桶中所有对的密钥进行比较,将它们与equals()
.
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
现在您可以看到这对于在映射中查找键值对非常有效:通过键的哈希码,哈希映射立即知道要查找哪个存储桶,因此它只需要针对该存储桶中的内容进行测试。
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode()
and equals()
methods of keys:
看上面的机制,也可以看出对keyshashCode()
和equals()
methods有什么要求:
If two keys are the same (
equals()
returnstrue
when you compare them), theirhashCode()
method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use
equals()
to tell them apart.
如果两个键相同(比较时
equals()
返回true
),则它们的hashCode()
方法必须返回相同的数字。如果键违反了这一点,那么相等的键可能会存储在不同的存储桶中,并且哈希图将无法找到键值对(因为它将在同一个存储桶中查找)。如果两个键不同,那么它们的哈希码是否相同并不重要。如果它们的哈希码相同,它们将被存储在同一个桶中,在这种情况下,哈希图将用于
equals()
区分它们。
回答by Pace
The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket then a linear search is done to find which item in the bucket equals the desired item (using the equals()
) method.
哈希码确定要检查哈希图的哪个桶。如果桶中有多个对象,则执行线性搜索以查找桶中的哪一项等于所需的项目(使用equals()
)方法。
In other words, if you have a perfect hashcode then hashmap access is constant, you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets, the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (always returns the same number) then your hashmap access becomes linear since you have to search through every item in the map (they're all in the same bucket) to get what you want.
换句话说,如果你有一个完美的哈希码,那么哈希映射访问是恒定的,你永远不必遍历一个桶(从技术上讲,你还必须有 MAX_INT 个桶,Java 实现可能会在同一个桶中共享一些哈希码以减少空间需求)。如果您有最差的哈希码(始终返回相同的数字),那么您的哈希图访问将变为线性,因为您必须搜索地图中的每个项目(它们都在同一个存储桶中)才能得到您想要的。
Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.
大多数情况下,编写良好的哈希码并不完美,但足够独特,可以让您或多或少地获得持续访问。
回答by Jon Skeet
Your third assertion is incorrect.
你的第三个断言是不正确的。
It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMap
as a "first pass filter" so that the map can quickly find possibleentries with the specified key. The keys with the same hash code are then tested for equality with the specified key.
两个不相等的对象具有相同的哈希码是完全合法的。它被HashMap
用作“第一遍过滤器”,以便地图可以使用指定的键快速找到可能的条目。然后测试具有相同哈希码的键是否与指定键相等。
You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 232possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
您不希望要求两个不相等的对象不能具有相同的哈希码,否则这会将您限制为 2 32 个可能的对象。(这也意味着不同类型甚至不能使用对象的字段来生成哈希码,因为其他类可以生成相同的哈希。)
回答by Leif Wickland
You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from the OpenJdk. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.
你在第三点上错了。两个条目可以具有相同的哈希码,但不能相等。从 OpenJdk看一下HashMap.get 的实现。您可以看到它检查散列是否相等且键是否相等。如果第三点为真,那么就没有必要检查键是否相等。哈希码在键之前进行比较,因为前者是一种更有效的比较。
If you're interested in learning a little more about this, take a look at the Wikipedia article on Open Addressing collision resolution, which I believe is the mechanism that the OpenJdk implementation uses. That mechanism is subtly different than the "bucket" approach one of the other answers mentions.
如果您有兴趣了解更多相关信息,请查看维基百科关于开放寻址冲突解决的文章,我相信这是 OpenJdk 实现使用的机制。该机制与其他答案中提到的“桶”方法略有不同。
回答by Abhijit Gaikwad
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
您可以在http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html找到优秀的信息
To Summarize:
总结:
HashMap works on the principle of hashing
HashMap 的工作原理是散列
put(key, value):HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.
put(key, value):HashMap 将键和值对象存储为 Map.Entry。Hashmap 应用 hashcode(key) 来获取桶。如果有碰撞,HashMap 使用 LinkedList 来存储对象。
get(key):HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.
get(key):HashMap 使用 Key Object 的 hashcode 找出桶的位置,然后调用 keys.equals() 方法来识别 LinkedList 中的正确节点,并返回 Java HashMap 中该键的关联值对象。
回答by Sergii Shevchyk
HashMap
is an array of Entry
objects.
HashMap
是一个Entry
对象数组。
Consider HashMap
as just an array of objects.
考虑HashMap
为只是对象的数组。
Have a look at what this Object
is:
看看这Object
是什么:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
…
}
Each Entry
object represents a key-value pair. The field next
refers to another Entry
object if a bucket has more than one Entry
.
每个Entry
对象代表一个键值对。如果存储桶有多个 ,则该字段next
引用另一个Entry
对象Entry
。
Sometimes it might happen that hash codes for 2 different objects are the same. In this case, two objects will be saved in one bucket and will be presented as a linked list.
The entry point is the more recently added object. This object refers to another object with the next
field and so on. The last entry refers to null
.
有时可能会发生 2 个不同对象的哈希码相同的情况。在这种情况下,两个对象将保存在一个存储桶中,并将显示为一个链表。入口点是最近添加的对象。此对象引用具有next
字段等的另一个对象。最后一个条目指的是null
。
When you create a HashMap
with the default constructor
当您HashMap
使用默认构造函数创建 a时
HashMap hashMap = new HashMap();
The array is created with size 16 and default 0.75 load balance.
创建的数组大小为 16,默认负载平衡为 0.75。
Adding a new key-value pair
添加新的键值对
- Calculate hashcode for the key
- Calculate position
hash % (arrayLength-1)
where element should be placed (bucket number) - If you try to add a value with a key which has already been saved in
HashMap
, then value gets overwritten. - Otherwise element is added to the bucket.
- 计算密钥的哈希码
- 计算
hash % (arrayLength-1)
应放置元素的位置(桶号) - 如果您尝试使用已保存在 中的键添加值
HashMap
,则值将被覆盖。 - 否则元素被添加到桶中。
If the bucket already has at least one element, a new one gets added and placed in the first position of the bucket. Its next
field refers to the old element.
如果存储桶已经有至少一个元素,则会添加一个新元素并将其放置在存储桶的第一个位置。它的next
字段是指旧元素。
Deletion
删除
- Calculate hashcode for the given key
- Calculate bucket number
hash % (arrayLength-1)
- Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find the correct
Entry
. If a desired element is not found, returnnull
- 计算给定键的哈希码
- 计算桶数
hash % (arrayLength-1)
- 获取对存储桶中第一个 Entry 对象的引用,并通过 equals 方法迭代给定存储桶中的所有条目。最终我们会找到正确的
Entry
. 如果未找到所需元素,则返回null
回答by Tajinder Singh
import java.util.HashMap;
public class Students {
String name;
int age;
Students(String name, int age ){
this.name = name;
this.age=age;
}
@Override
public int hashCode() {
System.out.println("__hash__");
final int prime = 31;
int result = 1;
result = prime * result + age;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
System.out.println("__eq__");
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Students other = (Students) obj;
if (age != other.age)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
public static void main(String[] args) {
Students S1 = new Students("taj",22);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Output:
__ hash __
116232
__ hash __
116201
__ hash __
__ hash __
2
So here we see that if both the objects S1 and S2 have different content, then we are pretty sure that our overridden Hashcode method will generate different Hashcode(116232,11601) for both objects. NOW since there are different hash codes, so it won't even bother to call EQUALS method. Because a different Hashcode GUARANTEES DIFFERENT content in an object.
所以在这里我们看到,如果对象 S1 和 S2 具有不同的内容,那么我们很确定我们重写的 Hashcode 方法将为两个对象生成不同的 Hashcode(116232,11601)。现在因为有不同的哈希码,所以它甚至不会打扰调用 EQUALS 方法。因为不同的 Hashcode 保证了一个对象中的不同内容。
public static void main(String[] args) {
Students S1 = new Students("taj",21);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Now lets change out main method a little bit. Output after this change is
__ hash __
116201
__ hash __
116201
__ hash __
__ hash __
__ eq __
1
We can clearly see that equal method is called. Here is print statement __eq__, since we have same hashcode, then content of objects MAY or MAY not be similar. So program internally calls Equal method to verify this.
Conclusion
If hashcode is different , equal method will not get called.
if hashcode is same, equal method will get called.
Thanks , hope it helps.
回答by Eric Wang
Here is a rough description of HashMap
's mechanism, for Java 8
version, (it might be slightly different from Java 6).
这里是对HashMap
的机制的粗略描述,对于Java 8
版本,(它可能与 Java 6 略有不同)。
Data structures
数据结构
- Hash table
Hash value is calculated viahash()
on key, and it decide which bucket of the hashtable to use for a given key. - Linked list(singly)
When count of elements in a bucket is small, a singly linked list is used. - Red-Black tree
When count of elements in a bucket is large, a red-black tree is used.
- 哈希表
哈希值是通过hash()
键计算的,它决定将哈希表的哪个桶用于给定的键。 - Linked list (singly)
当桶中元素的数量很少时,使用单向链表。 - 红黑树
当桶中的元素数量很大时,使用红黑树。
Classes (internal)
课程(内部)
Map.Entry
Represent a single entity in map, the key/value entity.HashMap.Node
Linked list version of node.It could represent:
- A hash bucket.
Because it has a hash property. - A node in singly linked list, (thus also head of linkedlist).
- A hash bucket.
HashMap.TreeNode
Tree version of node.
Map.Entry
表示映射中的单个实体,即键/值实体。HashMap.Node
节点的链表版本。它可以代表:
- 一个哈希桶。
因为它具有哈希属性。 - 单向链表中的一个节点,(因此也是链表的头)。
- 一个哈希桶。
HashMap.TreeNode
节点的树版本。
Fields (internal)
字段(内部)
Node[] table
The bucket table, (head of the linked lists).
If a bucket don't contains elements, then it's null, thus only take space of a reference.Set<Map.Entry> entrySet
Set of entities.int size
Number of entities.float loadFactor
Indicate how full the hash table is allowed, before resizing.int threshold
The next size at which to resize.
Formula:threshold = capacity * loadFactor
Node[] table
桶表,(链表的头部)。
如果一个桶不包含元素,那么它是空的,因此只占用引用的空间。Set<Map.Entry> entrySet
实体集。int size
实体数量。float loadFactor
指示在调整大小之前允许哈希表多满。int threshold
要调整大小的下一个大小。
公式:threshold = capacity * loadFactor
Methods (internal)
方法(内部)
int hash(key)
Calculate hash by key.How to map hash to bucket?
Use following logic:static int hashToBucket(int tableSize, int hash) { return (tableSize - 1) & hash; }
int hash(key)
按键计算哈希。如何将哈希映射到存储桶?
使用以下逻辑:static int hashToBucket(int tableSize, int hash) { return (tableSize - 1) & hash; }
About capacity
关于容量
In hash table, capacity means the bucket count, it could be get from table.length
.
Also could be calculated via threshold
and loadFactor
, thus no need to be defined as a class field.
在哈希表中,容量表示桶数,可以从table.length
.
也可以通过threshold
and计算loadFactor
,因此不需要定义为类字段。
Could get the effective capacity via: capacity()
可以通过以下方式获得有效容量: capacity()
Operations
操作
- Find entity by key.
First find the bucket by hash value, then loop linked list or search sorted tree. - Add entity with key.
First find the bucket according to hash value of key.
Then try find the value:- If found, replace the value.
- Otherwise, add a new node at beginning of linked list, or insert into sorted tree.
- Resize
Whenthreshold
reached, will double hashtable's capacity(table.length
), then perform a re-hash on all elements to rebuild the table.
This could be an expensive operation.
- 通过键查找实体。
先通过hash值找到bucket,然后循环链表或者搜索排序树。 - 使用键添加实体。
首先根据key的hash值找到bucket。
然后尝试找到值:- 如果找到,则替换该值。
- 否则,在链表的开头添加一个新节点,或插入到排序树中。
- 调整大小
当threshold
达到时,将哈希表的容量加倍(table.length
),然后对所有元素执行重新哈希以重建表。
这可能是一项昂贵的操作。
Performance
表现
- get & put
Time complexity isO(1)
, because:- Bucket is accessed via array index, thus
O(1)
. - Linked list in each bucket is of small length, thus could view as
O(1)
. - Tree size is also limited, because will extend capacity & re-hash when element count increase, so could view it as
O(1)
, notO(log N)
.
- Bucket is accessed via array index, thus
- get & put
时间复杂度为O(1)
,因为:- 通过数组索引访问存储桶,因此
O(1)
. - 每个桶中的链表长度很小,因此可以看作
O(1)
. - 树的大小也是有限的,因为当元素数量增加时会扩展容量并重新散列,因此可以将其视为
O(1)
,而不是O(log N)
。
- 通过数组索引访问存储桶,因此
回答by Eric Wang
Hash map works on the principle of hashing
哈希映射的工作原理是哈希的原理
HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of Entry object . So thinking that Only value is stored in the bucket is not correct and will not give a good impression on the interviewer .
HashMap get(Key k) 方法调用键对象上的 hashCode 方法,并将返回的 hashValue 应用到它自己的静态哈希函数中,以找到一个存储桶位置(后备数组),其中键和值以嵌套类的形式存储,称为 Entry (Map.入口) 。因此,您已经从上一行得出结论,键和值都以 Entry 对象的形式存储在存储桶中。所以认为Only value is stored in the bucket是不正确的,不会给面试官留下好印象。
- Whenever we call get( Key k ) method on the HashMap object . First it checks that whether key is null or not . Note that there can only be one null key in HashMap .
- 每当我们在 HashMap 对象上调用 get(Key k) 方法时。首先它检查 key 是否为 null 。请注意 HashMap 中只能有一个空键。
If key is null , then Null keys always map to hash 0, thus index 0.
如果 key 为 null ,则 Null 键始终映射到哈希 0,因此索引 0。
If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode() ,so after key.hashCode() returns hashValue , line 4 looks like
如果 key 不为 null ,它将在 key 对象上调用 hashfunction ,参见上面方法中的第 4 行,即 key.hashCode() ,所以在 key.hashCode() 返回 hashValue 后,第 4 行看起来像
int hash = hash(hashValue)
and now ,it applies returned hashValue into its own hashing function .
现在,它将返回的 hashValue 应用到它自己的散列函数中。
We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is It defends against poor quality hash functions.
我们可能想知道为什么我们要使用 hash(hashValue) 再次计算哈希值。答案是它可以抵御低质量的哈希函数。
Now final hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex)
现在最终的哈希值用于查找存储 Entry 对象的存储桶位置。条目对象像这样存储在存储桶中(哈希、键、值、存储桶索引)
回答by Premraj
Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.
每个 Entry 对象代表键值对。如果一个桶有 1 个以上的条目,则字段 next 指的是其他条目对象。
Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null. When you create HashMap with default constructor
有时可能会发生 2 个不同对象的 hashCode 相同的情况。在这种情况下,2 个对象将保存在一个存储桶中,并将显示为 LinkedList。入口点是最近添加的对象。此对象引用具有下一个字段的其他对象等。最后一个条目指的是 null。当您使用默认构造函数创建 HashMap 时
Array is gets created with size 16 and default 0.75 load balance.
创建的数组大小为 16,默认负载平衡为 0.75。