Java HashMap 如何处理具有相同哈希码的不同对象?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6493605/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-16 06:22:42  来源:igfitidea点击:

How does a Java HashMap handle different objects with the same hash code?

javahashmaphashcodehash-function

提问by akshay

As per my understanding I think:

根据我的理解,我认为:

  1. It is perfectly legal for two objects to have the same hashcode.
  2. If two objects are equal (using the equals() method) then they have the same hashcode.
  3. If two objects are not equal then they cannot have the same hashcode
  1. 两个对象具有相同的哈希码是完全合法的。
  2. 如果两个对象相等(使用 equals() 方法),则它们具有相同的哈希码。
  3. 如果两个对象不相等,则它们不能具有相同的哈希码

Am I correct?

我对么?

Now if am correct, I have the following question: The HashMapinternally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMaptrack which key it uses?

现在,如果正确,我有以下问题:HashMap内部使用对象的哈希码。因此,如果两个对象可以具有相同的哈希码,那么如何HashMap跟踪它使用的键呢?

Can someone explain how the HashMapinternally uses the hashcode of the object?

有人可以解释HashMap内部如何使用对象的哈希码吗?

采纳答案by Jesper

A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):

hashmap 的工作原理是这样的(这有点简化,但它说明了基本机制):

It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).

它有许多用于存储键值对的“存储桶”。每个存储桶都有一个唯一的编号——这就是标识存储桶的原因。当您将键值对放入映射中时,哈希映射会查看键的哈希码,并将该对存储在标识符为键的哈希码的桶中。例如:key的hash码是235->这对存储在桶号235中。(注意一个桶可以存储多个key-value对)。

When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().

当你在 hashmap 中查找一个值时,通过给它一个键,它会首先查看你给的键的哈希码。然后哈希映射会查看相应的存储桶,然后将您提供的密钥与存储桶中所有对的密钥进行比较,将它们与equals().

Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.

现在您可以看到这对于在映射中查找键值对非常有效:通过键的哈希码,哈希映射立即知道要查找哪个存储桶,因此它只需要针对该存储桶中的内容进行测试。

Looking at the above mechanism, you can also see what requirements are necessary on the hashCode()and equals()methods of keys:

看上面的机制,也可以看出对keyshashCode()equals()methods有什么要求:

  • If two keys are the same (equals()returns truewhen you compare them), their hashCode()method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).

  • If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals()to tell them apart.

  • 如果两个键相同(比较时equals()返回true),则它们的hashCode()方法必须返回相同的数字。如果键违反了这一点,那么相等的键可能会存储在不同的存储桶中,并且哈希图将无法找到键值对(因为它将在同一个存储桶中查找)。

  • 如果两个键不同,那么它们的哈希码是否相同并不重要。如果它们的哈希码相同,它们将被存储在同一个桶中,在这种情况下,哈希图将用于equals()区分它们。

回答by Pace

The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket then a linear search is done to find which item in the bucket equals the desired item (using the equals()) method.

哈希码确定要检查哈希图的哪个桶。如果桶中有多个对象,则执行线性搜索以查找桶中的哪一项等于所需的项目(使用equals())方法。

In other words, if you have a perfect hashcode then hashmap access is constant, you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets, the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (always returns the same number) then your hashmap access becomes linear since you have to search through every item in the map (they're all in the same bucket) to get what you want.

换句话说,如果你有一个完美的哈希码,那么哈希映射访问是恒定的,你永远不必遍历一个桶(从技术上讲,你还必须有 MAX_INT 个桶,Java 实现可能会在同一个桶中共享一些哈希码以减少空间需求)。如果您有最差的哈希码(始终返回相同的数字),那么您的哈希图访问将变为线性,因为您必须搜索地图中的每个项目(它们都在同一个存储桶中)才能得到您想要的。

Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.

大多数情况下,编写良好的哈希码并不完美,但足够独特,可以让您或多或少地获得持续访问。

回答by Jon Skeet

Your third assertion is incorrect.

你的第三个断言是不正确的。

It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMapas a "first pass filter" so that the map can quickly find possibleentries with the specified key. The keys with the same hash code are then tested for equality with the specified key.

两个不相等的对象具有相同的哈希码是完全合法的。它被HashMap用作“第一遍过滤器”,以便地图可以使用指定的键快速找到可能的条目。然后测试具有相同哈希码的键是否与指定键相等。

You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 232possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)

您不希望要求两个不相等的对象不能具有相同的哈希码,否则这会将您限制为 2 32 个可能的对象。(这也意味着不同类型甚至不能使用对象的字段来生成哈希码,因为其他类可以生成相同的哈希。)

回答by Leif Wickland

You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from the OpenJdk. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.

你在第三点上错了。两个条目可以具有相同的哈希码,但不能相等。从 OpenJdk看一下HashMap.get 的实现。您可以看到它检查散列是否相等且键是否相等。如果第三点为真,那么就没有必要检查键是否相等。哈希码在键之前进行比较,因为前者是一种更有效的比较。

If you're interested in learning a little more about this, take a look at the Wikipedia article on Open Addressing collision resolution, which I believe is the mechanism that the OpenJdk implementation uses. That mechanism is subtly different than the "bucket" approach one of the other answers mentions.

如果您有兴趣了解更多相关信息,请查看维基百科关于开放寻址冲突解决的文章,我相信这是 OpenJdk 实现使用的机制。该机制与其他答案中提到的“桶”方法略有不同。

回答by Abhijit Gaikwad

You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html

您可以在http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html找到优秀的信息

To Summarize:

总结:

HashMap works on the principle of hashing

HashMap 的工作原理是散列

put(key, value):HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.

put(key, value):HashMap 将键和值对象存储为 Map.Entry。Hashmap 应用 hashcode(key) 来获取桶。如果有碰撞,HashMap 使用 LinkedList 来存储对象。

get(key):HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.

get(key):HashMap 使用 Key Object 的 hashcode 找出桶的位置,然后调用 keys.equals() 方法来识别 LinkedList 中的正确节点,并返回 Java HashMap 中该键的关联值对象。

回答by Sergii Shevchyk

HashMap structure diagram

HashMap 结构图

HashMapis an array of Entryobjects.

HashMap是一个Entry对象数组。

Consider HashMapas just an array of objects.

考虑HashMap为只是对象的数组。

Have a look at what this Objectis:

看看这Object是什么:

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
… 
}

Each Entryobject represents a key-value pair. The field nextrefers to another Entryobject if a bucket has more than one Entry.

每个Entry对象代表一个键值对。如果存储桶有多个 ,则该字段next引用另一个Entry对象Entry

Sometimes it might happen that hash codes for 2 different objects are the same. In this case, two objects will be saved in one bucket and will be presented as a linked list. The entry point is the more recently added object. This object refers to another object with the nextfield and so on. The last entry refers to null.

有时可能会发生 2 个不同对象的哈希码相同的情况。在这种情况下,两个对象将保存在一个存储桶中,并将显示为一个链表。入口点是最近添加的对象。此对象引用具有next字段等的另一个对象。最后一个条目指的是null

When you create a HashMapwith the default constructor

当您HashMap使用默认构造函数创建 a时

HashMap hashMap = new HashMap();

The array is created with size 16 and default 0.75 load balance.

创建的数组大小为 16,默认负载平衡为 0.75。

Adding a new key-value pair

添加新的键值对

  1. Calculate hashcode for the key
  2. Calculate position hash % (arrayLength-1)where element should be placed (bucket number)
  3. If you try to add a value with a key which has already been saved in HashMap, then value gets overwritten.
  4. Otherwise element is added to the bucket.
  1. 计算密钥的哈希码
  2. 计算hash % (arrayLength-1)应放置元素的位置(桶号)
  3. 如果您尝试使用已保存在 中的键添加值HashMap,则值将被覆盖。
  4. 否则元素被添加到桶中。

If the bucket already has at least one element, a new one gets added and placed in the first position of the bucket. Its nextfield refers to the old element.

如果存储桶已经有至少一个元素,则会添加一个新元素并将其放置在存储桶的第一个位置。它的next字段是指旧元素。

Deletion

删除

  1. Calculate hashcode for the given key
  2. Calculate bucket number hash % (arrayLength-1)
  3. Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find the correct Entry. If a desired element is not found, return null
  1. 计算给定键的哈希码
  2. 计算桶数 hash % (arrayLength-1)
  3. 获取对存储桶中第一个 Entry 对象的引用,并通过 equals 方法迭代给定存储桶中的所有条目。最终我们会找到正确的Entry. 如果未找到所需元素,则返回null

回答by Tajinder Singh

import java.util.HashMap;

public class Students  {
    String name;
    int age;

    Students(String name, int age ){
        this.name = name;
        this.age=age;
    }

    @Override
    public int hashCode() {
        System.out.println("__hash__");
        final int prime = 31;
        int result = 1;
        result = prime * result + age;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        System.out.println("__eq__");
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Students other = (Students) obj;
        if (age != other.age)
            return false;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }

    public static void main(String[] args) {

        Students S1 = new Students("taj",22);
        Students S2 = new Students("taj",21);

        System.out.println(S1.hashCode());
        System.out.println(S2.hashCode());

        HashMap<Students,String > HM = new HashMap<Students,String > (); 
        HM.put(S1, "tajinder");
        HM.put(S2, "tajinder");
        System.out.println(HM.size());
    }
}

Output:

__ hash __

116232

__ hash __

116201

__ hash __

__ hash __

2

So here we see that if both the objects S1 and S2 have different content, then we are pretty sure that our overridden Hashcode method will generate different Hashcode(116232,11601) for both objects. NOW since there are different hash codes, so it won't even bother to call EQUALS method. Because a different Hashcode GUARANTEES DIFFERENT content in an object.

所以在这里我们看到,如果对象 S1 和 S2 具有不同的内容,那么我们很确定我们重写的 Hashcode 方法将为两个对象生成不同的 Hashcode(116232,11601)。现在因为有不同的哈希码,所以它甚至不会打扰调用 EQUALS 方法。因为不同的 Hashcode 保证了一个对象中的不同内容。

    public static void main(String[] args) {

        Students S1 = new Students("taj",21);
        Students S2 = new Students("taj",21);

        System.out.println(S1.hashCode());
        System.out.println(S2.hashCode());

        HashMap<Students,String > HM = new HashMap<Students,String > (); 
        HM.put(S1, "tajinder");
        HM.put(S2, "tajinder");
        System.out.println(HM.size());
    }
}

Now lets change out main method a little bit. Output after this change is 

__ hash __

116201

__ hash __

116201

__ hash __

__ hash __

__ eq __

1
We can clearly see that equal method is called. Here is print statement __eq__, since we have same hashcode, then content of objects MAY or MAY not be similar. So program internally  calls Equal method to verify this. 


Conclusion 
If hashcode is different , equal method will not get called. 
if hashcode is same, equal method will get called.

Thanks , hope it helps. 

回答by Eric Wang

Here is a rough description of HashMap's mechanism, for Java 8version, (it might be slightly different from Java 6).

这里是对HashMap的机制的粗略描述,对于Java 8版本,(它可能与 Java 6 略有不同)



Data structures

数据结构

  • Hash table
    Hash value is calculated via hash()on key, and it decide which bucket of the hashtable to use for a given key.
  • Linked list(singly)
    When count of elements in a bucket is small, a singly linked list is used.
  • Red-Black tree
    When count of elements in a bucket is large, a red-black tree is used.
  • 哈希表
    哈希值是通过hash()键计算的,它决定将哈希表的哪个桶用于给定的键。
  • Linked list (singly)
    当桶中元素的数量很少时,使用单向链表。
  • 红黑树
    当桶中的元素数量很大时,使用红黑树。


Classes (internal)

课程(内部)

  • Map.Entry
    Represent a single entity in map, the key/value entity.
  • HashMap.Node
    Linked list version of node.

    It could represent:

    • A hash bucket.
      Because it has a hash property.
    • A node in singly linked list, (thus also head of linkedlist).
  • HashMap.TreeNode
    Tree version of node.
  • Map.Entry
    表示映射中的单个实体,即键/值实体。
  • HashMap.Node
    节点的链表版本。

    它可以代表:

    • 一个哈希桶。
      因为它具有哈希属性。
    • 单向链表中的一个节点,(因此也是链表的头)
  • HashMap.TreeNode
    节点的树版本。


Fields (internal)

字段(内部)

  • Node[] table
    The bucket table, (head of the linked lists).
    If a bucket don't contains elements, then it's null, thus only take space of a reference.
  • Set<Map.Entry> entrySetSet of entities.
  • int size
    Number of entities.
  • float loadFactor
    Indicate how full the hash table is allowed, before resizing.
  • int threshold
    The next size at which to resize.
    Formula: threshold = capacity * loadFactor
  • Node[] table
    桶表,(链表的头部)。
    如果一个桶不包含元素,那么它是空的,因此只占用引用的空间。
  • Set<Map.Entry> entrySet实体集。
  • int size
    实体数量。
  • float loadFactor
    指示在调整大小之前允许哈希表多满。
  • int threshold
    要调整大小的下一个大小。
    公式:threshold = capacity * loadFactor


Methods (internal)

方法(内部)

  • int hash(key)
    Calculate hash by key.
  • How to map hash to bucket?
    Use following logic:

    static int hashToBucket(int tableSize, int hash) {
        return (tableSize - 1) & hash;
    }
    
  • int hash(key)
    按键计算哈希。
  • 如何将哈希映射到存储桶?
    使用以下逻辑:

    static int hashToBucket(int tableSize, int hash) {
        return (tableSize - 1) & hash;
    }
    


About capacity

关于容量

In hash table, capacity means the bucket count, it could be get from table.length.
Also could be calculated via thresholdand loadFactor, thus no need to be defined as a class field.

在哈希表中,容量表示桶数,可以从table.length.
也可以通过thresholdand计算loadFactor,因此不需要定义为类字段。

Could get the effective capacity via: capacity()

可以通过以下方式获得有效容量: capacity()



Operations

操作

  • Find entity by key.
    First find the bucket by hash value, then loop linked list or search sorted tree.
  • Add entity with key.
    First find the bucket according to hash value of key.
    Then try find the value:
    • If found, replace the value.
    • Otherwise, add a new node at beginning of linked list, or insert into sorted tree.
  • Resize
    When thresholdreached, will double hashtable's capacity(table.length), then perform a re-hash on all elements to rebuild the table.
    This could be an expensive operation.
  • 通过键查找实体。
    先通过hash值找到bucket,然后循环链表或者搜索排序树。
  • 使用键添加实体。
    首先根据key的hash值找到bucket。
    然后尝试找到值:
    • 如果找到,则替换该值。
    • 否则,在链表的开头添加一个新节点,或插入到排序树中。
  • 调整大小
    threshold达到时,将哈希表的容量加倍(table.length),然后对所有元素执行重新哈希以重建表。
    这可能是一项昂贵的操作。


Performance

表现

  • get & put
    Time complexity is O(1), because:
    • Bucket is accessed via array index, thus O(1).
    • Linked list in each bucket is of small length, thus could view as O(1).
    • Tree size is also limited, because will extend capacity & re-hash when element count increase, so could view it as O(1), not O(log N).
  • get & put
    时间复杂度为O(1),因为:
    • 通过数组索引访问存储桶,因此O(1).
    • 每个桶中的链表长度很小,因此可以看作O(1).
    • 树的大小也是有限的,因为当元素数量增加时会扩展容量并重新散列,因此可以将其视为O(1),而不是O(log N)

回答by Eric Wang

Hash map works on the principle of hashing

哈希映射的工作原理是哈希的原理

HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of Entry object . So thinking that Only value is stored in the bucket is not correct and will not give a good impression on the interviewer .

HashMap get(Key k) 方法调用键对象上的 hashCode 方法,并将返回的 hashValue 应用到它自己的静态哈希函数中,以找到一个存储桶位置(后备数组),其中键和值以嵌套类的形式存储,称为 Entry (Map.入口) 。因此,您已经从上一行得出结论,键和值都以 Entry 对象的形式存储在存储桶中。所以认为Only value is stored in the bucket是不正确的,不会给面试官留下好印象。

  • Whenever we call get( Key k ) method on the HashMap object . First it checks that whether key is null or not . Note that there can only be one null key in HashMap .
  • 每当我们在 HashMap 对象上调用 get(Key k) 方法时。首先它检查 key 是否为 null 。请注意 HashMap 中只能有一个空键。

If key is null , then Null keys always map to hash 0, thus index 0.

如果 key 为 null ,则 Null 键始终映射到哈希 0,因此索引 0。

If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode() ,so after key.hashCode() returns hashValue , line 4 looks like

如果 key 不为 null ,它将在 key 对象上调用 hashfunction ,参见上面方法中的第 4 行,即 key.hashCode() ,所以在 key.hashCode() 返回 hashValue 后,第 4 行看起来像

            int hash = hash(hashValue)

and now ,it applies returned hashValue into its own hashing function .

现在,它将返回的 hashValue 应用到它自己的散列函数中。

We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is It defends against poor quality hash functions.

我们可能想知道为什么我们要使用 hash(hashValue) 再次计算哈希值。答案是它可以抵御低质量的哈希函数。

Now final hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex)

现在最终的哈希值用于查找存储 Entry 对象的存储桶位置。条目对象像这样存储在存储桶中(哈希、键、值、存储桶索引)

回答by Premraj

Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.

每个 Entry 对象代表键值对。如果一个桶有 1 个以上的条目,则字段 next 指的是其他条目对象。

Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null. When you create HashMap with default constructor

有时可能会发生 2 个不同对象的 hashCode 相同的情况。在这种情况下,2 个对象将保存在一个存储桶中,并将显示为 LinkedList。入口点是最近添加的对象。此对象引用具有下一个字段的其他对象等。最后一个条目指的是 null。当您使用默认构造函数创建 HashMap 时

Array is gets created with size 16 and default 0.75 load balance.

创建的数组大小为 16,默认负载平衡为 0.75。

enter image description here

在此处输入图片说明

(Source)

(来源)