为什么 hashCode() 可以为 Java 中的不同对象返回相同的值?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4360035/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-14 16:29:07  来源:igfitidea点击:

Why can hashCode() return the same value for different objects in Java?

javadata-structureshashhashcode

提问by Eugene

A quote from the book I'm reading Head First Java:

我正在阅读Head First Java一书中的引述:

The point is that hashcodes can be the same without necessarily guaranteeing that the objects are equal, because the "hashing algorithm" used in the hashCode()method might happen to return the same value for multiple objects.

关键是哈希码可以相同,而不必保证对象相等,因为hashCode()方法中使用的“哈希算法”可能恰好为多个对象返回相同的值。

Why might the hashCode()method return the same value for different objects? Does that not cause problems?

为什么该hashCode()方法可能为不同的对象返回相同的值?这不会引起问题吗?

采纳答案by Lukas Eder

hashingan object means "finding a good, descriptive value (number) that can be reproduced by the very same instance again and again". Because hash codes from Java's Object.hashCode()are of type int, you can only have 2^32different values. That's why you will have so-called "collisions" depending on the hashing algorithm, when two distinct Objects produce the same hashCode.

散列一个对象意味着“找到一个好的、描述性的值(数字),可以被同一个实例一次又一次地复制”。因为来自 Java 的哈希码Object.hashCode()是类型的int,所以你只能有2^32不同的值。这就是为什么当两个不同的对象产生相同的 hashCode 时,您会根据散列算法产生所谓的“冲突”。

Typically, this does not produce any problems, because hashCode()is mostly used together with equals(). For instance, a HashMapwill call hashCode()upon its keys, to know whether the keys may already be contained in the HashMap. If the HashMap does not find the hash code, it's obvious the key is not contained in the HashMap yet. But if it does, it will have to double-check all keys having that same hash code using equals().

通常,这不会产生任何问题,因为hashCode()主要与equals(). 例如,aHashMap将调用hashCode()其键,以了解键是否可能已包含在 HashMap 中。如果 HashMap 没有找到哈希码,很明显 HashMap 中还没有包含该键。但如果是这样,它将必须使用equals().

I.e.

IE

A.hashCode() == B.hashCode() // does not necessarily mean
A.equals(B)

But

A.equals(B) // means
A.hashCode() == B.hashCode()

If equals()and hashCode()are implemented correctly.

如果equals()hashCode()被正确实施。

For a more precise description of the general hashCodecontract, see the Javadoc.

有关一般hashCode合同的更准确描述,请参阅Javadoc

回答by Mark Byers

There are only just over 4 billion possible hashcodes (the range of an int) , but the number of objects you could choose to create is much larger. Therefore some objects must share the same hash code, by the pigeonhole principle.

只有超过 40 亿个可能的哈希码( a 的范围int),但您可以选择创建的对象数量要大得多。因此,根据鸽巢原理,某些对象必须共享相同的哈希码。

For example the number of possible strings containing 10 letters from A-Z is 26**10 which is 141167095653376. It is impossible to assign all of these strings a unique hash code. Nor is it important - the hash code doesn't need to be unique. It just needs to have not too many collisions for real data.

例如,包含来自 AZ 的 10 个字母的可能字符串的数量是 26**10,即 141167095653376。不可能为所有这些字符串分配一个唯一的哈希码。也不重要 - 哈希码不需要是唯一的。它只需要对真实数据没有太多的碰撞。

回答by Vishwanath

As I understand, work of hashcode method is to create buckets for hashing the elements, So that retrieval can be faster. If each object will return same value, there is no use of doing any hashing.

据我了解,hashcode 方法的工作是创建用于散列元素的桶,以便检索更快。如果每个对象都返回相同的值,则不需要进行任何散列。

回答by JLund

The hashCode() value can be used to quickly find an object by using the hash code as an address to a hash table bucket where it is stored.

hashCode() 值可用于通过使用哈希码作为存储它的哈希表存储桶的地址来快速查找对象。

If multiple objects return the same value from hashCode(), it means that they would be stored in the same bucket. If many objects are stored in the same bucket it means that on average it requires more comparison operations to look up a given object.

如果多个对象从 hashCode() 返回相同的值,则意味着它们将存储在同一个桶中。如果许多对象存储在同一个存储桶中,则意味着平均而言,它需要更多的比较操作来查找给定的对象。

Instead use equals() to compare two objects to see whether they are semantically equal.

而是使用 equals() 来比较两个对象以查看它们在语义上是否相等。

回答by Tundey

I have to think that's a pretty inefficient hashing algorithm for 2 objects to have the same hash code.

我不得不认为对于具有相同哈希码的 2 个对象来说,这是一种非常低效的哈希算法。

回答by Thomas

The idea of a hashtable is that you want to be able to realize a datastructure called a dictionary in an efficient way. A dictionary is a key/value store, i.e., you want to be able to store certain objects under a certain key and later on be able to retrieve them again using the same key.

哈希表的想法是您希望能够以有效的方式实现称为字典的数据结构。字典是键/值存储,即,您希望能够在某个键下存储某些对象,然后能够使用相同的键再次检索它们。

One of the most efficient ways to access values is to store them in an array. For instance, we could realize a dictionary that uses integers for keys and Strings for values like so:

访问值的最有效方法之一是将它们存储在数组中。例如,我们可以实现一个字典,它使用整数作为键,使用字符串作为值,如下所示:

String[] dictionary = new String[DICT_SIZE];
dictionary[15] = "Hello";
dictionary[121] = "world";

System.out.println(dictionary[15]); // prints "Hello"

Unfortunately, this approach is not very general at all: the index of an array has to be an integer value, but ideally we'd like to be able to use arbitrary kinds of objects for our keys, not only integers.

不幸的是,这种方法根本不是很通用:数组的索引必须是整数值,但理想情况下,我们希望能够对我们的键使用任意类型的对象,而不仅仅是整数。

Now, the way to solve this point is to have a way of mapping arbitrary objects to integer values which we could then use as keys for our array. In Java, that's what hashCode()does. So now, we could try to implement a String->String dictionary:

现在,解决这一点的方法是将任意对象映射到整数值,然后我们可以将其用作数组的键。在 Java 中,就是这样hashCode()做的。所以现在,我们可以尝试实现一个 String->String 字典:

String[] dictionary = new String[DICT_SIZE];
// "a" -> "Hello"
dictionary["a".hashCode()] = "Hello";

// "b" -> "world"
dictionary["b".hashCode()] = "world";

System.out.println(dictionary["b".hashCode()]); // prints world

But hey, what if there is some object which we'd like to use as a key, but its hashCodemethod returns a value that's greater than or equal to DICT_SIZE? Then we'd get an ArrayIndexOutOfBoundsException and that would be undesirable. So, let's just make it as big as we can, right?

但是,嘿,如果有一些我们想用作键的对象,但它的hashCode方法返回一个大于或等于的值DICT_SIZE怎么办?然后我们会得到一个 ArrayIndexOutOfBoundsException,这是不可取的。所以,我们就让它尽可能大吧?

public static final int DICT_SIZE = Integer.MAX_VALUE // Ooops!

But that would mean that we would have to allocate ginormeous amounts of memory for our array, even if we only intend to store a few items. So that can't be the best solution, and in fact we can do better. Let's assume we had a function hthat for any given DICT_SIZEmaps arbitrary integers into the range [0, DICT_SIZE[. Then we could just apply hto whatever the hashCode()method of a key object returns and be certain that we stay in the boundaries of the underlying array.

但这意味着我们必须为我们的数组分配大量的内存,即使我们只打算存储几个项目。所以这不是最好的解决方案,事实上我们可以做得更好。假设我们有一个函数h,可以DICT_SIZE将任意整数映射到范围内[0, DICT_SIZE[。然后我们可以应用h到任何hashCode()键对象返回的方法,并确保我们留在底层数组的边界内。

public static int h(int value, int DICT_SIZE) {
    // returns an integer >= 0 and < DICT_SIZE for every value.
}

That function is called a hash function. Now we can adapt our dictionary implementation to avoid the ArrayIndexOutOfBoundsException:

该函数称为散列函数。现在我们可以调整我们的字典实现来避免 ArrayIndexOutOfBoundsException:

// "a" -> "Hello"
dictionary[h("a".hashCode(), DICT_SIZE)] = "Hello"

// "b" -> "world"
dictionary[h("b".hashCode(), DICT_SIZE)] = "world"

But that introduces another problem: what if hmaps two different key indices to the same value? For instance:

但这引入了另一个问题:如果h将两个不同的键索引映射到相同的值会怎样?例如:

int keyA = h("a".hashCode(), DICT_SIZE);
int keyB = h("b".hashCode(), DICT_SIZE);

may yield the same values for keyAand keyB, and in that case we would accidentally overwrite a value in our array:

可以产生相同的值keyAkeyB,而在这种情况下,我们会小心覆盖数组中的值:

// "a" -> "Hello"
dictionary[keyA] = "Hello";

// "b" -> "world"
dictionary[keyB] = "world"; // DAMN! This overwrites "Hello"!!

System.out.println(dictionary[keyA]); // prints "world"

Well, you may say, then we just have to make sure that we implement hin such a way that this can never happen. Unfortunately, this isn't possible in general. Consider the following code:

好吧,你可能会说,那么我们只需要确保我们h以一种永远不会发生的方式实施。不幸的是,这通常是不可能的。考虑以下代码:

for (int i = 0; i <= DICT_SIZE; i++) {
    dictionary[h(i, DICT_SIZE)] = "dummy";
}

This loop stores DICT_SIZE + 1values (always the same value, actually, namely the String "dummy") in the dictionary. Mhh, but the array can only store DICT_SIZEdifferent entries! That means, when we use h, we would overwrite (at least) one entry. Or in other words, hwill map two different keys to the same value! These "collisions" can't be avoided: if n pigeons try to go into n-1 pigeon holes, at least two of them have to go into the same hole.

这个循环DICT_SIZE + 1在字典中存储值(总是相同的值,实际上,即字符串“虚拟”)。嗯,但是数组只能存储DICT_SIZE不同的条目!这意味着,当我们使用 时h,我们将覆盖(至少)一个条目。或者换句话说,h将两个不同的键映射到同一个值!这些“碰撞”是无法避免的:如果n只鸽子试图进入n-1个鸽子洞,那么至少有两只必须进入同一个洞。

But what we can do is to extend our implementation so that the array can store multiple values under the same index. This can easily be done by using lists. So instead of using:

但是我们可以做的是扩展我们的实现,以便数组可以在同一索引下存储多个值。这可以通过使用列表轻松完成。所以不要使用:

String[] dictionary = new String[DICT_SIZE];

we write:

我们写:

List<String>[] dictionary = new List<String>[DICT_SIZE];

(Side remark: note that Java doesn't allow the creation of arrays of generic types, so the above line wouldn't compile -- but you get the idea).

(旁注:请注意,Java 不允许创建泛型类型的数组,因此上面的行不会编译——但你明白了)。

That will change the access to the dictionary as follows:

这将改变对字典的访问,如下所示:

// "a" -> "Hello"
dictionary[h("a".hashCode(), DICT_SIZE)].add("Hello");

// "b" -> "world"
dictionary[h("b".hashCode(), DICT_SIZE)].add("world");

In case our hashfunction hreturns different values for all our keys, this will result in lists with only one element each, and retrieving elements is really simple:

如果我们的哈希h函数为我们所有的键返回不同的值,这将导致每个列表只有一个元素,并且检索元素非常简单:

System.out.println(dictionary[h("a".hashCode(), DICT_SIZE)].get(0)); // "Hello"

But we already know that in general hwill map different keys to the same integer sometimes. In these cases, the lists will contain more than one value. For retrieval, we have to go through the whole list to find the "correct" value, but how would we recognize it?

但是我们已经知道,通常h有时会将不同的键映射到同一个整数。在这些情况下,列表将包含多个值。为了检索,我们必须遍历整个列表才能找到“正确”的值,但是我们如何识别它呢?

Well, instead of storing the value alone, we could always store the complete (key,value) pair in the lists. Then lookup would be performed in two steps:

好吧,不是单独存储值,我们总是可以在列表中存储完整的(键,值)对。然后查找将分两步执行:

  1. Apply the hashfunction to retrieve the correct list from the array.
  2. Iterate through all pairs stored in the retrieved list: if the pair with the desired key is found, return the value from the pair.
  1. 应用哈希函数从数组中检索正确的列表。
  2. 遍历存储在检索列表中的所有对:如果找到具有所需键的对,则返回该对中的值。

Now adding and retrieving have become so complex that it's not indecent to treat ourselves separate methods for these operations:

现在添加和检索变得如此复杂,将这些操作的单独方法视为不雅:

List<Pair<String,String>>[] dictionary = List<Pair<String,String>>[DICT_SIZE];

public void put(String key, String value) {
    int hashCode = key.hashCode();
    int arrayIndex = h(hashCode, DICT_SIZE);

    List<Pair<String,String>> listAtIndex = dictionary[arrayIndex];
    if (listAtIndex == null) {
        listAtIndex = new LinkedList<Pair<Integer,String>>();
        dictionary[arrayIndex] = listAtIndex;
    }

    for (Pair<String,String> previouslyAdded : listAtIndex) {
        if (previouslyAdded.getValue().equals(value)) {
            return; // the value is already in the dictionary;
        }
    }

    listAtIndex.add(new Pair<String,String>(key, value));
}

public String get(String key) {
    int hashCode = key.hashCode();
    int arrayIndex = h(hashCode, DICT_SIZE);

    List<Pair<String,String>> listAtIndex = dictionary[arrayIndex];
    if (listAtIndex != null) {
        for (Pair<String,String> previouslyAdded : listAtIndex) {
            if (previouslyAdded.getKey().equals(key)) {
                return previouslyAdded.getValue(); // entry found!
            }
        }
    }

    // entry not found
    return null;
}

So, in order for this approach to work, we actually need two comparison operations: the hashCode method to find the list in the array (this works fast if hashCode()and hare both fast) and an equalsmethod which we need when going through the list.

因此,为了使这种方法起作用,我们实际上需要两个比较操作:在数组中查找列表的 hashCode 方法(如果hashCode()并且h两者都很快,则该equals方法很快)和我们在遍历列表时需要的方法。

This is the general idea of hashing, and you will recognize the putand getmethod from java.util.Map.Of course, the above implementation is an oversimplification, but it should illustrate the gist of it all.

这是散列的一般思想,您会从putget方法中认出java.util.Map.当然,上面的实现过于简单化,但它应该说明了这一切的要点。

Naturally, this approach is not limited to Strings, it works for all kinds of objects, since the methods hashCode()and equalsare members of the top-level class java.lang.Object and all other classes inherit from that one.

自然,这种方法不限于字符串,它适用于所有类型的对象,因为方法hashCode()equals是顶级类 java.lang.Object 的成员,所有其他类都继承自该类。

As you can see, it doesn't really matter if two distinct objects return the same value in their hashCode()method: the above approach will always work! But still it is desirable that they return different values to lower the chances for hash collisions produced by h. We have seen that these can't be avoided 100% in general, but the less collisions we get, the more efficient our hashtable becomes. In the worst case, all keys map to the same array index: in that case, all pairs are stored in a single list and finding a value will then become an operation with costs linear in the size of the hashtable.

如您所见,两个不同的对象是否在其hashCode()方法中返回相同的值并不重要:上述方法将始终有效!但是仍然希望它们返回不同的值以降低h. 我们已经看到,这些通常无法 100% 避免,但是我们得到的冲突越少,我们的哈希表就越有效。在最坏的情况下,所有键都映射到相同的数组索引:在这种情况下,所有对都存储在一个列表中,然后找到一个值将成为一个成本与哈希表大小成线性关系的操作。