java.util.HashMap 和 HashSet 的内部实现

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1781868/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-12 22:35:33  来源:igfitidea点击:

Internal implementation of java.util.HashMap and HashSet

javahashmaphashcodehashsetlanguage-implementation

提问by peakit

I have been trying to understand the internal implementation of java.util.HashMapand java.util.HashSet.

我一直在尝试了解java.util.HashMapand的内部实现java.util.HashSet

Following are the doubts popping in my mind for a while:

以下是我脑海中浮现的疑惑:

  1. Whats is the importance of the @Override public int hashcode()in a HashMap/HashSet? Where is this hash code used internally?
  2. I have generally seen the key of the HashMap be a Stringlike myMap<String,Object>. Can I map the values against someObject(instead of String) like myMap<someObject, Object>? What all contracts do I need to obey for this happen successfully?
  1. @Override public int hashcode()HashMap/HashSet 中的重要性是什么?内部使用的这个哈希码在哪里?
  2. 我通常看到 HashMap 的键是一个Stringlike myMap<String,Object>。我可以将值映射到someObject(而不是字符串)myMap<someObject, Object>吗?我需要遵守哪些合同才能成功实现?

Thanks in advance !

提前致谢 !

EDIT:

编辑:

  1. Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table? And when we do myMap.get(someKey);java is internally calling someKey.hashCode()to get the number in the Hash table to be looked for the resulting value?
  1. 我们是说键的哈希码(检查!)是哈希表中映射到的值的实际值吗?而当我们在做myMap.get(someKey);java内部调用的时候someKey.hashCode(),获取Hash表中的数字来寻找结果值呢?

Answer:Yes.

回答:是的。

EDIT 2:

编辑2:

  1. In a java.util.HashSet, from where is the key generated for the Hash table? Is it from the object that we are adding eg. mySet.add(myObject);then myObject.hashCode()is going to decide where this is placed in the hash table? (as we don't give keys in a HashSet).
  1. 在a中java.util.HashSet,Hash表的key是从哪里生成的?是否来自我们正在添加的对象,例如。mySet.add(myObject);然后myObject.hashCode()决定将它放在哈希表中的哪个位置?(因为我们不在 HashSet 中提供键)。

Answer:The object added becomes the key. The value is dummy!

答:添加的对象成为key。价值是假的!

采纳答案by Andreas Dolk

The answer to question 2 is easy - yes you can use any Object you like. Maps that have String type keys are widely used because they are typical data structures for naming services. But in general, you can map any two types like Map<Car,Vendor>or Map<Student,Course>.

问题 2 的答案很简单 - 是的,您可以使用任何您喜欢的对象。具有 String 类型键的映射被广泛使用,因为它们是命名服务的典型数据结构。但一般来说,您可以映射任何两种类型,例如Map<Car,Vendor>Map<Student,Course>

For the hashcode() method it's like answered before - whenever you override equals(), then you have to override hashcode() to obey the contract. On the other hand, if you're happy with the standard implementation of equals(), then you shouldn't touch hashcode() (because that could break the contract and result in identical hashcodes for unequal objects).

对于 hashcode() 方法,它就像之前回答的一样 - 每当您覆盖 equals() 时,您就必须覆盖 hashcode() 以遵守合同。另一方面,如果您对 equals() 的标准实现感到满意,那么您不应该触及 hashcode()(因为这可能会破坏合同并导致不相等对象的相同哈希码)。

Practical sidenote: eclipse (and probably other IDEs as well) can auto generate a pair of equals() and hashcode() implementation for your class, just based on the class members.

实用的旁注:eclipse(可能还有其他 IDE)可以为您的类自动生成一对 equals() 和 hashcode() 实现,仅基于类成员。

Edit

编辑

For your additional question: yes, exactly. Look at the source code for HashMap.get(Object key); it calls key.hashcode to calculate the position (bin) in the internal hashtable and returns the value at that position (if there is one).

对于您的其他问题:是的,正是。查看HashMap.get(Object key)的源码;它调用 key.hashcode 来计算内部哈希表中的位置(bin)并返回该位置的值(如果有)。

But be careful with 'handmade' hashcode/equals methods - if you use an object as a key, make sure that the hashcode doesn't change afterwards, otherwise you won't find the mapped values anymore. In other words, the fields you use to calculate equals and hashcode should be final(or 'unchangeable' after creation of the object).

但是要小心“手工制作”的哈希码/等于方法 - 如果您使用对象作为键,请确保之后哈希码不会更改,否则您将找不到映射的值。换句话说,用于计算 equals 和 hashcode 的字段应该是最终的(或在创建对象后“不可更改”)。

Assume, we have a contact with String nameand String phonenumberand we use both fields to calculate equals() and hashcode(). Now we create "John Doe" with his mobile phone number and map him to his favorite Donut shop. hashcode() is used to calculate the index (bin) in the hash table and that's where the donut shop is stored.

假设,我们与String nameand有联系,String phonenumber我们使用这两个字段来计算 equals() 和 hashcode()。现在我们用他的手机号码创建“John Doe”并将他映射到他最喜欢的甜甜圈店。hashcode() 用于计算哈希表中的索引 (bin),这就是存储甜甜圈店的位置。

Now we learn that he has a new phone number and we change the phone number field of the John Doe object. This results in a new hashcode. And this hashcode resolves to a new hash table index - which usually isn't the position where John Does' favorite Donut shop was stored.

现在我们知道他有一个新的电话号码,我们更改了 John Doe 对象的电话号码字段。这会产生一个新的哈希码。并且这个哈希码解析为一个新的哈希表索引——这通常不是约翰·杜斯最喜欢的甜甜圈店的存储位置。

The problem is clear: In this case we wanted to map "John Doe" to the Donut shop, and not "John Doe with a specific phone number". So, we have to be careful with autogenerated equals/hashcode to make sure they're what we really want, because they might use unwanted fields, introducing trouble with HashMaps and HashSets.

问题很明显:在这种情况下,我们想将“John Doe”映射到甜甜圈店,而不是“John Doe with a specific phone number”。因此,我们必须小心使用自动生成的 equals/hashcode 以确保它们是我们真正想要的,因为它们可能使用不需要的字段,给 HashMaps 和 HashSets 带来麻烦。

Edit 2

编辑 2

If you add an object to a HashSet, the Object is the key for the internal hash table, the value is set but unused (just a static instance of Object). Here's the implementation from the openjdk 6 (b17):

如果将对象添加到 HashSet,则 Object 是内部哈希表的键,该值已设置但未使用(只是 Object 的静态实例)。这是来自 openjdk 6 (b17) 的实现:

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
private transient HashMap<E,Object> map;

public boolean add(E e) {
  return map.put(e, PRESENT)==null;
}

回答by Aaron Digulla

Whats is the importance of the @Override public int hashcode() in a HashMap/HashSet?

HashMap/HashSet 中@Override public int hashcode() 的重要性是什么?

This allows the instance of the map to produce a useful hash code depending on the content of the map. Two maps with the same content will produce the same hash code. If the content is different, the hash code will be different.

这允许地图实例根据地图的内容生成有用的哈希码。具有相同内容的两个映射将产生相同的哈希码。如果内容不同,哈希码也会不同。

Where is this hash code used internally?

内部使用的这个哈希码在哪里?

Never. This code only exists so you can use a map as a key in another map.

绝不。此代码仅存在以便您可以将地图用作另一个地图中的键。

Can I map the values against someObject(instead of String) like myMap<someObject, Object>?

我可以将值映射到someObject(而不是String)像myMap<someObject, Object>吗?

Yes but someObjectmust be a class, not an object (your name suggests that you want to pass in object; it should be SomeObjectto make it clear you're referring to the type).

是的,但someObject必须是一个类,而不是一个对象(你的名字暗示你想传入对象;应该SomeObject明确你指的是类型)。

What all contracts do I need to obey for this happen successfully?

我需要遵守哪些合同才能成功实现?

The class must implement hashCode()and equals().

该类必须实现hashCode()equals()

[EDIT]

[编辑]

Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table?

我们是说键的哈希码(检查!)是哈希表中映射到的值的实际值吗?

Yes.

是的。

回答by Joey

There is a intricate relationship between equals(), hashcode()and hash tables in general in Java (and .NET too, for that matter). To quote from the documentation:

hashcode()在 Java 中,equals()和哈希表之间存在复杂的关系(就此而言,.NET 也是如此)。从文档中引用:

public int hashCode()

Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCodemethod on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

As much as is reasonably practical, the hashCodemethod defined by class Objectdoes return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java? programming language.)

public int hashCode()

返回对象的哈希码值。支持此方法是为了哈希表,例如由java.util.Hashtable.

hashCode 的总合约为:

  • 在 Java 应用程序执行期间,只要在同一个对象上多次调用它,hashCode 方法必须始终返回相同的整数,前提是在对象的 equals 比较中使用的信息没有被修改。该整数不需要从应用程序的一次执行到同一应用程序的另一次执行保持一致。
  • 如果根据 equals(Object) 方法两个对象相等,则对这两个对象中的每一个调用 hashCode 方法必须产生相同的整数结果。
  • 如果根据 equals( java.lang.Object) 方法两个对象不相等,则不需要对这两个对象中的hashCode每一个调用该方法必须产生不同的整数结果。但是,程序员应该意识到为不相等的对象生成不同的整数结果可能会提高哈希表的性能。

尽管合理实用,hashCode类定义的方法Object确实为不同的对象返回不同的整数。(这通常是通过将对象的内部地址转换为整数来实现的,但 Java? 编程语言不需要这种实现技术。)

The line

线

@Overrides public int hashCode()

just tells that the hashCode()method is overridden. This ia usuallya sign that it's safe to use the type as key in a HashMap.

只是告诉该hashCode()方法已被覆盖。这通常表明可以安全地使用该类型作为HashMap.

And yes, you can aesily use any object which obeys the contract for equals()and hashCode()in a HashMapas key.

是的,您可以轻松地使用任何符合aequals()hashCode()a合同的对象HashMap作为键。

回答by Thomas

  1. Any Objectin Java must have a hashCode()method; HashMapand HashSetare no execeptions. This hash code is used if you insert the hash map/set into another hash map/set.
  2. Any class type can be used as the key in a HashMap/HashSet. This requires that the hashCode()method returns equal values for equal objects, and that the equals()method is implemented according to contract (reflexive, transitive, symmetric). The default implementations from Objectalready obey these contracts, but you may want to override them if you want value equality instead of reference equality.
  1. ObjectJava中的任何一个都必须有一个hashCode()方法;HashMap并且HashSet没有例外。如果您将散列映射/集插入另一个散列映射/集,则使用此散列代码。
  2. 任何类类型可被用作在键HashMap/ HashSet。这要求该hashCode()方法为相等的对象返回相等的值,并且该equals()方法是根据契约(自反的、传递的、对称的)实现的。默认实现Object已经遵守这些契约,但是如果您想要值相等而不是引用相等,您可能需要覆盖它们。

回答by Varun

Yes. You can use any object as the key in a HashMap. In order to do so following are the steps you have to follow.

是的。您可以使用任何对象作为 HashMap 中的键。为此,您必须遵循以下步骤。

  1. Override equals.

  2. Override hashCode.

  1. 覆盖等于。

  2. 覆盖哈希码。

The contracts for both the methods are very clearly mentioned in documentation of java.lang.Object. http://java.sun.com/javase/6/docs/api/java/lang/Object.html

java.lang.Object 的文档中非常清楚地提到了这两种方法的契约。http://java.sun.com/javase/6/docs/api/java/lang/Object.html

And yes hashCode() method is used internally by HashMap and hence returning proper value is important for performance.

是的 hashCode() 方法由 HashMap 在内部使用,因此返回正确的值对性能很重要。

Here is the hashCode() method from HashMap

这是来自 HashMap 的 hashCode() 方法

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

It is clear from the above code that hashCode of each key is not just used for hashCode() of the map, but also for finding the bucket to place the key,value pair. That is why hashCode() is related to performance of the HashMap

从上面的代码可以看出,每个key的hashCode不仅仅用于map的hashCode(),还用于寻找放置key,value对的bucket。这就是 hashCode() 与 HashMap 的性能有关的原因

回答by GaryF

Aaron Digulla is absolutely correct. An interesting additional note that people don't seem to realise is that the key object's hashCode() method is not used verbatim. It is, in fact, rehashed by the HashMap i.e. it calls hash(someKey.hashCode)), where hash()is an internal hashing method.

Aaron Digulla 是完全正确的。人们似乎没有意识到的一个有趣的附加说明是关键对象的 hashCode() 方法没有逐字使用。事实上,它是由 HashMap 重新散列的,即它调用hash(someKey.hashCode)),其中hash()是一个内部散列方法。

To see this, have a look at the source: http://kickjava.com/src/java/util/HashMap.java.htm

要看到这一点,请查看源:http: //kickjava.com/src/java/util/HashMap.java.htm

The reason for this is that some people implement hashCode() poorly and the hash() function gives a better hash distribution. It's basically done for performance reasons.

这样做的原因是有些人对 hashCode() 的实现很差,而 hash() 函数提供了更好的散列分布。它基本上是出于性能原因完成的。

回答by sateesh

In answer to question 2, though you can have any class that can be used to as the key in Hashmap, the best practice is to use immutable classes as keys for the HashMap. Or at the least if your "hashCode", and "equals" implementation are dependent on some of the attributes of your class then you should take care that you don't provide methods to alter these attributes.

在回答问题 2 时,尽管您可以将任何类用作 Hashmap 的键,但最佳实践是使用不可变类作为 HashMap 的键。或者至少如果您的“hashCode”和“equals”实现依赖于类的某些属性,那么您应该注意不要提供更改这些属性的方法。

回答by Tendayi Mawushe

Hashing containers like HashMapand HashSetprovide fast access to elements stored in them by splitting their contents into "buckets".

散列容器喜欢HashMapHashSet通过将它们的内容拆分为“桶”来提供对存储在其中的元素的快速访问。

For example the list of numbers: 1, 2, 3, 4, 5, 6, 7, 8stored in a Listwould look (conceptually) in memory something like: [1, 2, 3, 4, 5, 6, 7, 8].

例如,数字列表:1, 2, 3, 4, 5, 6, 7, 8存储在 a 中List(概念上)在内存中看起来像:[1, 2, 3, 4, 5, 6, 7, 8]

Storing the same set of numbers in a Setwould look more like this: [1, 2] [3, 4] [5, 6] [7, 8]. In this example the list has been split into 4 buckets.

在存储同一组数字Set看起来会是这样的:[1, 2] [3, 4] [5, 6] [7, 8]。在这个例子中,列表被分成了 4 个桶。

Now imagine you want to find the value 6out of both the Listand the Set. With a list you would have to start at the beginning of the list and check each value until you get to 6, this will take 6 steps. With a set you find the correct bucket, the check each of the items in that bucket (only 2 in our example) making this a 3 step process. The value of this approach increases dramatically the more data you have.

现在想象一下,你要查找的值6了两者的ListSet。对于列表,您必须从列表的开头开始检查每个值,直到达到 6,这将需要 6 个步骤。通过一个集合,您可以找到正确的存储桶,检查该存储桶中的每个项目(在我们的示例中只有 2 个),使其成为一个 3 步过程。您拥有的数据越多,这种方法的价值就会显着增加。

But wait how did we know which bucket to look in? That is where the hashCodemethod comes in. To determine the bucket in which to look for an item Java hashing containers call hashCodethen apply some function to the result. This function tries to balance the numbers of buckets and the number of items for the fastest lookup possible.

但是等等,我们怎么知道要查看哪个桶呢?这就是hashCode方法的用武之地。为了确定要在其中查找 Java 散列容器调用的项目的存储桶,hashCode然后对结果应用一些函数。此函数尝试平衡存储桶的数量和项目的数量,以实现最快的查找。

During lookup once the correct bucket has been found each item in that bucket is compared one at a time as in a list. That is why when you override hashCodeyou must also override equals. So if an object of any type has both an equalsand a hashCodemethod it can be used as a key in a Mapor an entry in a Set. There is a contract that must be followed to implement these methods correctly the canonical text on this is from Josh Bloch's great book Effective Java: Item 8: Always override hashCode when you override equals

在查找过程中,一旦找到了正确的存储桶,该存储桶中的每个项目都会像在列表中一样一次比较一个。这就是为什么当您覆盖时,您hashCode还必须覆盖equals. 因此,如果任何类型的对象同时具有 anequalshashCode方法,则它可以用作 a 中的键Map或a 中的条目Set。必须遵循一个约定才能正确实现这些方法,关于此的规范文本来自 Josh Bloch 的好书 Effective Java:第 8 条:当您覆盖 equals 时始终覆盖 hashCode

回答by LearnerJava

HashCode method for collection classes like HashSet, HashTable, HashMap etc – Hash code returns integer number for the object that is being supported for the purpose of hashing. It is implemented by converting internal address of the object into an integer. Hash code method should be overridden in every class that overrides equals method. Three general contact for HashCode method

HashSet、HashTable、HashMap 等集合类的 HashCode 方法——哈希码返回支持散列的对象的整数。它是通过将对象的内部地址转换为整数来实现的。哈希码方法应该在每个覆盖 equals 方法的类中被覆盖。HashCode方法的三个一般联系方式

  • For two equal objects acc. to equal method, then calling HashCode for both object it should produce same integer value.

  • If it is being called several times for a single object, then it should return constant integer value.

  • For two unequal objects acc. to equal method, then calling HashCode method for both object, it is not mandatory that it should produce distinct value.

  • 对于两个相等的对象 acc。到equal方法,然后为两个对象调用HashCode它应该产生相同的整数值。

  • 如果为单个对象多次调用它,则它应该返回常量整数值。

  • 对于两个不相等的对象 acc。到相等方法,然后为两个对象调用 HashCode 方法,它不是强制要求它应该产生不同的值。