当许多键具有相同的哈希码时,Java 8 的 HashMap 如何退化为平衡树?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/30164087/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code?
提问by Thennam
How does Java 8's HashMap degenerate to balanced trees when many keys have the same hash code? I read that keys should implement Comparable
to define an ordering. How does HashMap combine hashing and natural ordering to implement the trees? What about classes that do not implement Comparable
, or when multiple, non-mutually-comparable Comparable
implementations are keys in the same map?
当许多键具有相同的哈希码时,Java 8 的 HashMap 如何退化为平衡树?我读到应该实现键Comparable
来定义排序。HashMap 如何结合散列和自然排序来实现树?没有实现的类Comparable
,或者当多个不可相互比较的Comparable
实现是同一个映射中的键时呢?
回答by Jeffrey Bosboom
The implementation notes commentin HashMap is a better description of HashMap's operation than I could write myself. The relevant parts for understanding the tree nodes and their ordering are:
HashMap 中的实现注释注释比我自己写的更好地描述了 HashMap 的操作。理解树节点及其排序的相关部分是:
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. [...] Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. [...]
Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable" type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
该映射通常用作分箱(分桶)哈希表,但是当分箱变得太大时,它们会被转换为 TreeNode 的分箱,每个分箱的结构类似于 java.util.TreeMap 中的分箱。[...] TreeNodes 的 bins 可以像任何其他 bins 一样被遍历和使用,但另外支持在人口过多时更快的查找。[...]
树箱(即,其元素都是 TreeNode 的箱)主要按 hashCode 排序,但在关系的情况下,如果两个元素属于相同的“C 类实现 Comparable”类型,则使用它们的 compareTo 方法进行排序。(我们通过反射保守地检查泛型类型以验证这一点——请参阅方法compareClassFor)。当键具有不同的散列或可排序时,树箱的增加的复杂性值得提供最坏情况的 O(log n) 操作,因此,在 hashCode() 方法返回的值很差的意外或恶意使用下,性能会优雅地降低分布式,以及其中许多键共享一个 hashCode 的那些,只要它们也是 Comparable 的。(如果这些都不适用,与不采取预防措施相比,我们可能会在时间和空间上浪费大约两倍。但唯一已知的情况源于糟糕的用户编程实践,这些实践已经很慢,这几乎没有什么区别。)
When two objects have equal hash codes but are not mutually comparable, method tieBreakOrder
is invoked to break the tie, first by string comparison on getClass().getName()
(!), then by comparing System.identityHashCode
.
当两个对象具有相同的哈希码但不能相互比较时,tieBreakOrder
调用方法来打破平局,首先通过getClass().getName()
(!)上的字符串比较,然后通过比较System.identityHashCode
。
The actual tree building starts in treeifyBin
, beginning when a bin reaches TREEIFY_THRESHOLD
(currently 8), assuming the hash table has at least MIN_TREEIFY_CAPACITY
capacity (currently 64). It's a mostly-normal red-black tree implementation (crediting CLR), with some complications to support traversal in the same way as hash bins (e.g., removeTreeNode
).
实际的树构建开始于treeifyBin
,从 bin 到达TREEIFY_THRESHOLD
(当前为 8)开始,假设哈希表至少具有MIN_TREEIFY_CAPACITY
容量(当前为 64)。它是一个大多数正常的红黑树实现(归功于 CLR),具有一些复杂性,以与散列箱(例如,removeTreeNode
)相同的方式支持遍历。
回答by Frédéric Dumont
Read the code. It is mostly a red-black tree.
It does not actually require the implementation of Comparable
, but can use it if available (see for instance the find method)
它实际上并不需要 的实现Comparable
,但可以在可用时使用它(例如参见find 方法)
回答by Jordi Castilla
HashMap
has it's own hash method that applies a supplemental 2 bit lenght hash to the objects inside in order to avoid this problems:
HashMap
有它自己的散列方法,该方法将补充的 2 位长度散列应用于内部的对象,以避免出现此问题:
Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits. Note: Null keys always map to hash 0, thus index 0.
将补充散列函数应用于给定的 hashCode,以防止质量较差的散列函数。这很关键,因为HashMap 使用长度为 2 的幂的哈希表,否则会遇到低位没有差异的 hashCode 的冲突。注意:空键总是映射到哈希 0,因此索引 0。
If you want to see how it's done, take a look is inside the source of the HashMap class.
如果您想了解它是如何完成的,请查看HashMap 类的源代码。
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}