Java中HashSet.contains()的时间复杂度表现是什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/25247854/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is the time complexity performance of HashSet.contains() in Java?
提问by ktm5124
I'm tempted to think that the HashSet.contains(Object)method performs in constant time. It simply gets the hash code of an object and then looks it up in a hash table.
我很想认为HashSet.contains(Object)方法在恒定时间内执行。它只是获取对象的哈希码,然后在哈希表中查找。
First, could someone please confirm whether this is true?
首先,请有人确认这是否属实?
Second, if it is true, is there any risk of collisions, where two objects might have the same hash code and thus the HashSet thinks it has both when it only has one?
其次,如果它是真的,是否有任何冲突的风险,其中两个对象可能具有相同的哈希码,因此 HashSet 认为它只有一个时才拥有两者?
采纳答案by Eran
It runs in O(1)
expected time, as any hash table (assuming the hash function is decent). It is backed by a HashMap
where the key is the Object.
它在O(1)
预期的时间内运行,就像任何哈希表一样(假设哈希函数是不错的)。它由一个支持,HashMap
其中键是对象。
Two objects might have the same hash code, but the HashSet
wouldn't think they are identical, unless the equals
method for these objects says they are the same (i.e. returns true).
两个对象可能具有相同的哈希码,但HashSet
不会认为它们是相同的,除非equals
这些对象的方法说它们是相同的(即返回 true)。
The contains
method calls (indirectly) getEntry
of HashMap
, where key is the Object
for which you wish to know if it's in the HashSet
.
该contains
方法调用(间接)getEntry
of HashMap
,其中 key 是Object
您希望知道它是否在HashSet
.
As you can see below, two objects can be stored in the HashMap
/HashSet
even if their key is mapped to the same value by the hash function. The method iterates over all keys that have the same hash value, and performs equals
on each one to find the matching key.
正如你在下面看到的,两个对象可以存储在HashMap
/ 中,HashSet
即使它们的键被哈希函数映射到相同的值。该方法迭代具有相同哈希值的所有键,并equals
在每个键上执行以找到匹配的键。
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
回答by Daniel Valland
The worst-case performance of contains will be O(log n) for Java 8, and O(n) for Java 7, but average case closer to O(1). This is because the hashset is backed by a hashmap, and thus has the same efficiency as hashmap lookup (ie, HashMap.get(...)). The actual mapping in a hashmap is constant time (O(1)), but the need to handle collisions brings the cost to log n. That is, multiple elements that hash to the same array index must be stored in a secondary data structure (aka bucket), and it is this bucket that determines the worst-case performance. In Java, hashmap collision-handling is implemented using a self-balanced tree.
contains 的最坏情况性能对于 Java 8 为 O(log n),对于 Java 7 为 O(n),但平均情况更接近 O(1)。这是因为 hashset 由 hashmap 支持,因此具有与 hashmap 查找相同的效率(即 HashMap.get(...))。hashmap 中的实际映射是常数时间 (O(1)),但是处理冲突的需要带来了 log n 的成本。也就是说,散列到相同数组索引的多个元素必须存储在二级数据结构(又名存储桶)中,而正是这个存储桶决定了最坏情况下的性能。在 Java 中,hashmap 冲突处理是使用自平衡树实现的。
Self-balanced trees guarantee O(log n) for all operations, hence, insertion and lookup in hashmap (and hashset) has a total cost of O(1) + O(log n) = O(log n). The use of a self-balanced tree for collision-handling was introduced in Java 8 as an improvement over chaining (used until java 7), which uses a linked-list, and has a worst case of O(n) for lookup and insertion (as it needs to traverse the list). Notice that chaining would have constant time for insertion (as opposed to lookup), since elements can be added to a linked list in O(1), but the set property (no duplicates) is imposed on the linked-list in the case of hashmap, and it thus need to traverse the linked-list also in the case of insertion to ensure that the element does not already exist in the list/bucket, and we end up with O(n) for both insertion and lookup.
自平衡树保证所有操作的 O(log n),因此,在 hashmap(和 hashset)中插入和查找的总成本为 O(1) + O(log n) = O(log n)。在 Java 8 中引入了用于冲突处理的自平衡树的使用,作为对使用链表的链接(直到 Java 7 一直使用)的改进,并且对于查找和插入具有 O(n) 的最坏情况(因为它需要遍历列表)。请注意,链接将具有恒定的插入时间(与查找相反),因为可以在 O(1) 中将元素添加到链表中,但在hashmap,因此在插入的情况下也需要遍历链表,以确保该元素不存在于列表/存储桶中,并且我们以 O(n) 结束插入和查找。
References:
参考:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html
Buckets containing a large number of colliding keys will store their entries in a balanced tree instead of a linked list after certain threshold is reached. (https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8)
这个类实现了 Set 接口,由一个哈希表(实际上是一个 HashMap 实例)支持。 https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html
包含大量冲突键的桶在达到某个阈值后将其条目存储在平衡树中而不是链表中。( https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8)