c++ - unordered_map complexity

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/15470948/
Asked by user1764386
I need to create a lookup function where an (X, Y) pair corresponds to a specific Z value. One major requirement is that I need to do it in as close to O(1) complexity as I can. My plan is to use an unordered_map.
I generally do not use a hash table for lookup, as the lookup time has never been important to me. Am I correct in thinking that as long as I built the unordered_map with no collisions, my lookup time will be O(1)?
My concern, then, is what the complexity becomes if the key is not present in the unordered_map. If I use unordered_map::find(), for example, to determine whether a key is present in my hash table, how will it go about giving me an answer? Does it actually iterate over all the keys?
I greatly appreciate the help.
Accepted answer by James Kanze
The standard more or less requires using buckets for collision resolution, which means that the actual lookup time will probably be linear with respect to the number of elements in the bucket, regardless of whether the element is present or not. It's possible to make it O(lg N), but it's not usually done, because the number of elements in the bucket should be small if the hash table is being used correctly.
To ensure that the number of elements in a bucket is small, you must ensure that the hashing function is effective. What "effective" means depends on the types and values being hashed. (The MS implementation uses FNV, which is one of the best generic hashes around, but if you have special knowledge of the actual data you'll be seeing, you might be able to do better.) Another thing which can help reduce the number of elements per bucket is to force more buckets or use a smaller load factor. For the first, you can pass the minimum initial number of buckets as an argument to the constructor. If you know the total number of elements that will be in the map, you can control the load factor this way. You can also force a minimum number of buckets once the table has been filled, by calling rehash. Otherwise, there is a function std::unordered_map<>::max_load_factor which you can use. It is not guaranteed to do anything, but in any reasonable implementation, it will. Note that if you use it on an already filled unordered_map, you'll probably have to call unordered_map<>::rehash afterwards.
(There are several things I don't understand about the standard unordered_map: why the load factor is a float instead of a double; why it's not required to have an effect; and why it doesn't automatically call rehash for you.)
Answered by AndyG
As with any hash table, the worst case is always linear complexity (Edit: if you built the map without any collisions, as you stated in your original post, then you'll never see this case):
http://www.cplusplus.com/reference/unordered_map/unordered_map/find/
Complexity: Average case: constant. Worst case: linear in container size.
Return value: An iterator to the element, if the specified key value is found, or unordered_map::end if the specified key is not found in the container.
However, because an unordered_map can only contain unique keys, you will see an average complexity of constant time (the container first checks the hash index, and then iterates over the values at that index).
I think the documentation for the unordered_map::count function is more informative:
Searches the container for elements whose key is k and returns the number of elements found. Because unordered_map containers do not allow for duplicate keys, this means that the function actually returns 1 if an element with that key exists in the container, and zero otherwise.
Answered by Yuushi
To have no collisions in a hashed data structure is incredibly difficult (if not impossible for a given hash function and any kind of data). It would also require a table size exactly equal to the number of keys. No, it does not need to be that strict. As long as the hash function distributes the values in a relatively uniform way, you will have O(1) lookup complexity.
Hash tables are generally just arrays with linked lists taking care of the collisions (this is the chaining method - there are other methods, but this is likely the most utilized way of dealing with collisions). Thus, to find if a value is contained within a bucket, it will have to (potentially) iterate over all the values in that bucket. So if the hash function gives you a uniform distribution, and there are N buckets, and a total of M values, there should be (on average) M/N values per bucket. As long as this value is not too large, this allows O(1) lookup.
So, as a bit of a long-winded answer to your question: as long as the hashing function is reasonable, you will get O(1) lookup, with it having to iterate over (on average) O(M/N) keys to give you a "negative" result.