C++ STL unordered_map 如何解决冲突?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/21518704/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 23:41:01  来源:igfitidea点击:

How does C++ STL unordered_map resolve collisions?

c++stlunordered-map

提问by whiteSkar

How does C++ STL unordered_map resolve collisions?

C++ STL unordered_map 如何解决冲突?

Looking at the http://www.cplusplus.com/reference/unordered_map/unordered_map/, it says "Unique keys No two elements in the container can have equivalent keys."

查看http://www.cplusplus.com/reference/unordered_map/unordered_map/,它说“唯一键容器中没有两个元素可以具有等效键。”

That should mean that the container is indeed resolving collisions. However, that page does not tell me how it is doing it. I know some ways to resolve collisions like using linked lists and/or probing. What I want to know is how the c++ STL unordered_map is resolving it.

这应该意味着容器确实正在解决冲突。但是,该页面并没有告诉我它是如何做的。我知道一些解决冲突的方法,比如使用链表和/或探测。我想知道的是 c++ STL unordered_map 是如何解决它的。

回答by Jerry Coffin

The standard defines a little more about this than most people seem to realize.

该标准对此的定义比大多数人似乎意识到的要多一些。

Specifically, the standard requires (§23.2.5/9):

具体来说,该标准要求(第 23.2.5/9 节):

The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket.

无序关联容器的元素被组织成桶。具有相同哈希码的键出现在同一个桶中。

The interface includes a bucket_countthat runs in constant time. (table 103). It also includes a bucket_sizethat has to run in time linear on the size of the bucket.

该界面包括一个bucket_count以恒定时间运行的。(表103)。它还包括一个bucket_size必须与存储桶大小成线性关系的时间。

That's basically describing an implementation that uses collision chaining. When you do use collision chaining, meeting all the requirements is somewhere between easy and trivial. bucket_count()is the number of elements in your array. bucket_size()is the number of elements in the collision chain. Getting them in constant and linear time respectively is simple and straightforward.

这基本上描述了一个使用碰撞链的实现。当您确实使用碰撞链时,满足所有要求介于简单和琐碎之间。bucket_count()是数组中的元素数。bucket_size()是碰撞链中元素的数量。分别在常数和线性时间内获取它们是简单明了的。

By contrast, if you use something like linear probing or double hashing, those requirements become all but impossible to meet. Specifically, all the items that hashed to a specific value need to land in the same bucket, and you need to be able to count those buckets in constant time.

相比之下,如果您使用诸如线性探测或双哈希之类的方法,这些要求几乎不可能满足。具体来说,散列到特定值的所有项目都需要落在同一个桶中,并且您需要能够在恒定时间内对这些桶进行计数。

But, if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same ora colliding value.

但是,如果您使用诸如线性探测或双重散列之类的方法,找到散列为相同值的所有项目意味着您需要对值进行散列,然后遍历表中非空项目的“链”以找出有多少散列到相同值的那些。不过,这与散列为相同值的项目数量不是线性的——它与散列为相同冲突值的项目数量呈线性关系。

With enough extra work and a fair amount of stretching the meaning of some of the requirements almost to the breaking point, it might be barely possible to create a hash table using something other than collision chaining, and still at least sort of meet the requirements--but I'm not really certain it's possible, and it would certain involve quite a lot of extra work.

有了足够的额外工作和相当数量的将某些要求的含义延伸到几乎崩溃点,可能几乎不可能使用碰撞链以外的其他东西创建哈希表,并且仍然至少可以满足要求 - - 但我不确定这是否可能,而且肯定会涉及很多额外的工作。

Summary: all practical implementations of std::unordered_set(or unordered_map) undoubtedly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.

总结:std::unordered_set(or unordered_map) 的所有实际实现无疑都使用了碰撞链。虽然使用线性探测或双散列可能(只是勉强)满足要求,但这样的实现似乎损失了很多并且几乎没有得到任何回报。

回答by Enigma22134

I found this answer looking for how to detect when my types are colliding, so I will post this in case that is the intent of the question.:

我发现这个答案是为了寻找如何检测我的类型何时发生碰撞,所以我会发布这个以防这是问题的意图。:

I believe there's some misconception about "Unique keys No two elements in the container can have equivalent keys."

我相信对“唯一键容器中没有两个元素可以具有等效键”存在一些误解。

look at the code below

看看下面的代码

//pseudocode
std::unordered_map<int, char> hashmap;
hashmap[5] = 'a';
hashmap[5] = 'b'; //replace 'a' with 'b', there is no collision being handled.

I think the Jerry's answer is referring to the internal system that it uses to shrink keys to appropriate array indices.

我认为 Jerry 的答案是指它用来将键缩小到适当数组索引的内部系统。

If you want collisions to be handled for your types (with buckets), you need std::unordered_multimapand will have to iterate over

如果您希望为您的类型(使用桶)处理冲突,您需要std::unordered_multimap并且必须迭代

Hopefully this code can be read without the context I generated it with. it basically checks to see if any element in the bucket associated with the hash is the element I'm looking for.

希望这段代码可以在没有我生成它的上下文的情况下阅读。它基本上检查与散列关联的存储桶中的任何元素是否是我正在寻找的元素。

//sp is std::shared_ptr
//memo is std::unordered_multimap< int, sp<AStarNode> >

//there's probably multiple issues with this code in terms of good design (like using int keys rather than unsigned)

bool AStar_Incremental::hasNodeBeenVisited(sp<AStarNode> node)
{
    using UMIter = std::unordered_multimap<int, sp<AStarNode> >::iterator;

    bool bAlreadyVisited = false;

    //get all values for key in O(1*)
    int hash = WorldGrid::hashGrid(node->location);
    std::pair<UMIter, UMIter> start_end = memo.equal_range(hash); //bucket range
    UMIter start = start_end.first;
    UMIter end = start_end.second;

    //hopefully this is implemented to be O(m) where m is the bucket size.
    for(UMIter bucketIter = start; bucketIter != end; ++bucketIter)
    {
        sp<AStarNode> previousNode = bucketIter->second;
        sf::Vector2i& previousVisit = previousNode->location;
        if (previousVisit == node->location)
        {
            bAlreadyVisited = true;
            break;
        }
    }

    return bAlreadyVisited;
}