C语言 链式哈希表与开放寻址哈希表

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2556142/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-02 05:01:19  来源:igfitidea点击:

Chained Hash Tables vs. Open-Addressed Hash Tables

cdata-structureshashtable

提问by Andrei Ciobanu

Can somebody explain the main differences between (advantages / disadvantages) the two implementations?

有人可以解释(优点/缺点)这两种实现之间的主要区别吗?

For a library, what implementation is recommended?

对于图书馆,推荐什么实现?

回答by Richard Barrell

Wikipedia's article on hash tablesgives a distinctly better explanation and overview of different hash table schemes that people have used than I'm able to off the top of my head. In fact you're probably better off reading that article than asking the question here. :)

维基百科关于哈希表的文章对人们使用的不同哈希表方案提供了明显更好的解释和概述,而我无法想象。事实上,阅读那篇文章可能比在这里提问要好。:)

That said...

那说...

A chained hash table indexes into an array of pointers to the heads of linked lists. Each linked list cell has the key for which it was allocated and the value which was inserted for that key. When you want to look up a particular element from its key, the key's hash is used to work out which linked list to follow, and then that particular list is traversed to find the element that you're after. If more than one key in the hash table has the same hash, then you'll have linked lists with more than one element.

链式哈希表索引到指向链表头部的指针数组。每个链表单元格都有为其分配的键和为该键插入的值。当您想从其键中查找特定元素时,键的散列用于确定要跟随哪个链表,然后遍历该特定列表以查找您要查找的元素。如果哈希表中的多个键具有相同的哈希值,那么您将拥有包含多个元素的链表。

The downside of chained hashing is having to follow pointers in order to search linked lists. The upside is that chained hash tables only get linearly slower as the load factor (the ratio of elements in the hash table to the length of the bucket array) increases, even if it rises above 1.

链式散列的缺点是必须遵循指针才能搜索链表。好处是链式哈希表只会随着负载因子(哈希表中元素与桶数组长度的比率)的增加而线性变慢,即使它上升到 1 以上。

An open-addressing hash table indexes into an array of pointers to pairs of (key, value). You use the key's hash value to work out which slot in the array to look at first. If more than one key in the hash table has the same hash, then you use some scheme to decide on another slot to look in instead. For example, linear probing is where you look at the next slot after the one chosen, and then the next slot after that, and so on until you either find a slot that matches the key you're looking for, or you hit an empty slot (in which case the key must not be there).

开放寻址散列表索引到指向(键,值)对的指针数组。您可以使用键的散列值来确定要首先查看数组中的哪个插槽。如果哈希表中的多个键具有相同的哈希值,那么您可以使用某种方案来决定要查找的另一个槽。例如,线性探测是您查看所选的下一个槽位之后的下一个槽位,然后再查看下一个槽位,依此类推,直到您找到与您要查找的键匹配的槽位,或者您击中了一个空位插槽(在这种情况下,密钥不能在那里)。

Open-addressing is usually faster than chained hashing when the load factor is low because you don't have to follow pointers between list nodes. It gets very, very slow if the load factor approaches 1, because you end up usually having to search through many of the slots in the bucket array before you find either the key that you were looking for or an empty slot. Also, you can never have more elements in the hash table than there are entries in the bucket array.

当负载因子较低时,开放寻址通常比链式散列更快,因为您不必遵循列表节点之间的指针。如果负载因子接近 1,它会变得非常非常慢,因为在找到要查找的键或空槽之前,您通常必须搜索存储桶数组中的许多槽。此外,哈希表中的元素永远不会多于存储桶数组中的条目。

To deal with the fact that all hash tables at least get slower (and in some cases actually break completely) when their load factor approaches 1, practical hash table implementations make the bucket array larger (by allocating a new bucket array, and copying elements from the old one into the new one, then freeing the old one) when the load factor gets above a certain value (typically about 0.7).

为了处理当负载因子接近 1 时所有哈希表至少变慢(并且在某些情况下实际上完全破坏)这一事实,实际的哈希表实现使存储桶数组更大(通过分配一个新的存储桶数组,并从当负载因子超过某个值(通常约为 0.7)时,将旧的放入新的,然后释放旧的)。

There are lots of variations on all of the above. Again, please see the wikipedia article, it really is quite good.

以上所有内容都有很多变化。再次,请参阅维基百科文章,它确实非常好。

For a library that is meant to be used by other people, I would stronglyrecommend experimenting. Since they're generally quite performance-crucial, you're usually best off using somebody else's implementation of a hash table which has already been carefully tuned. There are lots of open-source BSD, LGPL and GPL licensed hash table implementations.

对于供其他人使用的库,我强烈建议进行试验。由于它们通常对性能非常重要,因此您通常最好使用其他人已经仔细调整过的哈希表的实现。有很多开源的 BSD、LGPL 和 GPL 许可的哈希表实现。

If you're working with GTK, for example, then you'll find that there's a good hash table in GLib.

例如,如果您正在使用 GTK,那么您会发现GLib中有一个很好的哈希表

回答by oba2311

Since excellent explanation is given, I'd just add visualizations taken from CLRS for further illustration:

由于给出了很好的解释,我只是添加了从 CLRS 获取的可视化以进一步说明:

Open Addressing: Open Addressing:

开放寻址: 开放寻址:

Chaining: Chaining:

链接: 链接:

回答by YuVi

My understanding (in simple terms) is that both the methods has pros and cons, though most of the libraries use Chaining strategy.

我的理解(简单来说)是这两种方法各有利弊,尽管大多数库都使用链式策略。

Chaining Method:

链接方法:

Here the hash tables array maps to a linked list of items. This is efficient if the number of collision is fairly small. The worst case scenario is O(n)where n is the number of elements in the table.

这里哈希表数组映射到项目的链接列表。如果碰撞次数相当少,这是有效的。最坏的情况是O(n)n 是表中元素的数量。

Open Addressing with Linear Probe:

使用线性探针进行开放寻址:

Here when the collision occurs, move on to the next index until we find an open spot. So, if the number of collision is low, this is very fast and space efficient. The limitation here is the total number of entries in the table is limited by the size of the array. This is not the case with chaining.

在这里当碰撞发生时,移动到下一个索引,直到我们找到一个空位。因此,如果碰撞次数较少,则速度非常快且节省空间。这里的限制是表中条目的总数受数组大小的限制。链接不是这种情况。

There is another approach which is Chaining with binary search trees. In this approach, when the collision occurs, they are stored in binary search tree instead of linked list. Hence, the worst case scenario here would be O(log n). In practice, this approach is best suited when there is a extremely nonuniform distribution.

还有另一种方法是用二叉搜索树链接。在这种方法中,当发生碰撞时,它们存储在二叉搜索树而不是链表中。因此,这里最坏的情况是O(log n). 在实践中,这种方法最适合分布极不均匀的情况。

回答by Santosh Sindham

Open addressing vs. separate chaining

开放寻址与单独链接

Linear probing, double and random hashing are appropriate if the keys are kept as entries in the hashtable itself... doing that is called "open addressing" it is also called "closed hashing"

如果键作为条目保留在哈希表本身中,则线性探测、双重和随机哈希是合适的……这样做称为“开放寻址”,也称为“封闭哈希”

Another idea: Entries in the hashtable are just pointers to the head of a linked list (“chain”); elements of the linked list contain the keys... this is called "separate chaining" it is also called "open hashing"

另一个想法:哈希表中的条目只是指向链表(“链”)头部的指针;链表的元素包含键......这称为“单独链接”,也称为“开放散列”

Collision resolution becomes easy with separate chaining: just insert a key in its linked list if it is not already there (It is possible to use fancier data structures than linked lists for this; but linked lists work very well in the average case, as we will see) Let's look at analyzing time costs of these strategies

使用单独的链接可以轻松解决冲突:如果键不存在,只需在其链表中插入一个键(为此可以使用比链表更高级的数据结构;但链表在一般情况下工作得很好,因为我们将看到)让我们来看看分析这些策略的时间成本

Source: http://cseweb.ucsd.edu/~kube/cls/100/Lectures/lec16/lec16-25.html

资料来源:http: //cseweb.ucsd.edu/~kube/cls/100/Lectures/lec16/lec16-25.html

回答by Yogesh Umesh Vaity

If the number of items that will be inserted in a hash table isn't known when the table is created, chained hash table is preferable to open addressing.

如果在创建表时不知道将插入哈希表的项目数,则链式哈希表比开放寻址更可取。

Increasing the load factor(number of items/table size) causes major performance penalties in open addressed hash tables, but performance degrades only linearly in chained hash tables.

增加负载因子(项目数/表大小)会导致开放寻址散列表中的主要性能损失,但在链式散列表中性能仅线性下降。

If you are dealing with low memory and want to reduce memory usage, go for open addressing. If you are not worried about memory and want speed, go for chained hash tables.

如果您正在处理低内存并希望减少内存使用,请使用开放寻址。如果您不担心内存并想要速度,请使用链式哈希表。

When in doubt, use chained hash tables. Adding more data than you anticipated won't cause performance to slow to a crawl.

如有疑问,请使用链式哈希表。添加比预期更多的数据不会导致性能下降。