C++ STL中的map和hashmap有什么区别

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5139859/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-28 17:30:01  来源:igfitidea点击:

what the difference between map and hashmap in STL

c++stl

提问by user496949

in C++ STL, there are two map, map and hashmap. Anyone know the main difference of them?

在C++ STL中,有两种map,map和hashmap。有人知道它们的主要区别吗?

回答by martinus

map uses a red-black tree as the data structure, so the elements you put in there are sorted, and insert/delete is O(log(n)). The elements need to implement at least operator<.

map使用红黑树作为数据结构,所以你放进去的元素都是有序的,插入/删除是O(log(n))。元素至少需要实现operator<

hashmap uses a hash, so elements are unsorted, insert/delete is O(1). Elements need to implement at least operator==and you need a hash function.

hashmap 使用散列,因此元素未排序,插入/删除是 O(1)。元素至少需要实现,operator==你需要一个哈希函数。

回答by CashCow

hash_map uses a hash table. This is "constant" time in theory. Most implementations use a "collision" hash table. What happens in reality is:

hash_map 使用哈希表。这在理论上是“恒定”时间。大多数实现使用“碰撞”哈希表。现实中发生的事情是:

  • It creates a big table
  • You have a "hash" function for your object that generates you a random place in the table (random-looking, but the hash function will always return the same value for your object) and usually this is the mod of the actual 32-bit (or 64-bit) hash value with the size of the table.
  • The table looks to see if the space is available. If so it places the item in the table. If not it checks if the element there is the one you are trying to insert. If so it is a duplicate so no insert. If not, this is called a "collision" and it uses some formula to find another cell and this continues until it either finds a duplicate or an empty cell.
  • When the table gets filled up too much it resizes. An efficient (in time) implementation will store all the original hash values together with the elements so it won't need to recalculate the hashes when it does this. In addition, comparing the hashes is usually faster than comparing the elements, so it can do this whilst searching to eliminate most of the collisions as a pre-step.
  • If you never delete anything it is simple. However deleting elements adds an extra complexity. A cell that had an element in it which has been deleted is in a different state from one that was just empty all along, as you may have had collisions and if you just empty it, those elements won't be found. So there is usually some "mark". Of course now when we want to reuse the cell, we still have to recurse down in case there is a duplicate lower down (in which case we can't insert in this cell), then remember to reuse the deleted cell.
  • The usual constraint is that your objects must be implemented to check for equality, but Dinkumware (or was it SGI) implemented theirs with operator< which might be slower but has the advantage of decoupling your elements and the type of associated container they can be stored in, although you still need a hash function to store in a hash.
  • 它创建了一张大桌子
  • 您的对象有一个“散列”函数,它会在表中生成一个随机位置(看起来是随机的,但散列函数将始终为您的对象返回相同的值),通常这是实际 32 位的 mod (或 64 位)哈希值与表的大小。
  • 该表查看空间是否可用。如果是这样,它会将项目放入表格中。如果不是,它会检查元素是否是您要插入的元素。如果是这样它是重复的所以没有插入。如果没有,这被称为“碰撞”,它使用一些公式来查找另一个单元格,并且一直持续到找到重复单元格或空单元格。
  • 当表格填满太多时,它会调整大小。一个有效的(及时)实现会将所有原始散列值与元素一起存储,因此在执行此操作时不需要重新计算散列值。此外,比较散列通常比比较元素更快,因此它可以在搜索以消除大多数冲突的同时执行此操作作为前一步。
  • 如果您从不删除任何内容,这很简单。然而,删除元素会增加额外的复杂性。包含已删除元素的单元格与一直为空的单元格处于不同的状态,因为您可能发生了冲突,如果您只是清空它,将找不到这些元素。所以通常会有一些“标记”。当然现在我们要重用cell的时候,还是要向下递归,以防有重复的down down(这种情况下不能插入到这个cell中),然后记得重用被删除的cell。
  • 通常的限制是你的对象必须被实现来检查相等性,但是 Dinkumware(或者它是 SGI)用 operator< 实现了它们的对象,这可能会更慢但具有解耦元素和它们可以存储的关联容器类型的优点in,虽然你仍然需要一个散列函数来存储在散列中。

The theory is that if you have a big enough table, the operations are constant time, i.e. it does not depend on the number of actual elements you have. In practice, of course, the more elements you have the more collisions occur.

理论是如果你有一个足够大的表,操作是常数时间,即它不依赖于你拥有的实际元素的数量。在实践中,当然,您拥有的元素越多,发生的冲突就越多。

std::map uses a binary tree. There is no need to define a hash function for an object, just strictly ordered comparison. On insertion it recurses down the tree to find the insertion point (and whether there are any duplicates) and adds the node, and may need to rebalance the tree so the depth of leaves is never more than 1 apart. Rebalancing time is relative to the depth of the tree too so all these operations are O(log N) where N is the number of elements.

std::map 使用二叉树。不需要为对象定义散列函数,只需严格有序的比较。在插入时,它在树上递归以找到插入点(以及是否有任何重复项)并添加节点,并且可能需要重新平衡树,因此叶子的深度永远不会超过 1。重新平衡时间也与树的深度有关,因此所有这些操作都是 O(log N),其中 N 是元素的数量。

The advantages of hash is the complexity The advantages of the tree is:

hash的优点是复杂度树的优点是:

  • Totally scalable. It only uses what it needs, no need for a huge table or to pre-empt the size of the table, although hash may require less "baggage" per element than a tree.
  • No need to hash first, which for a good function can take longer than the comparisons would if the data set is not large.
  • 完全可扩展。它只使用它需要的东西,不需要一个巨大的表或抢占表的大小,尽管哈希可能比树需要更少的每个元素的“行李”。
  • 不需要先散列,如果数据集不大,那么对于一个好的函数来说,这可能比比较花费的时间更长。

One other issue with std::mapis that it uses a single strictly-ordered comparison function whilst a "compare" function that returned -1, 0 or 1 would be a lot more efficient, particularly with the most commonly used key type, std::string, which already has this function implemented (it is char_traits::compare). (This inefficiency is based on the premise that to check that x==y, you check x<yand y<xso you do two comparisons. You would do this just once per lookup).

另一个问题std::map是它使用单个严格排序的比较函数,而返回 -1、0 或 1 的“比较”函数效率更高,特别是对于最常用的键类型 std::string,已经实现了这个功能(它是char_traits::compare)。(这种效率低下的前提是要检查x==y、检查 、x<y然后y<x进行两次比较。每次查找只执行一次)。

回答by Erik

mapis a red-black tree, O(log(n))access time. hash_map(which is not standard, however unordered_mapwill become standard) uses (conceptually) a hash of the key as an index in an array of linked lists, and therefore has a best-case access time of O(1)and a worst case of O(n).

map是一棵红黑树,O(log(n))访问时间。hash_map(这不是标准的,但unordered_map将成为标准)使用(概念上)键的散列作为链表数组中的索引,因此具有最佳情况的访问时间O(1)和最坏的情况O(n)

See http://en.wikipedia.org/wiki/Red-black_tree

http://en.wikipedia.org/wiki/Red-black_tree

回答by Matteo TeoMan Mangano

The main difference is the searching time.

主要区别在于搜索时间。

for few data is better map

对于少数数据是更好的地图

for lots of data is better hashmap

对于大量数据是更好的哈希图

anyway the tecnical answers given previously are correct.

无论如何,之前给出的技术答案是正确的。