C++ 在琐碎的键的情况下,使用 map 比 unordered_map 有什么优势吗?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2196995/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 22:29:10  来源:igfitidea点击:

Is there any advantage of using map over unordered_map in case of trivial keys?

c++performancedictionaryunordered-map

提问by Kornel Kisielewicz

A recent talk about unordered_mapin C++ made me realize that I should use unordered_mapfor most cases where I used mapbefore, because of the efficiency of lookup ( amortized O(1)vs. O(log n)). Most times I use a map, I use either intor std::stringas the key type; hence, I've got no problems with the definition of the hash function. The more I thought about it, the more I came to realize that I can't find any reason of using a std::mapover a std::unordered_mapin the case of keys with simple types -- I took a look at the interfaces, and didn't find any significant differences that would impact my code.

最近unordered_map在 C++ 中的一次讨论让我意识到我应该unordered_mapmap以前使用的大多数情况下使用,因为查找的效率(摊销 O(1)O(log n))。大多数时候我使用地图,我使用intstd::string作为键类型;因此,我对散列函数的定义没有任何问题。我想过这个问题越多,我越才明白,我找不到任何理由使用的std::mapstd::unordered_map与简单类型的键的情况下-我看了一下界面,并没有发现任何会影响我的代码的重大差异。

Hence the question: is there any real reason to use std::mapover std::unordered_mapin the case of simple types like intand std::string?

因此,问题是:在像and这样的简单类型的情况下,是否有真正的理由使用std::mapover ?std::unordered_mapintstd::string

I'm asking from a strictly programming point of view -- I know that it's not fully considered standard, and that it may pose problems with porting.

我是从严格的编程角度提出问题的——我知道它不是完全被认为是标准的,并且它可能会给移植带来问题。

Also, I expect that one of the correct answers might be "it's more efficient for smaller sets of data"because of a smaller overhead (is that true?) -- hence I'd like to restrict the question to cases where the amount of keys is non-trivial (>1 024).

另外,我希望正确的答案之一可能是“对于较小的数据集更有效”,因为开销较小(这是真的吗?) - 因此我想将问题限制在键是重要的(> 1 024)。

Edit:duh, I forgot the obvious (thanks GMan!) -- yes, maps are ordered of course -- I know that, and am looking for other reasons.

编辑:呃,我忘记了明显的(感谢 GMan!)——是的,地图当然是订购的——我知道这一点,正在寻找其他原因。

回答by GManNickG

Don't forget that mapkeeps its elements ordered. If you can't give that up, obviously you can't use unordered_map.

不要忘记map保持其元素有序。如果你不能放弃它,显然你不能使用unordered_map.

Something else to keep in mind is that unordered_mapgenerally uses more memory. mapjust has a few house-keeping pointers, and memory for each object. Contrarily, unordered_maphas a big array (these can get quite big in some implementations), and then additional memory for each object. If you need to be memory-aware, mapshould prove better, because it lacks the large array.

要记住的另一件事是unordered_map通常使用更多的内存。map只是有一些内务指针和每个对象的内存。相反,unordered_map有一个大数组(在某些实现中这些数组会变得非常大),然后为每个对象提供额外的内存。如果你需要内存感知,map应该证明更好,因为它缺少大数组。

So, if you need pure lookup-retrieval, I'd say unordered_mapis the way to go. But there are always trade-offs, and if you can't afford them, then you can't use it.

所以,如果你需要纯粹的查找检索,我会说unordered_map是要走的路。但总有取舍,如果你买不起,那么你就不能使用它。

Just from personal experience, I found an enormous improvement in performance (measured, of course) when using unordered_mapinstead of mapin a main entity look-up table.

仅从个人经验来看,我发现在使用unordered_map而不是map在主实体查找表中使用时性能(当然是测量的)有巨大的改进。

On the other hand, I found it was much slower at repeatedly inserting and removing elements. It's great for a relatively static collection of elements, but if you're doing tons of insertions and deletions the hashing + bucketing seems to add up. (Note, this was over many iterations.)

另一方面,我发现重复插入和删除元素要慢得多。它非常适合相对静态的元素集合,但是如果您要进行大量的插入和删除操作,散列 + 分桶似乎会加起来。(注意,这是经过多次迭代。)

回答by Blair Zajac

If you want to compare the speed of your std::mapand std::unordered_mapimplementations, you could use Google's sparsehashproject which has a time_hash_map program to time them. For example, with gcc 4.4.2 on an x86_64 Linux system

如果你想比较你的std::mapstd::unordered_map实现的速度,你可以使用谷歌的sparsehash项目,它有一个 time_hash_map 程序来计时。例如,在 x86_64 Linux 系统上使用 gcc 4.4.2

$ ./time_hash_map
TR1 UNORDERED_MAP (4 byte objects, 10000000 iterations):
map_grow              126.1 ns  (27427396 hashes, 40000000 copies)  290.9 MB
map_predict/grow       67.4 ns  (10000000 hashes, 40000000 copies)  232.8 MB
map_replace            22.3 ns  (37427396 hashes, 40000000 copies)
map_fetch              16.3 ns  (37427396 hashes, 40000000 copies)
map_fetch_empty         9.8 ns  (10000000 hashes,        0 copies)
map_remove             49.1 ns  (37427396 hashes, 40000000 copies)
map_toggle             86.1 ns  (20000000 hashes, 40000000 copies)

STANDARD MAP (4 byte objects, 10000000 iterations):
map_grow              225.3 ns  (       0 hashes, 20000000 copies)  462.4 MB
map_predict/grow      225.1 ns  (       0 hashes, 20000000 copies)  462.6 MB
map_replace           151.2 ns  (       0 hashes, 20000000 copies)
map_fetch             156.0 ns  (       0 hashes, 20000000 copies)
map_fetch_empty         1.4 ns  (       0 hashes,        0 copies)
map_remove            141.0 ns  (       0 hashes, 20000000 copies)
map_toggle             67.3 ns  (       0 hashes, 20000000 copies)

回答by Jerry Coffin

I'd echo roughly the same point GMan made: depending on the type of use, std::mapcan be (and often is) faster than std::tr1::unordered_map(using the implementation included in VS 2008 SP1).

我会回应 GMan 提出的大致相同的观点:根据使用类型,std::map可以(并且通常是)比std::tr1::unordered_map(使用 VS 2008 SP1 中包含的实现)更快。

There are a few complicating factors to keep in mind. For example, in std::map, you're comparing keys, which means you only ever look at enough of the beginning of a key to distinguish between the right and left sub-branches of the tree. In my experience, nearly the only time you look at an entire key is if you're using something like int that you can compare in a single instruction. With a more typical key type like std::string, you often compare only a few characters or so.

有一些复杂的因素需要牢记。例如,在 中std::map,您正在比较键,这意味着您只需要查看足够多的键的开头来区分树的左右子分支。根据我的经验,几乎唯一一次查看整个键的情况是,如果您使用的是 int 之类的东西,您可以在单个指令中进行比较。对于像 std::string 这样更典型的键类型,您通常只比较几个字符左右。

A decent hash function, by contrast, always looks at the entirekey. IOW, even if the table lookup is constant complexity, the hash itself has roughly linear complexity (though on the length of the key, not the number of items). With long strings as keys, an std::mapmight finish a search before an unordered_mapwould even startits search.

相比之下,一个体面的哈希函数总是查看整个密钥。IOW,即使表查找的复杂性是恒定的,哈希本身也具有大致线性的复杂性(尽管是键的长度,而不是项目的数量)。随着长串钥匙,一个std::map可能完成搜索之前unordered_map,甚至将开始其搜索。

Second, while there are several methods of resizing hash tables, most of them are pretty slow -- to the point that unless lookups are considerablymore frequent than insertions and deletions, std::map will often be faster than std::unordered_map.

其次,虽然有调整哈希表的几种方法,其中大部分是相当缓慢的-来,除非是查找点大大高于插入和缺失更频繁,性病::地图通常会比快std::unordered_map

Of course, as I mentioned in the comment on your previous question, you can also use a table of trees. This has both advantages and disadvantages. On one hand, it limits the worst case to that of a tree. It also allows fast insertion and deletion, because (at least when I've done it) I've used a fixed-size of table. Eliminating alltable resizing allows you to keep your hash table a lot simpler and typically faster.

当然,正如我在上一个问题的评论中提到的,您也可以使用树表。这既有优点也有缺点。一方面,它将最坏的情况限制为一棵树的情况。它还允许快速插入和删除,因为(至少在我完成后)我使用了固定大小的表。消除所有表调整大小可以让您的哈希表更简单,通常更快。

One other point: the requirements for hashing and tree-based maps are different. Hashing obviously requires a hash function, and an equality comparison, where ordered maps require a less-than comparison. Of course the hybrid I mentioned requires both. Of course, for the common case of using a string as the key, this isn't really a problem, but some types of keys suit ordering better than hashing (or vice versa).

另一点:散列和基于树的映射的要求是不同的。散列显然需要散列函数和相等比较,其中有序映射需要小于比较。当然,我提到的混合动力车两者都需要。当然,对于使用字符串作为键的常见情况,这并不是真正的问题,但某些类型的键比散列更适合排序(反之亦然)。

回答by Gearoid Murphy

I was intrigued by the answer from @Jerry Coffin, which suggested that the ordered map would exhibit performance increases on long strings, after some experimentation (which can be downloaded from pastebin), I've found that this only seems to hold true for collections of random strings, when the map is initialised with a sorted dictionary (which contain words with considerable amounts of prefix-overlap), this rule breaks down, presumably because of the increased tree depth necessary to retrieve value. The results are shown below, the 1st number column is insert time, 2nd is fetch time.

我对@Jerry Coffin 的回答很感兴趣,它表明有序映射会在长字符串上表现出性能提升,经过一些实验(可以从pastebin下载),我发现这似乎只适用于集合对于随机字符串,当地图使用排序字典(其中包含具有大量前缀重叠的单词)初始化时,该规则失效,大概是因为检索值所需的树深度增加。结果如下图,第一列是插入时间,第二列是取回时间。

g++ -g -O3 --std=c++0x   -c -o stdtests.o stdtests.cpp
g++ -o stdtests stdtests.o
gmurphy@interloper:HashTests$ ./stdtests
# 1st number column is insert time, 2nd is fetch time
 ** Integer Keys ** 
 unordered:      137      15
   ordered:      168      81
 ** Random String Keys ** 
 unordered:       55      50
   ordered:       33      31
 ** Real Words Keys ** 
 unordered:      278      76
   ordered:      516     298

回答by Matthieu M.

I would just point out that... there are many kind of unordered_maps.

我只想指出......有很多种类unordered_map

Look up the Wikipedia Articleon hash map. Depending on which implementation was used, the characteristics in term of look-up, insertion and deletion might vary quite significantly.

在哈希映射上查找维基百科文章。根据使用的实现,查找、插入和删除方面的特性可能会有很大差异。

And that's what worries me the most with the addition of unordered_mapto the STL: they will have to choose a particular implementation as I doubt they'll go down the Policyroad, and so we will be stuck with an implementation for the average use and nothing for the other cases...

这就是添加unordered_map到 STL 时最让我担心的:他们将不得不选择一个特定的实现,因为我怀疑他们会走上这Policy条路,所以我们将被困在一个普通用途的实现上,而没有什么其他情况...

For example some hash maps have linear rehashing, where instead of rehashing the whole hash map at once, a portion is rehash at each insertion, which helps amortizing the cost.

例如,一些哈希映射具有线性重新哈希,而不是一次重新哈希整个哈希映射,而是在每次插入时重新哈希一部分,这有助于分摊成本。

Another example: some hash maps use a simple list of nodes for a bucket, others use a map, others don't use nodes but find the nearest slot and lastly some will use a list of nodes but reorder it so that the last accessed element is at the front (like a caching thing).

另一个例子:一些哈希映射使用一个简单的节点列表作为一个桶,其他人使用一个映射,其他人不使用节点但找到最近的槽,最后一些将使用节点列表但重新排序,以便最后访问的元素在前面(就像缓存一样)。

So at the moment I tend to prefer the std::mapor perhaps a loki::AssocVector(for frozen data sets).

所以目前我倾向于更喜欢std::map或者可能是loki::AssocVector(对于冻结的数据集)。

Don't get me wrong, I'd like to use the std::unordered_mapand I may in the future, but it's difficult to "trust" the portability of such a container when you think of all the ways of implementing it and the various performances that result of this.

不要误会我的意思,我想使用std::unordered_map并且我将来可能会使用,但是当您想到实现它的所有方法以及由此产生的各种性能时,很难“信任”这种容器的可移植性这个的。

回答by user1531083

Significant differences that have not really been adequately mentioned here:

此处并未真正充分提及的显着差异:

  • mapkeeps iterators to all elements stable, in C++17 you can even move elements from one mapto the other without invalidating iterators to them (and if properly implemented without any potential allocation).
  • maptimings for single operations are typically more consistent since they never need large allocations.
  • unordered_mapusing std::hashas implemented in the libstdc++ is vulnerable to DoS if fed with untrusted input (it uses MurmurHash2 with a constant seed - not that seeding would really help, see https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/).
  • Being ordered enables efficient range searches, e.g. iterate over all elements with key ≥ 42.
  • map使所有元素的迭代器保持稳定,在 C++17 中,您甚至可以将元素从一个移动map到另一个,而不会使它们的迭代器失效(并且如果正确实现而没有任何潜在的分配)。
  • map单个操作的时间通常更一致,因为它们从不需要大量分配。
  • unordered_mapstd::hash如果输入不受信任的输入,则在 libstdc++ 中实现的使用容易受到 DoS 攻击(它使用带有恒定种子的 MurmurHash2 - 种子并不是真的有帮助,请参阅https://emboss.github.io/blog/2012/12/14/打破杂音散列泛滥-dos-reloaded/)。
  • 有序可实现高效的范围搜索,例如迭代所有 key ≥ 42 的元素。

回答by user1531083

Hash tables have higher constants than common map implementations, which become significant for small containers. Max size is 10, 100, or maybe even 1,000 or more? Constants are the same as ever, but O(log n) is close to O(k). (Remember logarithmic complexity is still reallygood.)

哈希表具有比普通映射实现更高的常量,这对于小容器来说变得很重要。最大大小是 10、100,甚至 1,000 或更多?常量和以前一样,但 O(log n) 接近 O(k)。(记住对数复杂度仍然非常好。)

What makes a good hash function depends on your data's characteristics; so if I don't plan on looking at a custom hash function (but can certainly change my mind later, and easily since I typedef damn near everything) and even though defaults are chosen to perform decently for many data sources, I find the ordered nature of map to be enough of a help initially that I still default to map rather than a hash table in that case.

什么是好的散列函数取决于您的数据特征;因此,如果我不打算查看自定义散列函数(但以后肯定会改变我的想法,而且很容易因为我在几乎所有内容都进行了类型定义)并且即使选择默认值以对许多数据源执行体面,我发现有序map 的性质最初足以提供帮助,在这种情况下,我仍然默认映射而不是哈希表。

Plus that way you don't have to even think about writing a hash function for other (usually UDT) types, and just write op< (which you want anyway).

此外,您甚至不必考虑为其他(通常是 UDT)类型编写哈希函数,只需编写 op<(无论如何您都想要)。

回答by Don Hatch

Reasons have been given in other answers; here is another.

原因已在其他答案中给出;这是另一个。

std::map (balanced binary tree) operations are amortized O(log n) and worst case O(log n). std::unordered_map (hash table) operations are amortized O(1) and worst case O(n).

std::map(平衡二叉树)操作分摊 O(log n) 和最坏情况 O(log n)。std::unordered_map (hash table) 操作分摊 O(1) 和最坏情况 O(n)。

How this plays out in practice is that the hash table "hiccups" every once in a while with an O(n) operation, which may or may not be something your application can tolerate. If it can't tolerate it, you'd prefer std::map over std::unordered_map.

这在实践中的表现是哈希表每隔一段时间就会“打嗝”一次 O(n) 操作,这可能是您的应用程序可以容忍的,也可能不是。如果它不能容忍它,你会更喜欢 std::map 而不是 std::unordered_map。

回答by Shital Shah

Summary

概括

Assuming ordering is not important:

假设排序不重要:

  • If you are going to build large table once and do lots of queries, use std::unordered_map
  • If you are going to build small table (may be under 100 elements) and do lots of queries, use std::map. This is because reads on it are O(log n).
  • If you are going to change table a lot then may bestd::mapis good option.
  • If you are in doubt, just use std::unordered_map.
  • 如果您要构建一次大表并进行大量查询,请使用 std::unordered_map
  • 如果您要构建小表(可能少于 100 个元素)并进行大量查询,请使用std::map. 这是因为对它的读取是O(log n).
  • 如果您要经常更换桌子,那么可能std::map是不错的选择。
  • 如果您有疑问,只需使用std::unordered_map.

Historical Context

历史背景

In most languages, unordered map (aka hash based dictionaries) are the default map however in C++ you get ordered map as default map. How did that happen? Some people erroneously assume that C++ committee made this decision in their unique wisdom but the truth is unfortunately uglier than that.

在大多数语言中,无序映射(又名基于哈希的字典)是默认映射,但是在 C++ 中,您将有序映射作为默认映射。那是怎么发生的?有些人错误地认为 C++ 委员会以他们独特的智慧做出了这个决定,但不幸的是,事实比这更丑陋。

It is widely believedthat C++ ended up with ordered map as default because there are not too many parameters on how they can be implemented. On the other hand, hash based implementations has tons of things to talk about. So to avoid gridlocks in standardization they just got alongwith ordered map. Around 2005, many languages already had good implementations of hash based implementation and so it was more easier for the committee to accept new std::unordered_map. In a perfect world, std::mapwould have been unordered and we would have std::ordered_mapas separate type.

人们普遍认为,C++ 最终以有序映射作为默认值,因为没有太多关于如何实现它们的参数。另一方面,基于哈希的实现有很多事情要讨论。因此,为了避免标准化中的僵局,他们只是与有序地图相处。2005 年左右,许多语言已经很好地实现了基于哈希的实现,因此委员会更容易接受新的std::unordered_map. 在一个完美的世界中,std::map将是无序的,我们将拥有std::ordered_map单独的类型。

Performance

表现

Below two graphs should speak for themselves (source):

下面的两张图不言自明(来源):

enter image description here

在此处输入图片说明

enter image description here

在此处输入图片说明

回答by wendong

I've made a test recently which makes 50000 merge&sort. That means if the string keys are the same, merge the byte string. And the final output should be sorted. So this includes a look up for every insertion.

我最近做了一个测试,可以进行 50000 次合并和排序。这意味着如果字符串键相同,则合并字节字符串。并且最终的输出应该被排序。所以这包括对每个插入的查找。

For the mapimplementation, it takes 200 ms to finish the job. For the unordered_map+ map, it takes 70 ms for unordered_mapinsertion and 80 ms for mapinsertion. So the hybrid implementation is 50 ms faster.

对于map实现,完成工作需要 200 毫秒。对于unordered_map+ mapunordered_map插入需要 70 毫秒,插入需要80 毫秒map。所以混合实现快了 50 毫秒。

We should think twice before we use the map. If you only need the data to be sorted in the final result of your program, a hybrid solution may be better.

在使用map. 如果您只需要在程序的最终结果中对数据进行排序,那么混合解决方案可能会更好。