What is the difference between set and unordered_set in C++?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow (not to this page).
Original: http://stackoverflow.com/questions/16075890/
Asked by Ajeet Ganga
I came across this good question, which is similar but not the same, since it is about Java, whose hash tables are implemented differently because of their synchronized accessors/mutators: "Differences between HashMap and Hashtable?"
So what is the difference between the C++ implementations of set and unordered_set? This question can of course be extended to map vs unordered_map, and so on for other C++ containers.
Here is my initial assessment
set: While the standard doesn't explicitly require it to be implemented as a tree, the time-complexity constraints on operations such as find/insert mean that in practice it is always implemented as a tree, usually a red-black tree (as seen in GCC 4.8), which is height-balanced. Because the tree is height-balanced, find() has predictable time complexity.
Pros: Compact (compared to other data structures)
Cons: Access time complexity is O(lg n)
unordered_set: While the standard doesn't explicitly require it to be implemented as a hash table, the time-complexity constraints on operations such as find/insert mean that in practice it is always implemented as a hash table.
Pros:
- Faster (promises average-case O(1) for search)
- Easier to make the basic primitives thread-safe, compared to tree-based data structures
Cons:
- Lookup is not guaranteed to be O(1); the theoretical worst case is O(n)
- Not as compact as a tree (for practical purposes the load factor is never 1)
Note: The O(1) for a hash table comes from the assumption that there are no collisions. Even with a load factor of 0.5, every second insertion leads to a collision. It can be observed that the load factor of a hash table is inversely proportional to the number of operations required to access an element in it. The more we reduce the number of operations, the sparser the hash table. When the stored elements are comparable in size to a pointer, the overhead is quite significant.
Edit: Since most people are saying the question already contains a sufficient answer, I am changing the question to "Did I miss any difference between map/set that one should know for performance analysis?"
Accepted answer by Yuushi
I think you've largely answered your own question. However, this:
"Not as compact as a tree (for practical purposes the load factor is never 1)"
is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T uses space equal to at least 2 * pointer_size + sizeof(T) + sizeof(bool). This may be 3 * pointer_size, depending on whether the tree contains a parent pointer for each tree node.
Compare this to a hash map: there will be wasted array space for each hash map because the load factor is < 1, as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each inserted element takes only sizeof(T) + pointer_size.
Note that this analysis ignores any overhead which may come from extra space used by alignment.
For any element type T that is small (so, any basic type), the size of the pointers and other overhead dominates. At a load factor > 0.5 (for example), the std::unordered_set may indeed use less memory than the equivalent std::set.
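The arithmetic above can be sketched with sizeof. The node layouts below (RbNodeSketch, HashNodeSketch) are hypothetical, chosen only to mirror the fields the answer lists; real standard library implementations lay out their nodes differently, and allocator overhead is ignored here just as alignment-related waste is ignored in the answer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical red-black tree node: two child pointers, an optional parent
// pointer, a color flag, and the stored value.
template <typename T>
struct RbNodeSketch {
    RbNodeSketch* left;
    RbNodeSketch* right;
    RbNodeSketch* parent;
    bool is_red;
    T value;
};

// Hypothetical hash table node for separate chaining with a singly linked list.
template <typename T>
struct HashNodeSketch {
    HashNodeSketch* next;
    T value;
};

// Estimated bytes per element for the tree: one node per element.
template <typename T>
double tree_bytes_per_element() {
    return static_cast<double>(sizeof(RbNodeSketch<T>));
}

// Estimated bytes per element for the hash table: one node per element plus
// that element's share of the bucket array (assumed one pointer per bucket).
template <typename T>
double hash_bytes_per_element(double load_factor) {
    assert(load_factor > 0.0);
    return static_cast<double>(sizeof(HashNodeSketch<T>)) +
           static_cast<double>(sizeof(void*)) / load_factor;
}
```

Under these assumptions, comparing hash_bytes_per_element<int>(lf) with tree_bytes_per_element<int>() for a few load factors shows where the hash table starts to come out ahead for small element types.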
The other big missing point is that iterating through a std::set is guaranteed to produce the elements ordered from smallest to largest according to the given comparison function, while iterating through a std::unordered_set returns the values in a "random" (unspecified) order.
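A minimal sketch of the ordering difference follows. The function names are illustrative and not part of the original answer. Because the iteration order of std::unordered_set is unspecified, the second function sorts explicitly instead of relying on it.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <unordered_set>
#include <vector>

// Copies a std::set into a vector in iteration order, which is always sorted
// according to the comparison function (std::less<int> here).
std::vector<int> ordered_iteration() {
    std::set<int> s{3, 1, 2};
    std::vector<int> out(s.begin(), s.end());
    assert(std::is_sorted(out.begin(), out.end()));
    return out;
}

// The iteration order of std::unordered_set is unspecified, so sort the copy
// when a defined order is needed.
std::vector<int> sorted_from_unordered() {
    std::unordered_set<int> u{3, 1, 2};
    std::vector<int> out(u.begin(), u.end());
    std::sort(out.begin(), out.end());
    return out;
}
```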
Answer by dhaffey
Another difference (though not performance-related) is that set insertion doesn't invalidate iterators, while unordered_set insertion can if it triggers a rehash. In practice it's a pretty minor concern, since references to the actual elements remain valid.
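The point about references surviving a rehash can be checked with a small sketch. The function name and the element count are arbitrary choices for illustration; the only behavior relied on is that pointers and references to elements stay valid across a rehash while iterators may not.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Returns true if a pointer taken before a rehash still refers to the same
// element afterwards.
bool pointer_survives_rehash() {
    std::unordered_set<int> u{42};
    const int* p = &*u.find(42);
    const std::size_t buckets_before = u.bucket_count();

    // Insert enough elements that the table has to grow at least once.
    for (int i = 0; i < 10000; ++i) {
        u.insert(i + 100);
    }
    assert(u.count(42) == 1);
    const bool rehashed = u.bucket_count() != buckets_before;

    // Any iterator obtained before the loop must not be reused here, so a
    // fresh one is acquired with find() to compare addresses.
    return rehashed && p == &*u.find(42) && *p == 42;
}
```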
Answer by Tony Delroy
Yuushi addresses spatial efficiency and other points well already; just a few other parts of the question I'll comment on...
"The O(1) for a hash table comes from the assumption that there are no collisions."
That's not true. What O(1) means is not that the first lookup attempt will always succeed; it's that there is, on average, a constant number of attempts needed, rather than something that grows as the number of values grows. For example, with an unordered_set or ..._map, the max_load_factor defaults to 1.0 on construction, and if the load factor approaches that with a good hash function, the average number of elements that hash to any one bucket will be around 2 regardless of how many values are in the table.
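The bucket interface of std::unordered_set makes it possible to look at these numbers directly. The sketch below is illustrative only: BucketStats and measure_buckets are hypothetical names, and the measured average depends on the hash function and on the standard library in use, so no particular value is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Illustrative summary of a table's bucket occupancy.
struct BucketStats {
    float max_load_factor;            // value on a default-constructed set
    float load_factor;                // size() / bucket_count() after inserts
    std::size_t elements_in_buckets;  // sum of all bucket sizes
    double avg_nonempty_bucket_size;  // mean chain length over used buckets
};

BucketStats measure_buckets(int n) {
    assert(n > 0);
    std::unordered_set<int> u;
    BucketStats st{};
    st.max_load_factor = u.max_load_factor();

    for (int i = 0; i < n; ++i) {
        u.insert(i);
    }

    std::size_t nonempty = 0;
    for (std::size_t b = 0; b < u.bucket_count(); ++b) {
        st.elements_in_buckets += u.bucket_size(b);
        if (u.bucket_size(b) > 0) {
            ++nonempty;
        }
    }
    st.load_factor = u.load_factor();
    st.avg_nonempty_bucket_size =
        static_cast<double>(st.elements_in_buckets) / static_cast<double>(nonempty);
    return st;
}
```

Calling measure_buckets with increasing n is one way to check that the average chain length stays roughly constant as the table grows.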
"Even with a load factor of 0.5, every second insertion leads to a collision."
True, but it doesn't get as dire as you might intuitively expect: that average chain length of 2 at 1.0 load factor's not bad.
"It can be observed that the load factor of a hash table is inversely proportional to the number of operations required to access an element in it. The more we reduce the number of operations, the sparser the hash table."
There's definitely a correlation, but it's not an inverse one: a higher load factor means more operations per lookup, not fewer.
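The trade-off between load factor and table sparsity can be observed by lowering max_load_factor before inserting. This is a sketch under the assumption that the implementation keeps load_factor() at or below max_load_factor(), which common standard libraries do; buckets_for is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Returns the bucket count after inserting n distinct ints into a set whose
// maximum load factor was set to max_lf beforehand.
std::size_t buckets_for(float max_lf, int n) {
    assert(max_lf > 0.0f);
    std::unordered_set<int> u;
    u.max_load_factor(max_lf);
    for (int i = 0; i < n; ++i) {
        u.insert(i);
    }
    return u.bucket_count();
}
```

A lower maximum load factor yields a sparser table (more buckets for the same number of elements), trading memory for shorter chains.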
Answer by Jayhello
In some cases set is more convenient. For example, using vector as the key:
set<vector<int>> s;
s.insert({1, 2});
s.insert({1, 3});
s.insert({1, 2});  // duplicate key, ignored
for (const auto& vec : s)
    cout << vec << endl;  // assumes operator<< has been overloaded for vector
// 1 2
// 1 3
The reason vector<int> can be stored in a set is that vector overloads operator<.
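A short sketch of what that comparison does: operator< on vectors compares lexicographically, which is all std::set needs to order and deduplicate its keys. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// std::vector's operator< compares element by element (lexicographically).
bool lexicographic_less(const std::vector<int>& a, const std::vector<int>& b) {
    return a < b;
}

// Inserting the same key twice leaves a single copy in the set.
std::size_t distinct_count() {
    std::set<std::vector<int>> s;
    s.insert({1, 2});
    s.insert({1, 3});
    const bool inserted_again = s.insert({1, 2}).second;
    assert(!inserted_again);
    return s.size();
}
```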
But if you use unordered_set<vector<int>>, you have to create a hash function for vector<int>, because vector doesn't have one. You have to define it yourself, like this:
struct VectorHash {
    size_t operator()(const std::vector<int>& v) const {
        std::hash<int> hasher;
        size_t seed = 0;
        // Mix each element's hash into the running seed.
        for (int i : v) {
            seed ^= hasher(i) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};
void two() {
    // unordered_set<vector<int>> s;  // error: vector<int> doesn't have a hash function
    unordered_set<vector<int>, VectorHash> s;
    s.insert({1, 2});
    s.insert({1, 3});
    s.insert({1, 2});  // duplicate key, ignored
    for (const auto& vec : s)
        cout << vec << endl;  // assumes operator<< has been overloaded for vector
    // prints "1 2" and "1 3", in unspecified order
}
You can see that in some cases unordered_set is more complicated to use.
Mainly cited from: https://stackoverflow.com/a/29855973/6329006
For more differences between unordered_set and set, see: https://stackoverflow.com/a/52203931/6329006