What is the difference between set and unordered_set in C++?

Question

Question by Ajeet Ganga

I came across this good question, which is similar but not at all the same, since it is about Java, which has a different hash-table implementation by virtue of its synchronized accessors/mutators: Differences between HashMap and Hashtable?

So what is the difference between the C++ implementations of set and unordered_set? This question can of course be extended to map vs unordered_map, and so on for other C++ containers.

Here is my initial assessment:

set: While the standard doesn't explicitly require it to be implemented as a tree, the time-complexity constraints on its find/insert operations mean it will always be implemented as a tree, usually a red-black tree (as seen in GCC 4.8), which is height-balanced. Since it is height-balanced, find() has predictable time complexity.

Pro: Compact (compared to the other data structure in this comparison)

Con: Access time complexity is O(lg n)

unordered_set: While the standard doesn't explicitly require it to be implemented as a hash table, the time-complexity constraints on its find/insert operations mean it will always be implemented as a hash table.

Pros:

Faster (promises average-case O(1) search)
Easier to make the basic primitives thread-safe than with a tree data structure

Cons:

Lookup is not guaranteed to be O(1); the theoretical worst case is O(n)
Not as compact as a tree (for practical purposes, the load factor is never 1)

Note: The O(1) for a hash table comes from the assumption that there are no collisions. Even with a load factor of 0.5, every second insertion leads to a collision. It could be observed that the load factor of a hash table is inversely proportional to the number of operations required to access an element in it: the more we reduce the number of operations, the sparser the hash table. When the stored elements are comparable in size to a pointer, the overhead is quite significant.

Edit: Since most people are saying the question already contains a sufficient answer, I am changing the question to "Did I miss any difference between map/set that one should know for performance analysis?"

Answer 1

Accepted answer by Yuushi

I think you've generally answered your own question; however, this:

Not as compact as a tree (for practical purposes, the load factor is never 1)

is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T uses space equal to at least 2 * pointer_size + sizeof(T) + sizeof(bool). This may be 3 * pointer_size, depending on whether the tree stores a parent pointer in each node.

Compare this to a hash map: there will be wasted array space in every hash map, because load factor < 1, as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each inserted element takes only sizeof(T) + pointer_size.

Note that this analysis ignores any overhead which may come from extra space used by alignment.

For any element type T with a small size (so, any basic type), the size of the pointers and other overhead dominates. At a load factor of > 0.5 (for example), the std::unordered_set may indeed use less memory than the equivalent std::set.

The other big missing point is that iterating through a std::set is guaranteed to produce the elements in order from smallest to largest according to the given comparison function, while iterating through a std::unordered_set returns the values in an unspecified, effectively "random", order.

Answer 2

Answer by dhaffey

Another difference (though not performance-related) is that set insertion doesn't invalidate iterators, while unordered_set insertion can if it triggers a rehash. In practice it's a pretty minor concern, since references to the actual elements remain valid.

Answer 3

Answer by Tony Delroy

Yuushi already addresses spatial efficiency and other points well; I'll just comment on a few other parts of the question...

The O(1) for a hash table comes from the assumption that there are no collisions.

That's not true. O(1) doesn't mean that the first lookup attempt always succeeds; it means that, on average, a constant number of attempts is needed, rather than a number that grows with the number of values. For example, with an unordered_set or ..._map, max_load_factor defaults to 1.0 on construction, and if the load factor approaches that with a good hash function, the average number of elements that hash to any one bucket will be around 2, regardless of how many values are in the table.

Even with a load factor of 0.5, every second insertion leads to a collision.

True, but it doesn't get as dire as you might intuitively expect: an average chain length of 2 at a 1.0 load factor is not bad.

It could be observed that the load factor of a hash table is inversely proportional to the number of operations required to access an element in it. The more we reduce the number of operations, the sparser the hash table.

There's definitely a correlation, but it's not an inverse one: a higher load factor means more operations per access.

Answer 4

Answer by Jayhello

In some cases set is more convenient.

For example, when using a vector as the key:

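#include <iostream>
#include <set>
#include <vector>
using namespace std;
// User-defined << for vector<int>: prints the elements separated by spaces
ostream& operator<<(ostream& os, const vector<int>& v) {
    for (size_t i = 0; i < v.size(); ++i) os << (i ? " " : "") << v[i];
    return os;
}
int main() {
    set<vector<int>> s;
    s.insert({1, 2});
    s.insert({1, 3});
    s.insert({1, 2});  // duplicate, not inserted again
    for (const auto& vec : s)
        cout << vec << endl;  // prints "1 2", then "1 3"
}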

vector<int> can be stored in a set because vector overloads operator< (a lexicographic comparison).

But if you use unordered_set<vector<int>>, you have to provide a hash function for vector<int> yourself, because the standard library has no std::hash specialization for vector<int>. You can define one like this:

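#include <cstddef>
#include <functional>
#include <iostream>
#include <unordered_set>
#include <vector>
using namespace std;

// Same user-defined << for vector<int> as in the set example
ostream& operator<<(ostream& os, const vector<int>& v) {
    for (size_t i = 0; i < v.size(); ++i) os << (i ? " " : "") << v[i];
    return os;
}

// Combines the element hashes with boost::hash_combine-style mixing
struct VectorHash {
    size_t operator()(const std::vector<int>& v) const {
        std::hash<int> hasher;
        size_t seed = 0;
        for (int i : v) {
            seed ^= hasher(i) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        }
        return seed;
    }
};

void two() {
    // unordered_set<vector<int>> s;  // error: no std::hash<vector<int>>
    unordered_set<vector<int>, VectorHash> s;
    s.insert({1, 2});
    s.insert({1, 3});
    s.insert({1, 2});  // duplicate, not inserted again

    for (const auto& vec : s)
        cout << vec << endl;
    // prints "1 2" and "1 3", in an unspecified order
}

int main() { two(); }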

You can see that in some cases unordered_set is more complicated to use.

Mainly cited from: https://stackoverflow.com/a/29855973/6329006

For more differences between unordered_set and set, see: https://stackoverflow.com/a/52203931/6329006
