C++: Why would anyone use set instead of unordered_set?

Notice: this page is a Chinese/English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1349734/

Date: 2020-08-27 19:39:05  Source: igfitidea

Why would anyone use set instead of unordered_set?

Tags: c++, algorithm, data-structures, c++11

Asked by AraK

C++0x is introducing unordered_set, which is available in boost and many other places. What I understand is that unordered_set is a hash table with O(1) lookup complexity. On the other hand, set is nothing but a tree with log(n) lookup complexity. Why on earth would anyone use set instead of unordered_set? I.e., is there a need for set anymore?

Answered by sth

Unordered sets have to pay for their O(1) average access time in a few ways:

  • set uses less memory than unordered_set to store the same number of elements.
  • For a small number of elements, lookups in a set might be faster than lookups in an unordered_set.
  • Even though many operations are faster in the average case for unordered_set, they are often guaranteed to have better worst-case complexities for set (for example insert).
  • That set sorts the elements is useful if you want to access them in order.
  • You can lexicographically compare different sets with <, <=, > and >=. unordered_sets are not required to support these operations.
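The last point in the list can be sketched as follows. The helper name `orders_before` is made up for this illustration and is not part of the original answer; it relies only on the relational operators that std::set provides.

```cpp
#include <set>

// Hypothetical helper (not from the original answer): returns true when
// `a` orders before `b` lexicographically.
bool orders_before(const std::set<int>& a, const std::set<int>& b) {
    // std::set defines operator<, which compares the two sorted element
    // sequences position by position. std::unordered_set only defines
    // == and !=, so the same expression would not compile for it.
    return a < b;
}
```

For instance, `orders_before({1, 2, 3}, {1, 2, 4})` is true because the first differing elements are 3 and 4, and a set that is a proper prefix of another orders before it.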

Answered by moonshadow

When the order matters to someone who wants to iterate over the items of the set.

Answered by Mehrdad Afshari

Whenever you prefer a tree to a hash table.

For instance, hash table lookups are O(n) in the worst case; O(1) is only the average case. Balanced trees are O(log n) at worst.

Answered by Jayhello

Use set when:

  1. We need ordered data (distinct elements).
  2. We need to print/access the data in sorted order.
  3. We need the predecessor/successor of elements.
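Point 3 is where the tree structure pays off. Below is a minimal sketch, assuming a std::set<int>; the helper names `predecessor` and `successor` are invented for this example and do not come from the original answer.

```cpp
#include <iterator>
#include <set>

// Hypothetical helper: pointer to the largest element strictly less than x,
// or nullptr if there is none.
const int* predecessor(const std::set<int>& s, int x) {
    auto it = s.lower_bound(x);              // first element >= x, O(log n)
    if (it == s.begin()) return nullptr;     // nothing smaller than x
    return &*std::prev(it);
}

// Hypothetical helper: pointer to the smallest element strictly greater
// than x, or nullptr if there is none.
const int* successor(const std::set<int>& s, int x) {
    auto it = s.upper_bound(x);              // first element > x, O(log n)
    if (it == s.end()) return nullptr;       // nothing larger than x
    return &*it;
}
```

Neither helper requires x itself to be in the set. An unordered_set offers no equivalent of lower_bound or upper_bound, so the same queries would need a scan over every element.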

Use unordered_set when:

  1. We need to keep a set of distinct elements and no ordering is required.
  2. We need single-element access, i.e. no traversal.

Examples:

set:

Input : 1, 8, 2, 5, 3, 9

Output : 1, 2, 3, 5, 8, 9

unordered_set:

Input : 1, 8, 2, 5, 3, 9

Output : 9 3 1 8 2 5 (possibly in this order; it depends on the hash function)

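The two input/output examples can be reproduced with a small sketch like the one below. The helper `in_iteration_order` is a made-up name for this illustration; it simply copies a container's elements in the order iteration visits them.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies a container's elements in iteration order.
template <typename Container>
std::vector<int> in_iteration_order(const Container& c) {
    return std::vector<int>(c.begin(), c.end());
}
```

Calling it on `std::set<int>{1, 8, 2, 5, 3, 9}` yields 1, 2, 3, 5, 8, 9 on every implementation. Calling it on a `std::unordered_set<int>` with the same elements yields the same six values in an order that the standard leaves unspecified and that can differ between standard libraries.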

Main differences:

[Image from the original answer]

Note: in some cases set is more convenient, for example when using vector as the key:

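#include <iostream>
#include <set>
#include <vector>
using namespace std;

int main() {
    set<vector<int>> s;
    s.insert({1, 2});
    s.insert({1, 3});
    s.insert({1, 2});   // duplicate, ignored

    for (const auto& vec : s) {
        for (int x : vec) cout << x << ' ';
        cout << '\n';
    }
    // 1 2
    // 1 3
}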

vector<int> can be used as a key in set because vector overloads operator<.

But if you use unordered_set<vector<int>>, you have to supply a hash function for vector<int>, because the standard library does not provide a std::hash specialization for it. You have to define one, like:

struct VectorHash {
    size_t operator()(const std::vector<int>& v) const {
        std::hash<int> hasher;
        size_t seed = 0;
        for (int i : v) {
            seed ^= hasher(i) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        }
        return seed;
    }
};

void two() {
    // unordered_set<vector<int>> s;  // error: no std::hash specialization for vector<int>
    unordered_set<vector<int>, VectorHash> s;
    s.insert({1, 2});
    s.insert({1, 3});
    s.insert({1, 2});   // duplicate, ignored

    for (const auto& vec : s) {
        for (int x : vec) cout << x << ' ';
        cout << '\n';
    }
    // prints "1 2" and "1 3", in unspecified order
}

You can see that in some cases unordered_set is more complicated to use.

Mainly cited from: https://www.geeksforgeeks.org/set-vs-unordered_set-c-stl/ and https://stackoverflow.com/a/29855973/6329006

Answered by Jayhello

Because std::set is part of Standard C++ and unordered_set isn't. C++0x is NOT a standard, and neither is Boost. For many of us, portability is essential, and that means sticking to the standard.

Answered by ldog

Consider sweepline algorithms. These algorithms would fail utterly with hash tables, but work beautifully with balanced trees. For a concrete example of a sweepline algorithm, consider Fortune's algorithm: http://en.wikipedia.org/wiki/Fortune%27s_algorithm

Answered by Blargle

One more thing, in addition to what other people have already mentioned. While the expected amortized complexity of inserting an element into an unordered_set is O(1), every now and then it will take O(n) because the hash table needs to be restructured (the number of buckets needs to change), even with a 'good' hash function. This is just like inserting an element into a vector, which takes O(n) every now and then because the underlying array needs to be reallocated.

Inserting in a set always takes at most O(log n). This might be preferable in some applications.

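The occasional restructuring can be observed by watching bucket_count() change during insertion. The sketch below is only an illustration: `count_rehashes` is a made-up helper, and the exact number of rehashes depends on the standard library in use.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: counts how many of n insertions changed the bucket
// count, i.e. triggered a rebuild of the whole table.
std::size_t count_rehashes(std::size_t n, bool reserve_first) {
    std::unordered_set<int> s;
    if (reserve_first) s.reserve(n);         // size the table for n elements up front
    std::size_t rehashes = 0;
    std::size_t buckets = s.bucket_count();
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(static_cast<int>(i));
        if (s.bucket_count() != buckets) {   // table was rebuilt: O(size) work
            ++rehashes;
            buckets = s.bucket_count();
        }
    }
    return rehashes;
}
```

If the final size is known in advance, calling reserve() first removes the rehashes, which is one way to soften this drawback. It does not change the worst case caused by hash collisions.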

Answered by Spectral

Pardon me, one more thing worth noticing about the sorted property:

Suppose you want a range of data from the container. For example, you stored times in a set and you want all times from 2013-01-01 to 2014-01-01.

For unordered_set it is impossible (short of checking every element).

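A range query of this kind could look like the sketch below. It assumes the times are stored as ISO "YYYY-MM-DD" strings, which sort chronologically when compared as plain strings; both that representation and the helper name `dates_between` are assumptions made for this example.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: all stored dates d with first <= d <= last.
// Assumes ISO "YYYY-MM-DD" strings, so string order is chronological order.
std::vector<std::string> dates_between(const std::set<std::string>& dates,
                                       const std::string& first,
                                       const std::string& last) {
    if (last < first) return {};             // empty range, avoid crossed iterators
    auto lo = dates.lower_bound(first);      // first date >= first, O(log n)
    auto hi = dates.upper_bound(last);       // first date >  last,  O(log n)
    return std::vector<std::string>(lo, hi); // copies only the matching dates
}
```

The two bound lookups are logarithmic, so the cost is driven by the number of matching elements rather than by the size of the set.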

Of course, this example would be more convincing for use cases of map versus unordered_map.

Answered by mic_e

While this answer might be 10 years late, it's worth pointing out that std::unordered_set also has security downsides.

If the hash function is predictable (this is typically the case unless it applies counter-measures such as a randomized salt), attackers can hand-craft data that produces hash collisions and causes all insertions and look-ups to take O(n) time.

This can be used for very efficient and elegant denial-of-service attacks.

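The effect of colliding keys can be imitated without a real attack by plugging in a deliberately bad hash. In the sketch below, `ConstantHash` and `largest_bucket` are made-up names; the constant hash stands in for attacker-chosen keys that all hash to the same value.

```cpp
#include <cstddef>
#include <unordered_set>

// Deliberately bad, hypothetical hash: every key collides.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Hypothetical helper: size of the largest bucket after inserting 0..n-1.
std::size_t largest_bucket(std::size_t n) {
    std::unordered_set<int, ConstantHash> s;
    for (std::size_t i = 0; i < n; ++i) s.insert(static_cast<int>(i));
    std::size_t largest = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b)
        if (s.bucket_size(b) > largest) largest = s.bucket_size(b);
    return largest;
}
```

With every element in a single bucket, each lookup or insertion has to walk a chain of up to n elements, which is the O(n) behaviour described above. A std::set with the same keys keeps its O(log n) bound because it does not depend on a hash at all.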

Many (most?) implementations of languages that internally employ hash maps have run into this:

Answered by Rushyo

Offhand, I would say it is convenient to have things in a relationship if you're looking to convert them into a different format.

It is also possible that whilst one is faster to access, the time to build the index or the memory used when creating and/or accessing it is greater.
