C++: Checking for duplicates in a vector
Original question: http://stackoverflow.com/questions/2860634/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Checking for duplicates in a vector
Asked by xbonez
Possible Duplicate:
Determining if an unordered vector<T> has all unique elements
I have to check a vector for duplicates. What is the best way to approach this:
I take the first element, compare it against all other elements in the vector. Then take the next element and do the same and so on.
Is this the best way to do it, or is there a more efficient way to check for dups?
Accepted answer by IVlad
Use a hash table in which you insert each element. Before you insert an element, check if it's already there. If it is, you have yourself a duplicate. This is O(n) on average, but the worst case is just as bad as your current method.
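The hash table idea can be sketched with std::unordered_set. This is only a minimal illustration: the function name has_duplicates_hashed is made up here, and the element type is assumed to be int (any type with std::hash and operator== should work the same way).

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns true if any value occurs more than once.
bool has_duplicates_hashed(const std::vector<int>& values) {
    std::unordered_set<int> seen;
    seen.reserve(values.size());  // avoid rehashing while inserting
    for (int v : values) {
        // insert() returns {iterator, bool}; the bool is false when the
        // value was already in the set, which means v is a duplicate.
        if (!seen.insert(v).second) {
            return true;
        }
    }
    return false;
}
```

The function stops at the first repeated value, so it never touches the rest of the vector once a duplicate is found.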
Alternatively, you can use a set to do the same thing in O(n log n) worst case. This is as good as the sorting solution, except it doesn't change the order of the elements (uses more memory though since you create a set).
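A possible sketch of the set variant, assuming std::set is the set meant here and that the element type provides operator<. The name has_duplicates_ordered is hypothetical.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: same idea as the hash table, but backed by a
// balanced search tree, so each insert costs O(log n) in the worst case.
template <typename T>
bool has_duplicates_ordered(const std::vector<T>& values) {
    std::set<T> seen;
    for (const T& v : values) {
        // The bool in the returned pair is false if v was already present.
        if (!seen.insert(v).second) {
            return true;
        }
    }
    return false;
}
```

The input vector is taken by const reference, so its order is left untouched; the extra memory goes into the set.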
Another way is to copy your vector to another vector, sort that and check the adjacent elements there. I'm not sure if this is faster than the set solution, but I think sorting adds less overhead than the balanced search trees a set uses so it should be faster in practice.
Of course, if you don't care about keeping the original order of the elements, just sort the initial vector.
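The copy-and-sort approach might look like the sketch below. The function name has_duplicates_sorted and the int element type are assumptions for illustration; std::adjacent_find does the "check the adjacent elements" step.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the vector is taken by value, so the caller's
// vector keeps its original order and only the copy is sorted.
bool has_duplicates_sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    // After sorting, equal elements sit next to each other;
    // adjacent_find returns end() if no two neighbours are equal.
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```

If the original order does not matter, a caller could pass std::move(vec) to skip the copy, or simply sort the vector in place before calling std::adjacent_find on it.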
Answer by Patrick
If your vector is an STL container, the solution is easy:
// Sort so that equal elements become adjacent, then erase the leftover tail.
std::sort(myvec.begin(), myvec.end());
myvec.erase(std::unique(myvec.begin(), myvec.end()), myvec.end());
According to cppreference (https://en.cppreference.com/w/cpp/algorithm/unique), the elements are shifted around so that the values from myvec.begin() to the return value of std::unique are all unique. The elements from the iterator returned by std::unique up to myvec.end() have unspecified values (useless in every use-case I've seen), so remove them from the std::vector<A> using std::vector<A>::erase.
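Note that this removes duplicates rather than just reporting them. As a rough sketch, the same calls can also answer the original question by counting how many elements get erased. The helper name remove_duplicates and the int element type are assumptions, not part of the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts `values`, removes repeated elements in
// place, and returns how many elements were removed.
std::size_t remove_duplicates(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
    // unique() moves the distinct values to the front and returns an
    // iterator to the start of the leftover tail.
    auto new_end = std::unique(values.begin(), values.end());
    const std::size_t removed =
        static_cast<std::size_t>(values.end() - new_end);
    values.erase(new_end, values.end());
    return removed;
}
```

A return value greater than zero means the vector contained at least one duplicate.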
Answer by KeithB
Sorting and then comparing adjacent elements is the way to go. A sort takes O(n log n) comparisons, and then an additional n-1 to compare adjacent elements.
The scheme in the question would take about (n^2)/2 comparisons (exactly n(n-1)/2 when there are no duplicates).
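To make the count concrete, here is a small sketch of the pairwise scheme from the question that also tallies its comparisons. The function name and the comparisons out-parameter are invented for this illustration. With no duplicates, the loops run n(n-1)/2 comparisons, which is roughly (n^2)/2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: compares every element against all later ones
// and reports how many comparisons were made via `comparisons`.
bool has_duplicates_pairwise(const std::vector<int>& values,
                             std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            ++comparisons;
            if (values[i] == values[j]) {
                return true;  // stop at the first duplicate pair
            }
        }
    }
    return false;
}
```

For five distinct elements this performs 10 comparisons, while sorting first needs on the order of n log n comparisons plus n-1 = 4 for the adjacent check.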
Answer by LoudNPossiblyWrong
You can also use binary_search.
Here are two good examples that will help you:
http://www.cplusplus.com/reference/algorithm/binary_search/
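The answer does not spell out how binary_search would be applied, so the following is only one possible reading: keep a separate vector of already-seen values in sorted order and look each new element up in it. std::binary_search requires a sorted range, which is why std::lower_bound is used to pick the insertion point. The function name and the int element type are assumptions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: `seen` is kept sorted at all times so that
// std::binary_search can be used for the lookup.
bool has_duplicates_binary_search(const std::vector<int>& values) {
    std::vector<int> seen;
    seen.reserve(values.size());
    for (int v : values) {
        if (std::binary_search(seen.begin(), seen.end(), v)) {
            return true;  // v was already inserted earlier
        }
        // Insert v at the position that keeps `seen` sorted.
        seen.insert(std::lower_bound(seen.begin(), seen.end(), v), v);
    }
    return false;
}
```

Each lookup is O(log n), but inserting into the middle of a vector shifts elements, so in the worst case this version still does a quadratic amount of element moves. Sorting a copy once is likely the better choice for large inputs.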
Answer by Mark Ransom
If you don't care about an occasional false positive, you can use a Bloom filter to detect probable duplicates in the collection. If false positives can't be accepted, take the values that fail the filter and run a second detection pass on those. The list of failed values should be fairly small, although they will need to be checked against the full input.
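A toy sketch of this two-pass idea is shown below. Everything in it is an assumption made for illustration: the BloomFilter class, the filter size, the two ad hoc hash positions, and the int element type. A real Bloom filter would choose its size and hash functions based on the expected number of elements and the acceptable false-positive rate.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Toy Bloom filter (hypothetical): two bit positions per value.
class BloomFilter {
public:
    explicit BloomFilter(std::size_t bits) : bits_(bits, false) {}

    // Returns true if `v` may have been added before (false positives
    // are possible, false negatives are not), then records `v`.
    bool test_and_add(int v) {
        const std::size_t h1 = std::hash<int>{}(v);
        const std::size_t h2 = h1 * 2654435761u + 1;  // ad hoc second hash
        const std::size_t i = h1 % bits_.size();
        const std::size_t j = h2 % bits_.size();
        const bool maybe_seen = bits_[i] && bits_[j];
        bits_[i] = true;
        bits_[j] = true;
        return maybe_seen;
    }

private:
    std::vector<bool> bits_;
};

// Hypothetical helper: first pass collects probable duplicates, second
// pass confirms each suspect against the full input.
bool has_duplicates_bloom(const std::vector<int>& values) {
    BloomFilter filter(values.size() * 16 + 64);  // size chosen arbitrarily
    std::vector<int> suspects;
    for (int v : values) {
        if (filter.test_and_add(v)) {
            suspects.push_back(v);
        }
    }
    for (int s : suspects) {
        if (std::count(values.begin(), values.end(), s) > 1) {
            return true;  // confirmed: s really occurs more than once
        }
    }
    return false;
}
```

Skipping the second loop and returning whether suspects is non-empty gives the "occasional false positive" variant described above.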