C++: The fastest way to find the union of sets
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/11362002/
Asked by Damir
I have sets of pairs of int like
set<pair<int,int> > x1, x2, ... xn
(n can be between 2 and 20). What is the fastest way to find the union of those sets?
Sorry if I wasn't clear at the beginning: I meant fast in terms of performance; memory allocation is not a problem.
Accepted answer by Richard J. Ross III
Unfortunately, I believe that you are limited to a linear O(N) solution, as a union is simply a combination of the elements of both sets.
template<typename S>
S union_sets(const S& s1, const S& s2)
{
    S result = s1;
    result.insert(s2.cbegin(), s2.cend());
    return result;
}
Answer by Steve Jessop
Assuming that the result needs to be a set too, you have no choice but to insert every element of each x_i into that result set. So the obvious implementation is:
set<pair<int,int>> x(x1);
x.insert(x2.begin(), x2.end());
// etc
The remaining question is whether this can be beaten for speed.
The single-element insert takes a position hint, which, if correct, speeds up insertion. So it might turn out that something like this is faster than x.insert(x2.begin(), x2.end()):
auto pos = x.begin();
for (auto it = x2.begin(); it != x2.end(); ++it) {
    pos = x.insert(pos, *it);
}
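As a rough sketch of the hinted-insert idea above, wrapped in a function: the name merge_with_hint and the PairSet alias are made up for illustration and are not from the answer. One deliberate difference from the loop above: since C++11 the hint is expected to point just past the insertion point, so this version steps the iterator forward before reusing it. Whether this beats the range insert still has to be measured.

```cpp
#include <cassert>
#include <set>
#include <utility>

using PairSet = std::set<std::pair<int, int>>;  // alias assumed for this sketch

// Hypothetical helper: merges every element of src into dst using
// single-element inserts with a position hint.
void merge_with_hint(PairSet& dst, const PairSet& src) {
    auto pos = dst.begin();
    for (const auto& elem : src) {
        // insert(hint, value) returns an iterator to the inserted element,
        // or to the equal element that was already present.
        pos = dst.insert(pos, elem);
        // src is iterated in increasing order, so the next element belongs
        // somewhere after pos; move the hint one step past it.
        ++pos;
    }
    // Cheap sanity check: dst now contains at least all of src.
    assert(dst.size() >= src.size());
}
```

The hint is only accurate when few elements of dst fall between consecutive elements of src, which is the data dependence the answer goes on to describe.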
It depends on the data, though: that position may or may not be accurate. You can ensure that it is by putting all the elements in order before you start, for which the best tool is probably set_union. That might better be named merge_and_dedupe_sorted_ranges, because what it does has nothing particularly to do with std::set. You could either set_union into intermediate vectors, or else into sets like this:
set<pair<int,int>> x;
set_union(x1.begin(), x1.end(), x2.begin(), x2.end(), inserter(x, x.end()));
My concern with using set_union is that, in order to get the benefit of adding the elements to a set in increasing order, you need to create a new empty container each time you call it (because if it's not empty, the elements added need to interleave with the values already in it). The overhead of these containers might be higher than the overhead of inserting into a set in arbitrary order: you would have to test it.
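For the "intermediate vectors" variant mentioned above, here is a minimal sketch. The function name union_via_vectors and the aliases are assumptions made for this example. It keeps a sorted running union in one vector, writes each set_union result into a second scratch vector, and builds the final set once at the end, which is linear for an already sorted range.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;  // aliases assumed for this sketch
using PairSet = std::set<Pair>;

// Hypothetical helper: unions all sets through sorted intermediate vectors.
PairSet union_via_vectors(const std::vector<PairSet>& sets) {
    std::vector<Pair> acc;   // sorted, duplicate-free running union
    std::vector<Pair> next;  // scratch buffer receiving each set_union result
    for (const PairSet& s : sets) {
        next.clear();
        next.reserve(acc.size() + s.size());
        std::set_union(acc.begin(), acc.end(), s.begin(), s.end(),
                       std::back_inserter(next));
        acc.swap(next);  // the scratch buffer becomes the running union
    }
    assert(std::is_sorted(acc.begin(), acc.end()));
    // Constructing a std::set from a sorted range takes linear time.
    return PairSet(acc.begin(), acc.end());
}
```

Reusing the two vectors avoids creating a fresh container per call, which is the overhead the answer is worried about, but every round still copies the running union once, so it needs profiling against the plain insert.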
Answer by Rafael Baptista
Find the union of the smallest sets first. That is, order your sets by size, compute the union of the two smallest sets, delete those two sets, and insert the union back into your list of sets according to its size.
If you have a measure of how similar two sets are likely to be, then your best bet is to find the union of the most similar sets first. That is, prefer union operations that eliminate duplicates early.
Edit: And for each union operation between two sets - merge the smaller set into the bigger set.
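One way to sketch the smallest-first strategy, including the "merge the smaller into the bigger" edit: the function name union_smallest_first and the use of a std::multimap keyed by size are choices made for this example, not something the answer prescribes. The similarity heuristic is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using PairSet = std::set<std::pair<int, int>>;  // alias assumed for this sketch

// Hypothetical helper: repeatedly unions the two smallest sets.
PairSet union_smallest_first(std::vector<PairSet> sets) {
    if (sets.empty()) return PairSet{};

    // Keep the sets ordered by size; the smallest is always at begin().
    std::multimap<std::size_t, PairSet> by_size;
    for (PairSet& s : sets) {
        const std::size_t n = s.size();
        by_size.emplace(n, std::move(s));
    }

    while (by_size.size() > 1) {
        auto first = by_size.begin();
        PairSet smaller = std::move(first->second);
        by_size.erase(first);

        auto second = by_size.begin();
        PairSet larger = std::move(second->second);
        by_size.erase(second);

        assert(smaller.size() <= larger.size());
        // Merge the smaller set into the bigger one.
        larger.insert(smaller.begin(), smaller.end());

        // Put the union back according to its new size.
        const std::size_t n = larger.size();
        by_size.emplace(n, std::move(larger));
    }
    PairSet result = std::move(by_size.begin()->second);
    return result;
}
```

The sets are moved rather than copied at each step, so the bookkeeping cost is small next to the inserts themselves. With at most 20 sets, the ordering may matter less than how much the sets overlap.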
Answer by Sebastian Mach
I assume that by fast you mean fast to implement.
Then: std::set_union(*)
Example for two sets:
#include <set>
#include <algorithm>
#include <iterator>
using namespace std;
int main () {
    set<pair<int,int> > a, b, uni;
    set_union (a.begin(), a.end(),
               b.begin(), b.end(),
               inserter(uni, uni.begin()));
}
For n sets, writing it by hand might be the most maintainable solution:
#include <set>
#include <vector>
using namespace std;
int main () {
    vector<set<pair<int,int>>> sets;
    set<pair<int,int>> uni;
    for (const auto &s : sets)
        for (const auto &elem : s)
            uni.insert (elem);
}
Though in general, one should prefer standard algorithms and profit from their quality implementations.
If by fast you mean performance, we can't help, as we don't have the requirements. Different approaches might give different results under different circumstances.
(*) Note: the site is sometimes frowned upon for not being 100% accurate with respect to the standard.
Answer by Anon Mail
Try set_union from the <algorithm> header.
Answer by MadScientist
You could use std::set_union recursively, or simply insert all sets into a result set (duplicate items are eliminated by the set). If the number of items is very small, you can try inserting them all into a vector, sorting it, and using std::unique on the vector.
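The vector variant could look roughly like the following. The function name union_sorted_vector and the aliases are assumptions for this sketch, and the result is left as a sorted vector rather than being converted back to a set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;  // aliases assumed for this sketch
using PairSet = std::set<Pair>;

// Hypothetical helper: dumps all elements into one vector, then sorts
// and removes duplicates.
std::vector<Pair> union_sorted_vector(const std::vector<PairSet>& sets) {
    std::size_t total = 0;
    for (const PairSet& s : sets) total += s.size();

    std::vector<Pair> v;
    v.reserve(total);  // a single allocation for all elements
    for (const PairSet& s : sets) v.insert(v.end(), s.begin(), s.end());
    assert(v.size() == total);

    std::sort(v.begin(), v.end());
    // std::unique only removes adjacent duplicates, hence the sort first.
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

If a std::set is needed at the end, it can be constructed from the vector's iterator range.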
Answer by ecatmur
To save on memory allocations and improve locality, it'd be better to use a single vector<T> as working memory.
Construct a vector<T> and reserve the total number of elements in all of the sets (counting duplicates). Then, starting with the empty range [v.begin(), v.begin()), extend it to a set-like (unique, sorted) range by appending the contents of each set, merging and uniquifying:
vector<T> v;
v.reserve(<total size>);
for (set<T> &s: sets) {
    auto middle = v.insert(v.end(), s.begin(), s.end());
    inplace_merge(v.begin(), middle, v.end());
    // unique is a free function from <algorithm>; vector has no unique member
    v.erase(unique(v.begin(), v.end()), v.end());
}
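A self-contained version of the same idea, with the total size computed up front, might look like this. The function name union_inplace_merge and the choice to return the vector are assumptions for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: builds a sorted, duplicate-free vector holding
// the union of all sets, using one vector as working memory.
template <typename T>
std::vector<T> union_inplace_merge(const std::vector<std::set<T>>& sets) {
    std::size_t total = 0;
    for (const auto& s : sets) total += s.size();  // counts duplicates too

    std::vector<T> v;
    v.reserve(total);  // enough capacity so appends never reallocate
    for (const auto& s : sets) {
        // Append the (already sorted) set; middle marks where it starts.
        auto middle = v.insert(v.end(), s.begin(), s.end());
        // Merge the two sorted halves [begin, middle) and [middle, end).
        std::inplace_merge(v.begin(), middle, v.end());
        // Drop the duplicates that the merge placed next to each other.
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

Note that std::inplace_merge may allocate a temporary buffer internally, so the "single allocation" property holds for the vector itself rather than for the whole algorithm.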