C++ std::set 与用户定义的类型,如何确保没有重复

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1114856/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 18:50:37  来源:igfitidea点击:

std::set with user defined type, how to ensure no duplicates

c++set

提问by DeusAduro

So I have an std::set which needs to keep specific ordering as well as not allowing duplicates of a user defined (by me) type. Now I can get the order to work correctly by overloading the '<' operator in my type. However, the set does not appropriately detect duplicates, and to be honest I'm not entirely sure how it does this internally. I have overloaded the '==' operator, but somehow im not sure this is what the set is actually using? So the question is how does the set determine duplicates when you add values? Here is the relevant code:

所以我有一个 std::set 需要保持特定的顺序以及不允许用户定义(由我)类型的重复项。现在我可以通过在我的类型中重载 '<' 运算符来使订单正常工作。但是,该集合并没有适当地检测重复项,老实说,我不完全确定它在内部是如何做到这一点的。我重载了 '==' 运算符,但不知何故我不确定这是该集合实际使用的内容吗?所以问题是当你添加值时,集合如何确定重复项?这是相关的代码:

The user defined type:

用户定义类型:

//! An element used in the route calculation.
struct RouteElem {
    int shortestToHere; // Shortest distance from the start.
    int heuristic;      // The heuristic estimate to the goal.
    Coordinate position;
    bool operator<( const RouteElem& other ) const
    {
        return (heuristic+shortestToHere) < (other.heuristic+other.shortestToHere);
    }
    bool operator==( const RouteElem& other ) const
    {
        return (position.x == other.position.x && position.y == other.position.y);
    }
};

So the elements are equivalent when their position is equivalent, and an element is less than another if its combined functional is less than that of the other. The sorting works, but the set will accept two elements of the same position.

所以当元素的位置相等时,元素是等价的,如果一个元素的组合泛函小于另一个元素,则元素小于另一个元素。排序有效,但集合将接受相同位置的两个元素。

回答by Paul

operator==is not used by std::set. Elements aand bare considered equal iff !(a < b) && !(b < a)

operator==不使用std::set。元素ab被认为是相等的 iff!(a < b) && !(b < a)

回答by Mehrdad Afshari

std::setsupports specifying a comparison function. The default is lesswhich will use operator <to check equality. You can define a custom function to check equality and use that one instead:

std::set支持指定比较函数。默认值less将用于operator <检查相等性。您可以定义一个自定义函数来检查相等性并改用该函数:

std::set<RouteElem, mycomparefunction> myset; 

Note that it's not possible to separate the comparison function from sorting function. std::setis a binary tree and if an element in a binary tree is not bigger nor smaller than a specific element, it should be in the same place. It does something like this in the place finding algorithm:

请注意,无法将比较功能与排序功能分开。std::set是一棵二叉树,如果二叉树中的元素既不大于也不小于特定元素,则它应该在同一位置。它在位置查找算法中执行以下操作:

if (a < b) {
    // check the left subtree
} else if (b < a) {
    // check the right subtree
} else {
    // the element should be placed here.
}

回答by Steve Jessop

rlbond's comparator does not prevent the insertion of elements which compare equal. Apparently it's difficult to prove this in comments, given the character limit, because rlbond appears to thinks that std::set guarantees that it will never contain two elements with !compare(a,b) && !compare(b,a)for his comparator. However, rlbond's comparator does not define a strict order, and therefore is not a valid parameter to std::set.

rlbond 的比较器不会阻止比较相等的元素的插入。显然,鉴于字符限制,很难在注释中证明这一点,因为 rlbond 似乎认为 std::set 保证它永远不会包含两个元素 with!compare(a,b) && !compare(b,a)作为他的比较器。但是,rlbond 的比较器没有定义严格的顺序,因此不是 std::set 的有效参数。

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>

struct BrokenOrder {
    int order;
    int equality;

    public:
    BrokenOrder(int o, int e) : order(o), equality(e) {}

    bool operator<(const BrokenOrder &rhs) const {
        return order < rhs.order;
    }
    bool operator==(const BrokenOrder &rhs) const {
        return equality == rhs.equality;
    }
};

std::ostream &operator<<(std::ostream &stream, const BrokenOrder &b) {
    return stream << b.equality;
}

// rlbond's magic comparator
struct LessThan : public std::binary_function<BrokenOrder, BrokenOrder, bool> {
    bool operator()(const BrokenOrder& lhs, const BrokenOrder& rhs) const
    {
        return !(lhs == rhs) && (lhs < rhs);
    }
};

int main() {
    std::set<BrokenOrder,LessThan> s;
    for (int i = 0; i < 5; ++i) {
        s.insert(BrokenOrder(i,i));
    }
    for (int i = 0; i < 5; ++i) {
        s.insert(BrokenOrder(10-i,i));
    }
    std::copy(s.begin(), s.end(), 
        std::ostream_iterator<BrokenOrder>(std::cout, "\n"));
}

Output:

输出:

0
1
2
3
4
3
2
1
0

Duplicates. The magic comparator has failed. Different elements in the set have the same value of equality, and hence compare the same with operator==, because during insertion the set never compared the new element against its duplicate. The only duplicate which was excluded was 4, because the two 4's had sort orders 4 and 6. This put them close enough together in the set to be compared against each other.

重复。魔术比较器失败。集合中的不同元素具有相同的 值equality,因此与 比较相同operator==,因为在插入期间,集合从未将新元素与其副本进行比较。唯一被排除的重复项是 4,因为这两个 4 的排序顺序是 4 和 6。这使它们在集合中靠得足够近,可以相互比较。

From the C++ standard: 25.3:3 "For the algorithms to work correctly, comphas to induce a strict weak ordering on the values".

来自 C++ 标准:25.3:3“为了使算法正常工作,comp必须对值进行严格的弱排序”。

25.3:4 "... the requirements are that comp and equiv both be transitive relations:

25.3:4 "...要求是 comp 和 equiv 都是传递关系:

comp(a,b) && comp(b,c) implies comp(a,c)"

Now, consider the elements a = BrokenOrder(1,1), b = BrokenOrder(2,2), and c = BrokenOrder(9,1), and compof course equal to the magic comparator. Then:

现在,考虑元素a = BrokenOrder(1,1)b = BrokenOrder(2,2)、 和c = BrokenOrder(9,1)comp当然等于魔术比较器。然后:

  • comp(a,b)is true since 1 != 2 (equality) and 1 < 2 (order)
  • comp(b,c)is true since 2 != 1 (equality) and 2 < 9 (order)
  • comp(a,c)is false since 1 == 1 (equality)
  • comp(a,b)为真,因为 1 != 2(相等)和 1 < 2(顺序)
  • comp(b,c)为真,因为 2 != 1(相等)和 2 < 9(顺序)
  • comp(a,c)为假,因为 1 == 1(相等)

回答by Greg Hewgill

The STL set implementation does something conceptually like this to detect equality:

STL 集合实现在概念上做类似这样的事情来检测相等性:

bool equal = !(a < b) && !(b < a);

That is, if two elements are both not less than the other, then they must be equal. You may be able to check this by setting a breakpoint on your operator==()method and checking to see whether it ever gets called at all.

也就是说,如果两个元素都不小于另一个,那么它们必须相等。您可以通过在您的operator==()方法上设置断点并检查它是否曾经被调用来检查这一点。

I would generally be suspicious of comparison operators that check completely different things. Your <operator is defined in terms of two things that are separate from how your ==operator is defined. Generally you will want such comparisons to use consistent information.

我通常会怀疑比较运算符检查完全不同的东西。您的<运营商是根据与您的运营商的定义方式不同的两件事来==定义的。通常,您会希望此类比较使用一致的信息。

回答by Steve Jessop

You could try something like the following:

您可以尝试以下操作:

//! An element used in the route calculation.
struct RouteElem {
    int shortestToHere; // Shortest distance from the start.
    int heuristic;              // The heuristic estimate to the goal.
    Coordinate position;
    bool operator<( const RouteElem& other ) const
    {
      return (heuristic+shortestToHere) < (other.heuristic+other.shortestToHere);
    }
    bool operator==( const RouteElem& other ) const
    {
      return (position.x == other.position.x && position.y == other.position.y);
    }
};

struct CompareByPosition {
    bool operator()(const RouteElem &lhs, const RouteElem &rhs) {
        if (lhs.position.x != rhs.position.x) 
            return lhs.position.x < rhs.position.x;
        return lhs.position.y < rhs.position.y;
    }
};

// first, use std::set to remove duplicates
std::set<RouteElem,CompareByPosition> routeset;
// ... add each RouteElem to the set ...

// now copy the RouteElems into a vector
std::vector<RouteElem> routevec(routeset.begin(), routeset.end());

// now sort via operator<
std::sort(routevec.begin(), routevec.end());

Obviously there's the copy in the middle, which looks slow. But any structure which indexes items by two different criteria is therefore going to have some kind of extra overhead per item compared with a set. The whole of the code above is O(n log n), assuming that your implementation of std::sort uses introsort.

显然中间有副本,看起来很慢。但是,与集合相比,任何按两个不同标准对项目进行索引的结构都会因此每个项目具有某种额外的开销。上面的整个代码是 O(n log n),假设您的 std::sort 实现使用了 introsort。

If you have it, under this scheme you could use unordered_setinstead of setto do the initial uniqueifying. Since the hash would only have to depend on x and y, it should be faster than the O(log N) comparisons required to insert into a set.

如果你有它,在这个方案下你可以使用unordered_set而不是set做初始唯一化。由于散列只需要依赖于 x 和 y,它应该比插入集合所需的 O(log N) 比较快。

Edit: just noticed that you said you wanted to "keep" sort order, not that you wanted to process everything in a batch. Sorry about that. If you want to efficiently maintain order and exclude duplicates while adding elements, then I would recommend using the set or unordered set I define above, based on position, and also a std::multiset<RouteElem>, which will maintain the operator<order. For each new element, do:

编辑:只是注意到你说你想“保持”排序顺序,而不是你想批量处理所有东西。对于那个很抱歉。如果您想在添加元素时有效地维护顺序并排除重复项,那么我建议使用我在上面定义的 set 或 unordered set,基于位置,还有 a std::multiset<RouteElem>,它将维护operator<顺序。对于每个新元素,请执行以下操作:

if (routeset.insert(elem).second) {
    routemultiset.insert(elem);
}

Although beware that this offers no exception guarantee. If the second insert throws, then the routeset has been modified, so the state is no longer consistent. So I guess really you need:

尽管请注意,这不提供任何例外保证。如果第二次插入抛出,则路由集已经被修改,因此状态不再一致。所以我想你真的需要:

if (routeset.insert(elem).second) {
    try {
        routemultiset.insert(elem); // I assume strong exception guarantee
    } catch(...) {
        routeset.erase(elem); // I assume nothrow. Maybe should check those.
        throw;
    }
}

Or an equivalent with RAII, which will be more verbose if there's only one place in your code you ever use the RAII class, but better if there's much repetition.

或者与 RAII 等效,如果您的代码中只有一个地方使用过 RAII 类,则会更加冗长,但如果有很多重复,则更好。

回答by rlbond

Beware of the ramifications of this. It looks like you are trying to do something like A*, and if you try to insert a "duplicate" it will be ignored, even if there is a "better" route.

当心这件事的后果。看起来您正在尝试执行 A* 之类的操作,如果您尝试插入“重复”,即使有“更好”的路线,它也会被忽略。

NOTE: This solution doesn't work, see onebyone's explanation below

注意:此解决方案不起作用,请参阅下面的某人的解释

struct RouteElem 
{
    int shortestToHere; // Shortest distance from the start.
    int heuristic;              // The heuristic estimate to the goal.
    Coordinate position;
    bool operator<( const RouteElem& other ) const
    {
        return (heuristic+shortestToHere) < (other.heuristic+other.shortestToHere);
    }
    bool operator==( const RouteElem& other ) const
    {
        return (position.x == other.position.x && position.y == other.position.y);
    }
};

struct RouteElemLessThan : public std::binary_function<RouteElem, RouteElem, bool>
{
    bool operator()(const RouteElem& lhs, const RouteElem& rhs) const
    {
        return !(lhs == rhs) && (lhs < rhs);
    }
};

std::set<RouteElem, RouteElemLessThan> my_set;