在 C++ 中实现不相交集（联合查找）

Question

提问by Isaac

I am trying to implement Disjoint Sets for use in Kruskal's algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After reading the Wikipedia description of Disjoint Sets and after reading the description in Introduction to Algorithms (Cormen et al) I have come up with the following:

我正在尝试实现不相交集以在 Kruskal 算法中使用，但我无法准确理解它应该如何完成，特别是如何管理树木森林。在阅读了维基百科对不相交集的描述并阅读了算法简介（Cormen 等人）中的描述后，我得出了以下结论：

    class DisjointSet
    {
    public:
        class   Node 
        {
        public:
            int         data;
            int         rank;

            Node*       parent;

            Node() : data(0), 
                     rank(0),
                     parent(this) { } // the constructor does MakeSet
        };

        Node*   find(Node*);
        Node*   merge(Node*, Node*); // Union
    };


    DisjointSet::Node*   DisjointSet::find(DisjointSet::Node* n)
    {
        if (n != n->parent) {
            n->parent = find(n->parent);
        }
        return n->parent;
    }

    DisjointSet::Node* DisjointSet::merge(DisjointSet::Node* x,
                                          DisjointSet::Node* y)
    {
        x = find(x);
        y = find(y);

        if (x->rank > y->rank) {
            y->parent = x;
        } else {
            x->parent = y;
            if (x->rank == y->rank) {
                ++(y->rank);
            }
        }
    }

I am pretty sure this is incomplete though and that I am missing something.

我很确定这是不完整的，我错过了一些东西。

Introduction to Algorithms mentions that there should be a forest of trees, but it does not give any explanation for a practical implementation of this forest. I watched CS 61B Lecture 31: Disjoint Sets ( http://www.youtube.com/watch?v=wSPAjGfDl7Q) and here the lecturer uses only an array to store both the forest and all its trees and values. There is no explicit 'Node' type of class as I have, mentioned. I have also found many other sources (I cannot post more than one link), which also use this technique. I would be happy to do this, except that this relies on the indices of the array for lookup and since I want to store values of type other than int, I need to use something else (std::map comes to mind).

算法导论提到应该有一片树林，但它没有对这片森林的实际实现给出任何解释。我观看了 CS 61B 第 31 课：不相交集 ( http://www.youtube.com/watch?v=wSPAjGfDl7Q)，在这里讲师仅使用一个数组来存储森林及其所有树木和值。没有我提到的明确的“节点”类型的类。我还发现了许多其他来源（我不能发布多个链接），它们也使用了这种技术。我很乐意这样做，除了这依赖于数组的索引进行查找，并且由于我想存储 int 以外的类型的值，我需要使用其他东西（我想到了 std::map）。

Another issue that I am unsure about is memory allocation because I am using C++. I am storing trees of pointers and my MakeSet operation will be: new DisjointSet::Node; . Now, these Nodes only have pointers to their parents, so I'm not sure how to find the bottom of a tree. How will I be able to traverse my trees to deallocate them all?

我不确定的另一个问题是内存分配，因为我使用的是 C++。我正在存储指针树，我的 MakeSet 操作将是： new DisjointSet::Node; . 现在，这些节点只有指向其父节点的指针，所以我不确定如何找到树的底部。我将如何遍历我的树以将它们全部释放？

I understand the basic concept of this data structure, but I'm just a bit confused about the implementation. Any advice and suggestions would be most welcome, thank you.

我了解这个数据结构的基本概念，但我对实现有点困惑。欢迎任何意见和建议，谢谢。

Answer 1

回答by Stuart Golodetz

Not a perfect implementation by any means (I did write it after all!), but does this help?

无论如何都不是一个完美的实现（毕竟我确实写了它！），但这有帮助吗？

/***
 * millipede: DisjointSetForest.h
 * Copyright Stuart Golodetz, 2009. All rights reserved.
 ***/

#ifndef H_MILLIPEDE_DISJOINTSETFOREST
#define H_MILLIPEDE_DISJOINTSETFOREST

#include <map>

#include <common/exceptions/Exception.h>
#include <common/io/util/OSSWrapper.h>
#include <common/util/NullType.h>

namespace mp {

/**
@brief  A disjoint set forest is a fairly standard data structure used to represent the partition of
        a set of elements into disjoint sets in such a way that common operations such as merging two
        sets together are computationally efficient.

This implementation uses the well-known union-by-rank and path compression optimizations, which together
yield an amortised complexity for key operations of O(a(n)), where a is the (extremely slow-growing)
inverse of the Ackermann function.

The implementation also allows clients to attach arbitrary data to each element, which can be useful for
some algorithms.

@tparam T   The type of data to attach to each element (arbitrary)
*/
template <typename T = NullType>
class DisjointSetForest
{
    //#################### NESTED CLASSES ####################
private:
    struct Element
    {
        T m_value;
        int m_parent;
        int m_rank;

        Element(const T& value, int parent)
        :   m_value(value), m_parent(parent), m_rank(0)
        {}
    };

    //#################### PRIVATE VARIABLES ####################
private:
    mutable std::map<int,Element> m_elements;
    int m_setCount;

    //#################### CONSTRUCTORS ####################
public:
    /**
    @brief  Constructs an empty disjoint set forest.
    */
    DisjointSetForest()
    :   m_setCount(0)
    {}

    /**
    @brief  Constructs a disjoint set forest from an initial set of elements and their associated values.

    @param[in]  initialElements     A map from the initial elements to their associated values
    */
    explicit DisjointSetForest(const std::map<int,T>& initialElements)
    :   m_setCount(0)
    {
        add_elements(initialElements);
    }

    //#################### PUBLIC METHODS ####################
public:
    /**
    @brief  Adds a single element x (and its associated value) to the disjoint set forest.

    @param[in]  x       The index of the element
    @param[in]  value   The value to initially associate with the element
    @pre
        -   x must not already be in the disjoint set forest
    */
    void add_element(int x, const T& value = T())
    {
        m_elements.insert(std::make_pair(x, Element(value, x)));
        ++m_setCount;
    }

    /**
    @brief  Adds multiple elements (and their associated values) to the disjoint set forest.

    @param[in]  elements    A map from the elements to add to their associated values
    @pre
        -   None of the elements to be added must already be in the disjoint set forest
    */
    void add_elements(const std::map<int,T>& elements)
    {
        for(typename std::map<int,T>::const_iterator it=elements.begin(), iend=elements.end(); it!=iend; ++it)
        {
            m_elements.insert(std::make_pair(it->first, Element(it->second, it->first)));
        }
        m_setCount += elements.size();
    }

    /**
    @brief  Returns the number of elements in the disjoint set forest.

    @return As described
    */
    int element_count() const
    {
        return static_cast<int>(m_elements.size());
    }

    /**
    @brief  Finds the index of the root element of the tree containing x in the disjoint set forest.

    @param[in]  x   The element whose set to determine
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    int find_set(int x) const
    {
        Element& element = get_element(x);
        int& parent = element.m_parent;
        if(parent != x)
        {
            parent = find_set(parent);
        }
        return parent;
    }

    /**
    @brief  Returns the current number of disjoint sets in the forest (i.e. the current number of trees).

    @return As described
    */
    int set_count() const
    {
        return m_setCount;
    }

    /**
    @brief  Merges the disjoint sets containing elements x and y.

    If both elements are already in the same disjoint set, this is a no-op.

    @param[in]  x   The first element
    @param[in]  y   The second element
    @pre
        -   Both x and y must be elements in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    */
    void union_sets(int x, int y)
    {
        int setX = find_set(x);
        int setY = find_set(y);
        if(setX != setY) link(setX, setY);
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    T& value_of(int x)
    {
        return get_element(x).m_value;
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    const T& value_of(int x) const
    {
        return get_element(x).m_value;
    }

    //#################### PRIVATE METHODS ####################
private:
    Element& get_element(int x) const
    {
        typename std::map<int,Element>::iterator it = m_elements.find(x);
        if(it != m_elements.end()) return it->second;
        else throw Exception(OSSWrapper() << "No such element: " << x);
    }

    void link(int x, int y)
    {
        Element& elementX = get_element(x);
        Element& elementY = get_element(y);
        int& rankX = elementX.m_rank;
        int& rankY = elementY.m_rank;
        if(rankX > rankY)
        {
            elementY.m_parent = x;
        }
        else
        {
            elementX.m_parent = y;
            if(rankX == rankY) ++rankY;
        }
        --m_setCount;
    }
};

}

#endif

Answer 2

回答by THK

Boost has a disjoint set implementation:

Boost 有一个不相交的集合实现：

http://www.boost.org/doc/libs/1_54_0/libs/disjoint_sets/disjoint_sets.html

Answer 3

回答by Murilo Vasconcelos

your implementation looks good (except in the function merge you should either declare returning void or put a return there, I prefer to return void). The thing is you need to track the Nodes*. You can do that by having a vector<DisjointSet::Node*>on your DisjointSetclass or having this vectorsomewhere else and declaring the methods of DisjointSetas static.

您的实现看起来不错（除了在函数合并中，您应该声明返回 void 或将返回放在那里，我更喜欢返回 void）。问题是您需要跟踪Nodes*. 您可以通过vector<DisjointSet::Node*>在您的DisjointSet班级上使用 a或在vector其他地方使用 this并声明DisjointSetas的方法来做到这一点static。

Here is an example of run (note that I changed merge to return void and didn't change the methods of DisjointSetto be static:

这是运行的示例（请注意，我更改了 merge 以返回 void 并且没有更改DisjointSetto be的方法static：

int main()
{
    vector<DisjointSet::Node*> v(10);
    DisjointSet ds;

    for (int i = 0; i < 10; ++i) {
        v[i] = new DisjointSet::Node();
        v[i]->data = i;
    }

    int x, y;

    while (cin >> x >> y) {
        ds.merge(v[x], v[y]);
    }

    for (int i = 0; i < 10; ++i) {
        cout << v[i]->data << ' ' << v[i]->parent->data << endl;
    }

    return 0;
}

With this input:

有了这个输入：

It prints the expected:

它打印预期的：

Your forest is the composition of the trees:

你的森林是树木的组成：

   7    1       5    6   9
 /    / | \              |
0    2  3  4             8

So you algorithm is good, have the best complexity for Union-find as far as I know and you keep track of your Nodes on your vector. So you could simply deallocate:

所以你的算法很好，据我所知，Union-find 的复杂度是最好的，你Node在你的vector. 所以你可以简单地释放：

for (int i = 0; i < int(v.size()); ++i) {
    delete v[i];
}

Answer 4

回答by Puppy

I can't speak about the algorithm, but for memory management, typically you will use something called a smart pointer, which will free what it points to. You can get shared ownership and single ownership smart pointers, and non-ownership too. Correct use of these will guarantee no memory issues.

我不能谈论算法，但对于内存管理，通常您会使用称为智能指针的东西，它将释放它指向的内容。您可以获得共享所有权和单一所有权智能指针，也可以获得非所有权。正确使用这些将保证没有内存问题。

Answer 5

回答by dionyziz

Your implementation is fine. All you now need to do is to keep an array of disjoint set Nodes so that you can call your union/find methods on them.

你的实现很好。您现在需要做的就是保留一组不相交的集合节点，以便您可以对它们调用 union/find 方法。

For Kruskal's algorithm, you'll need an array containing one disjoint-set Node per graph vertex. Then, when you look up the next edge to add to your subgraph, you would use the find method to check if those nodes are both in your subgraph already. If they are, then you can move on to the next edge. Otherwise, it's time to add that edge to your subgraph and perform a union operation between the two vertices connected by that edge.

对于 Kruskal 算法，您需要一个数组，每个图顶点包含一个不相交集节点。然后，当您查找要添加到子图中的下一条边时，您将使用 find 方法检查这些节点是否都已在您的子图中。如果是，那么您可以移动到下一个边缘。否则，是时候将该边添加到您的子图中并在由该边连接的两个顶点之间执行联合操作了。

Answer 6

回答by user3310464

This blog article shows a C++ implementation using path compression: http://toughprogramming.blogspot.com/2013/04/implementing-disjoint-sets-in-c.html

这篇博客文章展示了使用路径压缩的 C++ 实现：http: //toughprogramming.blogspot.com/2013/04/implementing-disjoint-sets-in-c.html

Answer 7

回答by user1423561

Have a look at this code

看看这个代码

class Node {
    int id,rank,data;
    Node *parent;

public :

    Node(int id,int data) {
        this->id = id;
        this->data = data;
        this->rank =0;
        this->parent = this;
    }

    friend class DisjointSet;
};

class DisjointSet {
    unordered_map<int,Node*> forest;

    Node *find_set_helper(Node *aNode) {
        if( aNode->parent != aNode)
            aNode->parent = find_set_helper(aNode->parent);

        return aNode->parent;
    }

    void link(Node *xNode,Node *yNode) {
        if( xNode->rank > yNode->rank)
            yNode->parent = xNode;
        else if(xNode-> rank < yNode->rank)
            xNode->parent = yNode;
        else {
            xNode->parent = yNode;
            yNode->rank++;
        }
    }

public:
    DisjointSet() {

    }

    void make_set(int id,int data) {
        Node *aNode = new Node(id,data);
        this->forest.insert(make_pair(id,aNode));
    }

    void Union(int xId, int yId) {
        Node *xNode = find_set(xId);
        Node *yNode = find_set(yId);

        if(xNode && yNode)
            link(xNode,yNode);
    }

    Node* find_set(int id) {
        unordered_map<int,Node*> :: iterator itr = this->forest.find(id);
        if(itr == this->forest.end())
            return NULL;

        return this->find_set_helper(itr->second);
    }

    void print() {
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            cout<<"\nid : "<<itr->second->id<<"  parent :"<<itr->second->parent->id;
        }
    }

    ~DisjointSet(){
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            delete (itr->second);
        }
    }

};

Answer 8

回答by Spectral

For implementing Disjoint Sets from scratch, I highly recommend reading the book of Data Structures & Algorithm Analysis in C++by Mark A. Weiss.

从头开始实施独立集合，我强烈建议你阅读的书数据结构与算法分析C ++由马克·魏斯。

In Chap 8, it starts from basic find / union, then it gradually adds union by height / depth / rank, and find compression. In the end, it provides Big-O analysis.

在第8章中，从基本的find/union开始，然后通过height/depth/rank逐步增加union，并找到压缩。最后，它提供了 Big-O 分析。

Trust me, it has everything you want to know about Disjoint Sets and its C++ implementation.

相信我，它拥有您想了解的有关 Disjoint Sets 及其 C++ 实现的所有信息。

Answer 9

回答by Spectral

The following code seems simple to understand for implementing union-find disjoints sets by path compression

以下代码似乎很容易理解，用于通过路径压缩实现联合查找不相交集

int find(int i)
{
    if(parent[i]==i)
    return i;
    else
    return parent[i]=find(parent[i]);
}
void union(int a,int b)
{
    x=find(a);y=find(b);
        if(x!=y)
        {
            if(rank[x]>rank[y])
            parent[y]=x;
            else
            {
            parent[x]=y;
            if(rank[x]==rank[y])
            rank[y]+=1;             
            }
        }
}

Answer 10

回答by 95_96

If you are trying to ask which style is better to imlement disjoint set (vector or map(rb tree) ) , then i may have something to add

如果您想询问哪种样式更适合实现不相交集（向量或映射（rb 树）），那么我可能需要添加一些内容

make_set (int key , node info ): this is generally a member function to (1) add a node and (2) make node point to itself ( parent = key) , this intially creates a disjoint set. operation time complexity for vector O(n) , for map O(n*logn) .
find_set( int key ): this has two fuctions generally , (1) finding the node through given key (2) path compression . I couldnt really calculate it for path compression , but for simply searching for the node , the time complexity for (1) vector O(1) and for (2) map O(log(n)) .

make_set (int key , node info )：这通常是一个成员函数，用于 (1) 添加一个节点和 (2) 使节点指向自身 (parent = key)，这最初会创建一个不相交的集合。向量 O(n) 的操作时间复杂度，映射 O(n*logn) 。
find_set( int key )：这通常有两个功能，（1）通过给定的键找到节点（2）路径压缩。我无法真正为路径压缩计算它，但是为了简单地搜索节点，(1) 向量 O(1) 和 (2) 映射 O(log(n)) 的时间复杂度。

Concludingly , i would like to say that although looking over here , vector implementation looks better, time complexities of both are O(M*α(n)) ≈ O(M*5) or so i have read .

ps. do verify what i have written though i am pretty sure it's correct .

最后，我想说，虽然看这里，矢量实现看起来更好，但两者的时间复杂度都是 O(M*α(n)) ≈ O(M*5) 左右，我读过。

附：尽管我很确定它是正确的，但请验证我所写的内容。

在 C++ 中实现不相交集（联合查找）

提问by Isaac

回答by Stuart Golodetz

回答by THK

回答by Murilo Vasconcelos

回答by Puppy

回答by dionyziz

回答by user3310464

回答by user1423561

回答by Spectral

回答by Spectral

回答by 95_96

相关推荐

最近更新

标签

在 C++ 中实现不相交集（联合查找）

提问by Isaac

回答by Stuart Golodetz

回答by THK

回答by Murilo Vasconcelos

回答by Puppy

回答by dionyziz

回答by user3310464

回答by user1423561

回答by Spectral

回答by Spectral

回答by 95_96

相关推荐

C++ 使用 CMake 为 VS 2017 生成解决方案的问题

C++ 源文件中的嵌套类定义

C++ Visual Studio 2017、Boost 和 CMake 的版本号

C++ 如何迭代priority_queue？

相关推荐

最近更新

标签