在 C++ 中使用带密钥更新的最小优先级队列的最简单方法

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9209323/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 12:29:35  来源:igfitidea点击:

Easiest way of using min priority queue with key update in C++

c++algorithmdata-structures

提问by Chong Luo

Sometimes during programming contests etc., we need a simple working implementation of min priority queue with decrease-key to implement Dijkstra algorithm etc.. I often use set< pair<key_value, ID> > and an array (mapping ID-->key_value) together to achieve that.

有时在编程竞赛等过程中,我们需要一个带有递减键的最小优先级队列的简单工作实现来实现 Dijkstra 算法等。我经常使用 set< pair<key_value, ID> > 和一个数组(映射 ID-->key_value ) 共同实现这一目标。

  • Adding an element to the set takes O(log(N)) time. To build a priority queue out of N elements, we simply add them one by one into the set. This takes O(N log(N)) time in total.

  • The element with min key_value is simply the first element of the set. Probing the smallest element takes O(1) time. Removing it takes O(log(N)) time.

  • To test whether some ID=k is in the set, we first look up its key_value=v_k in the array and then search the element (v_k, k) in the set. This takes O(log(N)) time.

  • To change the key_value of some ID=k from v_k to v_k', we first look up its key_value=v_k in the array, and then search the element (v_k, k) in the set. Next we remove that element from the set and then insert the element (v_k', k) into the set. We then update the array, too. This takes O(log(N)) time.

  • 向集合中添加元素需要 O(log(N)) 时间。为了用 N 个元素构建一个优先级队列,我们​​只需将它们一个一个地添加到集合中。这总共需要 O(N log(N)) 时间。

  • 具有 min key_value 的元素只是集合的第一个元素。探测最小元素需要 O(1) 时间。删除它需要 O(log(N)) 时间。

  • 为了测试某个 ID=k 是否在集合中,我们首先在数组中查找它的 key_value=v_k,然后在集合中搜索元素 (v_k, k)。这需要 O(log(N)) 时间。

  • 要将某个ID=k的key_value从v_k改为v_k',我们首先在数组中查找它的key_value=v_k,然后在集合中查找元素(v_k, k)。接下来,我们从集合中删除该元素,然后将元素 (v_k', k) 插入到集合中。然后我们也更新数组。这需要 O(log(N)) 时间。

Although the above approach works, most textbooks usually recommend using binary heaps to implement priority queues, as the time of building the binary heaps is just O(N). I heard that there is a built-in priority queue data structure in STL of C++ that uses binary heaps. However, I'm not sure how to update the key_value for that data structure.

尽管上述方法有效,但大多数教科书通常建议使用二叉堆来实现优先级队列,因为构建二叉堆的时间仅为 O(N)。听说C++的STL里有内置的优先级队列数据结构,使用的是二进制堆。但是,我不确定如何更新该数据结构的 key_value。

What's the easiest and most efficient way of using min priority queue with key update in C++?

在 C++ 中使用带密钥更新的最小优先级队列的最简单和最有效的方法是什么?

回答by Googol

Although my response will not answer the original question, I think it could be useful for people who reach this question when trying to implement Dijkstra algorithm in C++/Java (like myself), something that was comment by the OP,

虽然我的回答不会回答最初的问题,但我认为这对于在 C++/Java 中尝试实现 Dijkstra 算法时遇到这个问题的人(如我自己)可能有用,这是 OP 的评论,

priority_queuein C++ (or PriorityQueuein Java) do not provide a decrease-keyoperation, as said previously. A nice trick for using those classes when implementing Dijkstra is using "lazy deletion". The main loop of Dijkstra algorithm extracts the next node to be processed from the priority queue, and analises all its adjacent nodes, eventually changing the cost of the minimal path for a node in the priority queue. This is the point where decrease-keyis usually needed in order to update the value of that node.

priority_queue如前所述,在 C++(或PriorityQueueJava)中不提供decrease-key操作。在实现 Dijkstra 时使用这些类的一个很好的技巧是使用“延迟删除”。Dijkstra 算法的主循环从优先级队列中提取下一个要处理的节点,并分析其所有相邻节点,最终改变优先级队列中某个节点的最小路径成本。这是decrease-key通常需要以更新该节点的值的点。

The trick is not change itat all. Instead, a "new copy" for that node (with its new better cost) is added into the priority queue. Having a lower cost, that new copy of the node will be extracted before the original copy in the queue, so it will be processed earlier.

诀窍是根本不改变它。相反,该节点的“新副本”(具有新的更好的成本)被添加到优先级队列中。由于成本较低,节点的新副本将在队列中的原始副本之前提取,因此将更早地进行处理。

The problem with this "lazy deletion" is that the second copy of the node, with the higher bad cost, will be eventually extracted from the priority queue. But that will be always occur after the second added copy, with a better cost, has being processed. So the very first thingthat the main Dijkstra loop must do when extracting the next node from the priority queue is checking if the node has being previously visited (and we know the shortest path already). It is in that precise moment when we will be doing the "lazy deletion" and the element must be simply ignored.

这种“懒惰删除”的问题在于,节点的第二个副本,具有更高的坏成本,最终会从优先级队列中提取出来。但这将始终发生在第二个添加的副本(成本更高)被处理之后。因此当从优先级队列中提取下一个节点时,主 Dijkstra 循环必须做的第一件事是检查该节点之前是否已被访问过(并且我们已经知道最短路径)。正是在那个精确的时刻,我们将进行“惰性删除”,并且必须简单地忽略该元素。

This solution will have a cost both in memory and time, because the priority queue is storing "dead elements" that we have not removed. But the real cost will be quite small, and programming this solution is, IMHO, easier than any other alternative that tries to simulate the missing decrease-keyoperation.

这个解决方案将在内存和时间上都有成本,因为优先级队列正在存储我们没有删除的“死元素”。但实际成本将非常小,恕我直言,对这个解决方案进行编程比任何其他试图模拟缺失decrease-key操作的替代方案更容易。

回答by Christian Rau

Well, as Darren already said, std::priority_queuedoesn't have means for decreasing the priority of an element and neither the removal of an element other than the current min. But the default std::priority_queueis nothing more than a simple container adaptor around a std::vectorthat uses the std heap functions from <algorithm>(std::push_heap, std::pop_heapand std::make_heap). So for Dijkstra (where you need priority update) I usually just do this myself and use a simple std::vector.

好吧,正如 Darren 已经说过的,std::priority_queue没有降低元素优先级的方法,也没有删除当前 min 以外的元素。但是默认std::priority_queue值只不过是一个围绕 a 的简单容器适配器,std::vector它使用<algorithm>( std::push_heap,std::pop_heapstd::make_heap) 中的 std 堆函数。因此,对于 Dijkstra(您需要优先更新的地方),我通常自己做这件事并使用简单的std::vector.

A push is then just the O(log N) operation

推送只是 O(log N) 操作

vec.push_back(item);
std::push_heap(vec.begin(), vec.end());

Of course for constructing a queue anew from N elements, we don't push them all using this O(log N) operation (making the whole thing O(Nlog N)) but just put them all into the vector followed by a simple O(N)

当然,为了从 N 个元素重新构造一个队列,我们​​不会使用这个 O(log N) 操作将它们全部推送(使整个事情 O(Nlog N)),而是将它们全部放入向量中,然后是一个简单的 O (N)

std::make_heap(vec.begin(), vec.end());

The min element is a simple O(1)

min 元素是一个简单的 O(1)

vec.front();

A pop is the simple O(log N) sequence

pop 是简单的 O(log N) 序列

std::pop_heap(vec.begin(), vec.end());
vec.pop_back();

So far this is just what std::priority_queueusually does under the hood. Now to change an item's priority we just need to change it (however it may be incorporated in the item's type) and make the sequence a valid heap again

到目前为止,这只是std::priority_queue通常在引擎盖下所做的。现在要更改项目的优先级,我们只需要更改它(但是它可能会包含在项目的类型中)并使序列再次成为有效堆

std::make_heap(vec.begin(), vec.end());

I know this is an O(N) operation, but on the other hand this removes any need for keeping track of an item's position in the heap with an additional data structure or (even worse) an augmentation of the items' type. And the performance penalty over a logarithmic priority update is in practice not that signficant, considering the ease of use, compact and linear memory useage of std::vector(which impacts runtime, too), and the fact that I often work with graphs that have rather few edges (linear in the vertex count) anyway.

我知道这是一个 O(N) 操作,但另一方面,这消除了使用附加数据结构跟踪项目在堆中位置的任何需要,或者(甚至更糟)项目类型的增强。考虑到易用性、紧凑和线性内存使用std::vector(这也会影响运行时),以及我经常使用边缘很少的图这一事实,对数优先级更新的性能损失在实践中并不那么显着(在顶点数中是线性的)无论如何。

It may not in all cases be the fastest way, but certainly the easiest.

它可能并非在所有情况下都是最快的方式,但肯定是最简单的方式。

EDIT:Oh, and since the standard library uses max-heaps, you need to use an equivalent to >for comparing priorities (however you get them from the items), instead of the default <operator.

编辑:哦,由于标准库使用最大堆,您需要使用等效>于比较优先级(但是您从项目中获取它们),而不是默认<运算符。

回答by Darren Engwirda

I don't think the std::priority_queueclass allows for an efficient implementation of decrease-keystyle operations.

我不认为std::priority_queue该类允许有效实现decrease-key样式操作。

I rolled my own binary heap based data structure that supports this, bascially along very similar lines to what you've described for the std::setbased priority queue you have:

我推出了我自己的基于二进制堆的数据结构来支持这一点,基本上与您所描述的std::set基于优先级队列的内容非常相似:

  • Maintain a binary heap, sorted by valuethat stores elements of pair<value, ID>and an array that maps ID -> heap_index. Within the heap routines heapify_up, heapify_downetc it's necessary to ensure that the mapping array is kept in-sync with the current heap position of elements. This adds some extra O(1)overhead.
  • Conversion of an array to a heap can be done in O(N)according to the standard algorithm described here.
  • Peeking at the root element is O(1).
  • Checking if an IDis currently in the heap just requires an O(1)look-up in the mapping array. This also allows O(1)peeking at the element corresponding to any ID.
  • Decrease-keyrequires an O(1)look-up in the mapping array followed by an O(log(N))update to the heap via heapify_up, heapify_down.
  • Pushing a new item onto the heap is O(log(N))as is popping an exitsing item from the heap.
  • 维护一个二元堆,按value存储 元素的pair<value, ID>和映射 的数组排序ID -> heap_index。在堆例程heapify_up, heapify_down等中,有必要确保映射数组与元素的当前堆位置保持同步。这会增加一些额外的O(1)开销。
  • 可以O(N)根据此处描述的标准算法将数组转换为堆。
  • 查看根元素是O(1)
  • 检查 anID当前是否在堆中只需要O(1)在映射数组中查找。这也允许O(1)查看与 any 对应的元素ID
  • Decrease-key需要O(1)在映射数组中查找,然后O(log(N))通过heapify_up, heapify_down.
  • 将新项目推送到堆上O(log(N))就像从堆中弹出一个退出项目一样。

So asymptotically the runtime is improved for a few of the operations compared with the std::setbased data structure. Another important improvment is that binary heaps can be implemented on an array, while binary trees are node-based containers. The extra data locality of the binary heap usually translates to improved runtime.

因此,与std::set基于数据结构相比,一些操作的运行时间逐渐得到改进。另一个重要的改进是二叉堆可以在数组上实现,而二叉树是基于节点的容器。二叉堆的额外数据局部性通常会转化为改进的运行时间。

A few issues with this implementation are:

此实现的一些问题是:

  • It only allows integer item ID's.
  • It assumes a tight distribution of item ID's, starting at zero (otherwise the space complexity of the mapping array blows up!).
  • 它只允许整数 item ID
  • 它假设 itemID的分布紧密,从零开始(否则映射数组的空间复杂度会爆炸!)。

You could potentially overcome these issues if you maintained a mapping hash-table, rather than a mapping array, but with a little more runtime overhead. For my use, integer ID's have always been enough.

如果您维护一个映射哈希表,而不是一个映射数组,您可能会克服这些问题,但运行时开销会增加一些。对于我的使用, integerID一直就足够了。

Hope this helps.

希望这可以帮助。

回答by kubus

Actually, there is a way to use built-in binary heap from the standard c++ library. The key observation is that the implementation of all heap functions (i.e. std::push_heap, std::pop_heapand std::make_heap) need only the following methods from the stored element of class A:

实际上,有一种方法可以使用标准 C++ 库中的内置二进制堆。关键观察是所有堆函数(即std::push_heapstd::pop_heapstd::make_heap)的实现只需要来自类 A 的存储元素的以下方法:

  • Constructor A::A()
  • Assignment operator A& A::operator=(const A& rhs)
  • Comparison operator bool operator<(const A& lhs, const A& rhs)
  • 构造函数 A::A()
  • 赋值运算符 A& A::operator=(const A& rhs)
  • 比较运算符 bool operator<(const A& lhs, const A& rhs)

It means that assignment operator is called whenever an element is moved in the container storing all heap elements. By overloading this operator you can control the index in the heap. When you have the index you can access the element in the heap, change its value and call std::push_heapto update its position in the heap.

这意味着只要元素在存储所有堆元素的容器中移动,就会调用赋值运算符。通过重载此运算符,您可以控制堆中的索引。当您拥有索引时,您可以访问堆中的元素,更改其值并调用std::push_heap以更新其在堆中的位置。

See the simplified implementation of Dijkstra algorithm (without graph):

参见 Dijkstra 算法的简化实现(无图):

#include <bits/stdc++.h>

using namespace std;

vector<int> queue_idx;

struct Elem {
    int label;
    int dist;
    bool operator<(const Elem& other) const { return dist > other.dist; }
    Elem& operator=(const Elem& other);
};

vector<Elem> q;

Elem& Elem::operator=(const Elem& other)
{
    label = other.label;
    dist = other.dist;
    queue_idx[label] = this - q.data();
    return *this;
}

void AddElem(int label, int dist)
{
    queue_idx[label] = label;
    q.push_back(Elem{label, dist});
}

void RemoveMin()
{
    pop_heap(q.begin(), q.end());
    Elem res = q.back();
    q.pop_back();
    cout << "dist to " << res.label << " is " << res.dist << endl;
}

void Relax(int label, int dist)
{
    int idx = queue_idx[label];
    Elem& elem = q[idx];
    if (elem.dist > dist)
    {
        elem.dist = dist;
        push_heap(q.begin(), q.begin() + idx + 1);
    }
}

int main()
{
    int n = 5;
    queue_idx.resize(n);
    AddElem(0, 0);
    for (int i = 1; i < n; ++i)
        AddElem(i, INT_MAX);
    make_heap(q.begin(), q.end());
    RemoveMin();
    Relax(1, 50);
    Relax(2, 40);
    Relax(4, 10);
    RemoveMin();
    Relax(3, 20);
    RemoveMin();
    Relax(1, 30);
    RemoveMin();
    Relax(2, 80);
    RemoveMin();
    return 0;
}

I know that this solution depends on the inner implementation of the standard library, however it just works for any compiler I am aware of and I used it in programming competitions.

我知道这个解决方案取决于标准库的内部实现,但是它只适用于我知道的任何编译器,我在编程比赛中使用了它。