C++: What's faster: inserting into a priority queue, or sorting retrospectively?
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/3759112/
Asked by static_rtti
What's faster: inserting into a priority queue, or sorting retrospectively?
I am generating some items that I need to be sorted at the end. I was wondering, what is faster in terms of complexity: inserting them directly in a priority_queue or a similar data structure, or using a sort algorithm at the end?
Accepted answer by Konrad Rudolph
Inserting n items into a priority queue has asymptotic complexity O(n log n), so in terms of complexity it's not more efficient than using sort once, at the end.
Whether it's more efficient in practice really depends. You need to test. In fact, in practice, even continued insertion into a linear array (as in insertion sort, without building a heap) may be the most efficient, even though asymptotically it has a worse runtime.
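As a rough illustration of "continued insertion into a linear array", here is a minimal sketch that keeps a std::vector sorted as items arrive. The helper name insert_sorted is made up for this example; whether it beats a heap or a final sort for your data is something only a measurement can tell.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: insert `value` into `sorted` (ascending order)
// so that the vector stays sorted after the call.
// Finding the position is O(log n); shifting the tail makes the insert O(n).
void insert_sorted(std::vector<int>& sorted, int value) {
    std::vector<int>::iterator pos =
        std::upper_bound(sorted.begin(), sorted.end(), value);
    sorted.insert(pos, value);
}
```

Calling insert_sorted once per generated item leaves the vector sorted at all times, at O(n^2) total cost in the worst case but with very cache-friendly memory access.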
Answer by Richard
This probably comes to you a little late in the game as far as your question is concerned, but let's be complete.
Testing is the best way to answer this question for your specific computer architecture, compiler, and implementation. Beyond that, there are generalizations.
First off, priority queues are not necessarily O(n log n).
If you have integer data, there are priority queues which work in O(1) time. Beucher and Meyer's 1992 publication "The morphological approach to segmentation: the watershed transformation" describes hierarchical queues, which work quite quickly for integer values with limited range. Brown's 1988 publication "Calendar queues: a fast 0 (1) priority queue implementation for the simulation event set problem" offers another solution which deals well with larger ranges of integers - two decades of work following Brown's publication has produced some nice results for doing integer priority queues fast. But the machinery of these queues can become complicated: bucket sorts and radix sorts may still provide O(1) operation. In some cases, you may even be able to quantize floating-point data to take advantage of an O(1) priority queue.
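To make the idea of an integer priority queue with constant-time insertion concrete, here is a minimal bucket-based sketch. It is not the hierarchical queue or calendar queue from the papers cited above; the class name BucketQueue and its interface are assumptions for illustration, and it stores only keys (counts per key), not payloads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical min-priority queue for integer keys in [0, max_key].
class BucketQueue {
public:
    explicit BucketQueue(std::size_t max_key)
        : counts_(max_key + 1, 0), lowest_(0), size_(0) {}

    // O(1): bump the counter for this key. Assumes key <= max_key.
    void push(std::size_t key) {
        ++counts_[key];
        if (size_ == 0 || key < lowest_) lowest_ = key;
        ++size_;
    }

    // Remove and return the smallest key. Assumes the queue is not empty.
    // The scan is bounded by the key range, not by the number of items.
    std::size_t pop() {
        while (counts_[lowest_] == 0) ++lowest_;
        --counts_[lowest_];
        --size_;
        return lowest_;
    }

    bool empty() const { return size_ == 0; }

private:
    std::vector<std::size_t> counts_;  // counts_[k] = how many items have key k
    std::size_t lowest_;               // never greater than the smallest key present
    std::size_t size_;
};
```

This only pays off when the key range is small relative to the number of items, which is the "limited range" condition mentioned above.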
Even in the general case of floating-point data, that O(n log n) is a little misleading. Edelkamp's book "Heuristic Search: Theory and Applications" has the following handy table showing the time complexity for various priority queue algorithms (remember, priority queues are equivalent to sorting and heap management):
As you can see, many priority queues have O(log n) costs not just for insertion, but also for extraction, and even queue management! While the coefficient is generally dropped for measuring the time complexity of an algorithm, these costs are still worth knowing.
But all these queues still have time complexities which are comparable. Which is best? A 2010 paper by Cris L. Luengo Hendriks entitled "Revisiting priority queues for image analysis" addresses this question.
In Hendriks' hold test, a priority queue was seeded with N random numbers in the range [0,50]. The top-most element of the queue was then dequeued, incremented by a random value in the range [0,2], and then queued. This operation was repeated 10^7 times. The overhead of generating the random numbers was subtracted from the measured times. Ladder queues and hierarchical heaps performed quite well by this test.
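The hold test described above can be approximated with std::priority_queue as follows. This is a sketch, not Hendriks' benchmark code: the function name hold_model, the use of double values, the choice of a min-queue (smallest value on top), and the random number generator are all assumptions, and the timing itself is left out.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <random>
#include <vector>

typedef std::priority_queue<double, std::vector<double>,
                            std::greater<double> > MinQueue;

// Seed a queue with n values in [0,50), then perform `holds` steps of
// "dequeue the top, add a random increment in [0,2), enqueue again".
// Returns the final queue; its size stays n throughout.
MinQueue hold_model(std::size_t n, std::size_t holds, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> initial(0.0, 50.0);
    std::uniform_real_distribution<double> increment(0.0, 2.0);

    MinQueue q;
    for (std::size_t i = 0; i < n; ++i) q.push(initial(rng));
    if (n == 0) return q;  // nothing to hold on an empty queue

    for (std::size_t i = 0; i < holds; ++i) {
        double top = q.top();
        q.pop();
        q.push(top + increment(rng));
    }
    return q;
}
```

To turn this into a measurement you would wrap the second loop in a timer (for example std::chrono::steady_clock) and subtract the cost of the random number generation, as the paper did.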
The per-element times to initialize and empty the queues were also measured; these tests are very relevant to your question.
As you can see, the different queues often had very different responses to enqueueing and dequeueing. These figures imply that while there may be priority queue algorithms which are superior for continuous operation, there is no best choice of algorithm for simply filling and then emptying a priority queue (the operation you're doing).
Let's look back at your questions:
What's faster: inserting into a priority queue, or sorting retrospectively?
As shown above, priority queues can be made efficient, but there are still costs for insertion, removal, and management. Insertion into a vector is fast. It's O(1) in amortized time, and there are no management costs, plus the vector is O(n) to be read.
Sorting the vector will cost you O(n log n) assuming that you have floating-point data, but this time complexity's not hiding things like the priority queues were. (You have to be a little careful, though. Quicksort runs very well on some data, but it has a worst-case time complexity of O(n^2). For some implementations, this is a serious security risk.)
I'm afraid I don't have data for the costs of sorting, but I'd say that retroactive sorting captures the essence of what you're trying to do better and is therefore the better choice. Based on the relative complexity of priority queue management versus post-sorting, I'd say that post-sorting should be faster. But again, you should test this.
I am generating some items that I need to be sorted at the end. I was wondering, what is faster in terms of complexity: inserting them directly in a priority-queue or a similar data structure, or using a sort algorithm at the end?
We've probably covered this above.
There's another question you didn't ask, though. And perhaps you already know the answer. It's a question of stability. The C++ STL says that the priority queue must maintain a "strict weak" order. This means that elements of equal priority are incomparable and may be placed in any order, as opposed to a "total order" where every element is comparable. (There's a nice description of ordering here.) In sorting, "strict weak" is analogous to an unstable sort and "total order" is analogous to a stable sort.
The upshot is that if elements of the same priority should stay in the same order you pushed them into your data structure, then you need a stable sort or a total order. If you plan to use the C++ STL, then you have only one option. Priority queues use a strict weak ordering, so they're useless here, but the "stable_sort" algorithm in the STL Algorithm library will get the job done.
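A minimal sketch of the stable_sort approach, assuming a made-up Task type whose comparison looks only at the priority field. Items with equal priority keep the order in which they were pushed into the vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical item type for illustration.
struct Task {
    int priority;
    std::string name;
};

// Compares on priority only, so tasks with equal priority are equivalent.
bool higher_priority(const Task& a, const Task& b) {
    return a.priority > b.priority;
}

// Sort by descending priority, preserving insertion order among ties.
void sort_tasks(std::vector<Task>& tasks) {
    std::stable_sort(tasks.begin(), tasks.end(), higher_priority);
}
```

Replacing std::stable_sort with std::sort here would still sort by priority, but the relative order of equal-priority tasks would no longer be guaranteed.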
I hope this helps. Let me know if you'd like a copy of any of the papers mentioned or would like clarification. :-)
Answer by Soylent Graham
Depends on the data, but I generally find InsertSort to be faster.
I had a related question, and I found in the end that the bottleneck was that I was doing a deferred sort (only when I ended up needing it) on a large number of items. I usually hit the worst-case scenario for my QuickSort (data already in order), so I used an insertion sort instead.
Sorting 1000-2000 elements with many cache misses
So analyze your data!
Answer by Steve Jessop
To your first question (which is faster): it depends. Just test it. Assuming you want the final result in a vector, the alternatives might look something like this:
    #include <iostream>
    #include <vector>
    #include <queue>
    #include <cstdlib>
    #include <functional>
    #include <algorithm>
    #include <iterator>

    #ifndef NUM
    #define NUM 10
    #endif

    int main() {
        std::srand(1038749);  // fixed seed so every variant sees the same data
        std::vector<int> res;
    #ifdef USE_VECTOR
        // Variant 1: collect everything, then sort once (descending).
        for (int i = 0; i < NUM; ++i) {
            res.push_back(std::rand());
        }
        std::sort(res.begin(), res.end(), std::greater<int>());
    #else
        // Variant 2: push into a max-heap, then pop into the vector (descending).
        std::priority_queue<int> q;
        for (int i = 0; i < NUM; ++i) {
            q.push(std::rand());
        }
        res.resize(q.size());
        for (int i = 0; i < NUM; ++i) {
            res[i] = q.top();
            q.pop();
        }
    #endif
    #if NUM <= 10
        // Print the result only for small runs.
        std::copy(res.begin(), res.end(), std::ostream_iterator<int>(std::cout,"\n"));
    #endif
    }
    $ g++ sortspeed.cpp -o sortspeed -DNUM=10000000 && time ./sortspeed
    real 0m20.719s
    user 0m20.561s
    sys 0m0.077s

    $ g++ sortspeed.cpp -o sortspeed -DUSE_VECTOR -DNUM=10000000 && time ./sortspeed
    real 0m5.828s
    user 0m5.733s
    sys 0m0.108s
So, std::sort beats std::priority_queue, in this case. But maybe you have a better or worse std::sort, and maybe you have a better or worse implementation of a heap. Or if not better or worse, just more or less suited to your exact usage, which is different from my invented usage: "create a sorted vector containing the values".
I can say with a lot of confidence that random data won't hit the worst case of std::sort, so in a sense this test might flatter it. But for a good implementation of std::sort, its worst case will be very difficult to construct, and might not actually be all that bad anyway.
Edit: I added use of a multiset, since some people have suggested a tree:
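    // Extra branch for the #ifdef chain above; also requires #include <set>.
    #elif defined(USE_SET)
        std::multiset<int,std::greater<int> > s;
        for (int i = 0; i < NUM; ++i) {
            s.insert(std::rand());
        }
        res.resize(s.size());
        int j = 0;
        // Iteration order follows std::greater, so res ends up descending.
        for (std::multiset<int,std::greater<int> >::iterator i = s.begin(); i != s.end(); ++i, ++j) {
            res[j] = *i;
        }
    #else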
    $ g++ sortspeed.cpp -o sortspeed -DUSE_SET -DNUM=10000000 && time ./sortspeed
    real 0m26.656s
    user 0m26.530s
    sys 0m0.062s
To your second question (complexity): they're all O(n log n), ignoring fiddly implementation details like whether memory allocation is O(1) or not (vector::push_back and other forms of insert at the end are amortized O(1)), and assuming that by "sort" you mean a comparison sort. Other kinds of sort can have lower complexity.
Answer by SPIRiT_1984
As far as I understand, your problem does not require a priority queue, since your task sounds like "make many insertions, then sort everything". That's like shooting birds with a laser: not an appropriate tool. Use standard sorting techniques for that.
You would need a Priority Queue, if your task was to imitate a sequence of operations, where each operation can be either "Add an element to the set" or "Remove smallest/greatest element from the set". This can be used in problem of finding a shortest path on the graph, for example. Here you cannot just use standard sorting techniques.
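The shortest-path case mentioned above is a typical example of interleaved "add" and "remove smallest" operations. Below is a minimal sketch of Dijkstra's algorithm using std::priority_queue; the Graph and Entry typedefs and the function name shortest_paths are assumptions for this example, and edge weights are assumed to be non-negative.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

typedef std::pair<long, std::size_t> Entry;  // (distance, node)
// adj[u] holds (neighbour, weight) pairs; weights must be non-negative.
typedef std::vector<std::vector<std::pair<std::size_t, long> > > Graph;

std::vector<long> shortest_paths(const Graph& adj, std::size_t source) {
    const long INF = std::numeric_limits<long>::max();
    std::vector<long> dist(adj.size(), INF);
    // std::greater turns the default max-heap into a min-heap on distance.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > pq;

    dist[source] = 0;
    pq.push(Entry(0, source));
    while (!pq.empty()) {
        Entry e = pq.top();  // "remove smallest element"
        pq.pop();
        long d = e.first;
        std::size_t u = e.second;
        if (d > dist[u]) continue;  // stale entry, a shorter path was found
        for (std::size_t i = 0; i < adj[u].size(); ++i) {
            std::size_t v = adj[u][i].first;
            long w = adj[u][i].second;
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push(Entry(dist[v], v));  // "add an element"
            }
        }
    }
    return dist;  // unreachable nodes keep the INF value
}
```

Because new entries are pushed while old ones are still being popped, sorting once up front cannot replace the queue here.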
Answer by midtiby
A priority queue is usually implemented as a heap. Sorting using a heap is on average slower than quicksort, except that quicksort has a worse worst case performance. Also heaps are relatively heavy data structures, so there's more overhead.
I'd recommend sorting at the end.
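For comparison, sorting with a heap does not require a std::priority_queue at all: the standard heap algorithms work in place on a vector. The sketch below (the function name heap_sort is made up for this example) is one way to try heapsort against std::sort on your own data.

```cpp
#include <algorithm>
#include <vector>

// In-place heapsort: build a max-heap over the whole vector, then let
// std::sort_heap repeatedly move the largest element to the end.
void heap_sort(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());  // O(n)
    std::sort_heap(v.begin(), v.end());  // O(n log n), result is ascending
}
```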
Answer by Elemental
I think that insertion is more efficient in almost all cases where you are generating the data (i.e. you don't already have it in a list).
A priority queue is not your only option for insertion as you go. As mentioned in other answers a binary tree (or related RB-tree) is equally efficient.
I would also check how the priority queue is implemented - many are based on b-trees already but a few implementations are not very good at extracting the elements (they essentially go through the entire queue and look for the highest priority).
Answer by John Ortega
On a max-priority queue, insert operations are O(lg n).