What is Python's heapq module?

Question

Asked by minerals

I tried "heapq" and concluded that my expectations differ from what I see on the screen. I need somebody to explain how it works and where it can be useful.

In the book Python Module of the Week, under paragraph 2.2 Sorting, it is written:

If you need to maintain a sorted list as you add and remove values, check out heapq. By using the functions in heapq to add or remove items from a list, you can maintain the sort order of the list with low overhead.

Here is what I do and get.

import heapq
heap = []

for i in range(10):
    heap.append(i)

heap
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

heapq.heapify(heap)    
heapq.heappush(heap, 10)    
heap
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

heapq.heappop(heap)
0    
heap
[1, 3, 2, 7, 4, 5, 6, 10, 8, 9] <<< Why the list does not remain sorted?

heapq.heappushpop(heap, 11)
1
heap
[2, 3, 5, 7, 4, 11, 6, 10, 8, 9] <<< Why is 11 put between 4 and 6?

So, as you can see, the "heap" list is not sorted at all; in fact, the more items you add and remove, the more cluttered it becomes. Pushed values end up in inexplicable positions. What is going on?

Answer 1

Accepted answer by Martijn Pieters

The heapq module maintains the heap invariant, which is not the same thing as maintaining the actual list object in sorted order.

Quoting from the heapq documentation:

Heaps are binary trees for which every parent node has a value less than or equal to any of its children. This implementation uses arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that its smallest element is always the root, heap[0].

This means that it is very efficient to find the smallest element (just take heap[0]), which is great for a priority queue. After that, the next 2 values will be larger than (or equal to) the first, the next 4 will be larger than (or equal to) their 'parent' nodes, then the next 8, and so on.
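
To make the invariant concrete, here is a minimal sketch that checks it directly against the list from the question. The helper name is_min_heap is made up for illustration and is not part of heapq.

```python
def is_min_heap(items):
    """Check items[k] <= items[2*k+1] and items[k] <= items[2*k+2] wherever those indices exist."""
    n = len(items)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and items[k] > items[child]:
                return False
    return True

# The list from the question, right after heappop().
heap = [1, 3, 2, 7, 4, 5, 6, 10, 8, 9]

print(is_min_heap(heap))      # True: a valid heap, although it is not sorted
print(heap == sorted(heap))   # False
print(heap[0] == min(heap))   # True: the smallest item sits at the root
```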

You can read more about the theory behind the data structure in the Theory section of the documentation. You can also watch this lecture from the MIT OpenCourseWare Introduction to Algorithms course, which explains the algorithm in general terms.

A heap can be turned back into a sorted list very efficiently:

def heapsort(heap):
    # Pop the smallest remaining item each time; note that this empties `heap`.
    return [heapq.heappop(heap) for _ in range(len(heap))]

by just popping the next element from the heap. Using sorted(heap) should be faster still, however, as the TimSort algorithm used by Python's sort will take advantage of the partial ordering already present in a heap.
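
As a small illustration of the two approaches, the sketch below runs both on the final heap from the question. It only shows that the results agree and that popping consumes the heap while sorted() does not; it makes no claim about timing.

```python
import heapq

def heapsort(heap):
    # Repeatedly pop the smallest item; this empties the list passed in.
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [2, 3, 5, 7, 4, 11, 6, 10, 8, 9]   # final heap from the question

heap_copy = list(data)
by_popping = heapsort(heap_copy)   # consumes heap_copy
by_sorted = sorted(data)           # returns a new list, data is untouched

print(by_popping == by_sorted)     # True
print(heap_copy)                   # []
print(by_sorted)                   # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```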

You'd use a heap if you are only interested in the smallest value, or the first n smallest values, especially if you are interested in those values on an ongoing basis; adding new items and removing the smallest is very efficient indeed, more so than re-sorting the list each time you add a value.
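
A minimal sketch of that "ongoing basis" use case follows. The (priority, task) pairs and the readings list are invented sample data; heappush, heappop and nsmallest are the actual heapq functions.

```python
import heapq

# Hypothetical (priority, task) pairs; a lower number means more urgent.
tasks = []
heapq.heappush(tasks, (3, "write docs"))
heapq.heappush(tasks, (1, "fix crash"))
heapq.heappush(tasks, (2, "review patch"))

first = heapq.heappop(tasks)                   # (1, 'fix crash')
heapq.heappush(tasks, (0, "restore backup"))   # new work arrives later
second = heapq.heappop(tasks)                  # (0, 'restore backup')
print(first, second)

# For a one-off "n smallest" query there is a ready-made helper.
readings = [17, 4, 42, 8, 15, 16, 23]
three_smallest = heapq.nsmallest(3, readings)
print(three_smallest)                          # [4, 8, 15]
```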

Answer 2

Answered by Alexander Zhukov

There is some misunderstanding of the heap data structure implementation. The heapq module is actually a variant of the binary heap implementation, where heap elements are stored in a list, as described here: https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation

Quoting Wikipedia:

Heaps are commonly implemented with an array. Any binary tree can be stored in an array, but because a binary heap is always a complete binary tree, it can be stored compactly. No space is required for pointers; instead, the parent and children of each node can be found by arithmetic on array indices.
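
The index arithmetic mentioned in the quote can be sketched as follows for a zero-based list, which is the layout heapq uses. The function names parent and children are illustrative only.

```python
def parent(i):
    """Index of the parent of node i, or None for the root."""
    return (i - 1) // 2 if i > 0 else None

def children(i, size):
    """Indices of the children of node i that actually exist in a heap of `size` items."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < size]

heap = [2, 3, 5, 7, 4, 11, 6, 10, 8, 9]   # final heap from the question

print(parent(5))                 # 2
print(children(2, len(heap)))    # [5, 6]
print(children(4, len(heap)))    # [9]  (index 10 does not exist)
print(children(5, len(heap)))    # []   (a leaf)
```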

The image below should help you feel the difference between the tree and list representations of a heap (note that this is a max-heap, which is the inverse of the usual min-heap!):

[Image: a max-heap drawn as a tree alongside its list representation]

In general, a heap data structure differs from a sorted list in that it sacrifices some information about whether any particular element is bigger or smaller than any other. A heap can only tell you that a particular element is no larger than its parent and no smaller than its children (in a max-heap, as pictured; the relations are reversed in a min-heap such as the one heapq maintains). The less information a data structure stores, the less time/memory it takes to modify it. Compare the complexity of some operations between a heap and a sorted array:

Operation   Heap (average)   Heap (worst case)   Sorted array (average)   Sorted array (worst case)
Space       O(n)             O(n)                O(n)                     O(n)
Search      O(n)             O(n)                O(log n)                 O(log n)
Insert      O(1)             O(log n)            O(n)                     O(n)
Delete      O(log n)         O(log n)            O(n)                     O(n)
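
To relate the table to actual calls, here is a rough sketch that maintains the same values both ways. Using bisect.insort for the sorted list is my assumption about how one would keep a Python list sorted; the sample values are arbitrary.

```python
import bisect
import heapq

values = [5, 1, 4, 2, 3]

sorted_list = []
heap = []
for v in values:
    bisect.insort(sorted_list, v)   # finds the slot by binary search, then shifts items: O(n)
    heapq.heappush(heap, v)         # moves the new item up one root-to-leaf path: O(log n) worst case

print(sorted_list)   # [1, 2, 3, 4, 5]
print(heap[0])       # 1, the minimum is always at the root

# Search: a sorted list supports binary search, a heap needs a linear scan.
idx = bisect.bisect_left(sorted_list, 3)   # O(log n)
found_in_heap = 3 in heap                  # O(n)
print(idx, found_in_heap)                  # 2 True
```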

Answer 3

Answered by Colonel Panic

Your book is wrong! As you demonstrate, a heap is not a sorted list (though a sorted list is a heap). What is a heap? To quote Skiena's Algorithm Design Manual:

Heaps are a simple and elegant data structure for efficiently supporting the priority queue operations insert and extract-min. They work by maintaining a partial order on the set of elements which is weaker than the sorted order (so it can be efficient to maintain) yet stronger than random order (so the minimum element can be quickly identified).

Compared to a sorted list, a heap obeys a weaker condition: the heap invariant. Before defining it, first think about why relaxing the condition might be useful. The answer is that the weaker condition is easier to maintain. You can do less with a heap, but you can do it faster.

A heap has three operations:

Find-Min: O(1)
Insert: O(log n)
Remove-Min: O(log n)

Crucially, Insert is O(log n), which beats O(n) for a sorted list.

What is the heap invariant? "A binary tree where parents dominate their children". That is, "p ≤ c for all children c of p". Skiena illustrates with pictures and goes on to demonstrate the algorithms for inserting and removing elements while maintaining the invariant. If you think for a while, you can invent them yourself. (Hint: they are known as bubble up and bubble down.)
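
For readers who would rather see the hint spelled out, here is one textbook-style sketch of bubble up and bubble down on a plain list. The function names are my own, and this is not a copy of how heapq is written internally; it only shows the idea.

```python
def bubble_up(heap, i):
    """Swap heap[i] with its parent until the parent is no larger."""
    while i > 0:
        p = (i - 1) // 2
        if heap[p] <= heap[i]:
            break
        heap[p], heap[i] = heap[i], heap[p]
        i = p

def bubble_down(heap, i):
    """Swap heap[i] with its smaller child until neither child is smaller."""
    n = len(heap)
    while True:
        smallest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def insert(heap, value):
    heap.append(value)               # put the new value in the last slot
    bubble_up(heap, len(heap) - 1)   # then restore the invariant upwards

def remove_min(heap):
    smallest = heap[0]
    last = heap.pop()                # take the last item off the end
    if heap:
        heap[0] = last               # move it to the root
        bubble_down(heap, 0)         # then restore the invariant downwards
    return smallest

heap = []
for v in [5, 3, 8, 1, 9, 2]:
    insert(heap, v)
print([remove_min(heap) for _ in range(6)])   # [1, 2, 3, 5, 8, 9]
```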

The good news is that batteries-included Python implements everything for you, in the heapq module. It doesn't define a heap type (which I think would be easier to use), but provides the operations as helper functions on a list.
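
If you prefer an object, a thin wrapper is easy to write. The MinHeap class below is a hypothetical convenience of my own, not something the standard library provides; it simply forwards to the heapq functions.

```python
import heapq

class MinHeap:
    """Hypothetical wrapper around a list managed by heapq."""

    def __init__(self, items=()):
        self._items = list(items)
        heapq.heapify(self._items)    # establish the invariant in place

    def push(self, item):
        heapq.heappush(self._items, item)

    def pop(self):
        return heapq.heappop(self._items)   # removes and returns the smallest item

    def peek(self):
        return self._items[0]               # smallest item, without removing it

    def __len__(self):
        return len(self._items)

h = MinHeap([5, 1, 4])
h.push(0)
print(h.peek())   # 0
print(h.pop())    # 0
print(h.pop())    # 1
print(len(h))     # 2
```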

Moral: If you write an algorithm using a sorted list but only ever inspect and remove from one end, then you can make the algorithm more efficient by using a heap.

For a problem in which a heap data structure is useful, read https://projecteuler.net/problem=500

Answer 4

Answered by IGotAHeadache

I know this is an older question, but the OP just missed the answer, with diagrams and an explanation of why the sort order looks off when listed in a linear fashion.

(So I am not going into optimization, efficiency, etc. I am answering the visual ordering and structure part of the OP's question.)

He was at pymotw.com, but if only he had gotten to https://pymotw.com/2/heapq/:

"A min-heap requires that the parent be less than or equal to its children"

So think tree, think pyramid.

This isn't a bad link at all either: https://medium.com/basecs/learning-to-love-heaps-cef2b273a238

So each parent has a two-child policy. And the kids can only have two child elements as well.

The beauty of it is that the kids will always be either less than or equal to their parents (max-heap) or greater than or equal to their parents (min-heap).

Max-heap or min-heap (this causes confusion) refers to the topmost element or, in the linear view, heap[0]: whether it holds the maximum value or the minimum value.
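
The heapq functions used in this thread build min-heaps. If you want to pencil out the max version in code, one common workaround for numeric values is to store negated numbers, as in this sketch. The negation trick is my own suggestion here and not something the answer itself proposes.

```python
import heapq

values = [2, 3, 5, 7, 4, 11, 6, 10, 8, 9]

# Store -v so that the largest original value becomes the smallest stored one.
max_heap = [-v for v in values]
heapq.heapify(max_heap)

largest = -max_heap[0]
print(largest)    # 11, the maximum now sits at index 0

top_two = [-heapq.heappop(max_heap) for _ in range(2)]
print(top_two)    # [11, 10]
```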

I'm going to leave the math out as much as possible.

So (numbers are indices)

heap[0] has two kids: heap[1] and heap[2].
heap[1]'s kids would be heap[3] and heap[4].
heap[2]'s kids would be heap[5] and heap[6].
heap[3]'s kids would be heap[7] and heap[8].
heap[4]'s kids would be heap[9] and heap[10].

And so on.

So, the question:

[2, 3, 5, 7, 4, 11, 6, 10, 8, 9] <<< Why is 11 put between 4 and 6?

Because the value 11 is stored at index 5, and index 5 is a child of index 2, which has the value 5. The value 4 (at index 4) is a child of index 1, which has the value 3.

It is ordered from smallest; it just doesn't LOOK like it when examined in a linear fashion.

parent -> child 

[0] -> [0] is 2
-
[0] -> [1] is 3
[0] -> [2] is 5
-
[1] -> [3] is 7
[1] -> [4] is 4
[2] -> [5] is 11  <-- between 4 and 6
[2] -> [6] is 6
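
To see the pyramid rather than the flat list, a small sketch like this one splits the list into tree levels (1 item, then 2, then 4, and so on). The helper name levels is made up for illustration.

```python
def levels(heap):
    """Split a heap list into rows of 1, 2, 4, 8, ... items, one row per tree level."""
    rows = []
    start, width = 0, 1
    while start < len(heap):
        rows.append(heap[start:start + width])
        start += width
        width *= 2
    return rows

heap = [2, 3, 5, 7, 4, 11, 6, 10, 8, 9]
for row in levels(heap):
    print(row)
# [2]
# [3, 5]
# [7, 4, 11, 6]
# [10, 8, 9]
```

Read this way, 11 sits under 5 and next to 6, and every value is at least as large as the one directly above it.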

So... this again, and it is true: "A min-heap requires that the parent be less than or equal to its children"

Drive yourself crazy and pencil it out for a max-heap... it will still hold true.

(Ever write one of these things and just wait to get squashed by some postdoc?)

So let's pop off the first element, as you would with a normal list or queue:

[0] -> [0] is 3
-
[0] -> [1] is 5
[0] -> [2] is 7
-
[1] -> [3] is 4
[1] -> [4] is 11

Let's stop.

Index 1 has a value of 5. Index 3, its child, has the value 4, which is smaller... the rule is broken. The heap has to be reordered to maintain the relationships, so it will basically never look sorted, and it won't look anything like its prior state before the value was popped off.
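
The broken rule can be reproduced in a few lines. This sketch compares a plain list.pop(0) with heapq.heappop on the same starting heap; the violations helper is a hypothetical name of mine that lists every (parent index, child index) pair where the parent is larger than the child.

```python
import heapq

def violations(items):
    """Return (parent_index, child_index) pairs where the parent is larger than the child."""
    n = len(items)
    return [(k, c)
            for k in range(n)
            for c in (2 * k + 1, 2 * k + 2)
            if c < n and items[k] > items[c]]

heap = [2, 3, 5, 7, 4, 11, 6, 10, 8, 9]

naive = list(heap)
naive.pop(0)                  # treat it like an ordinary list
print(naive)                  # [3, 5, 7, 4, 11, 6, 10, 8, 9]
print(violations(naive))      # [(1, 3), (2, 5)]: 5 > 4 and 7 > 6

proper = list(heap)
smallest = heapq.heappop(proper)   # lets heapq reorder the remaining items
print(smallest)               # 2
print(violations(proper))     # []
```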

There are ways to reorder the nodes, and that second article talks about them. I just wanted to answer the question specifically.
