C++ 为什么 std::map 实现为红黑树?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/5288320/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Why is std::map implemented as a red-black tree?
提问by Denis Gorodetskiy
Why is std::map
implemented as a red-black tree?
为什么std::map
实现为红黑树?
There are several balanced binary search trees(BSTs) out there. What were design trade-offs in choosing a red-black tree?
有几种平衡二叉搜索树(BST)。选择红黑树的设计权衡是什么?
采纳答案by Chris Taylor
Probably the two most common self balancing tree algorithms are Red-Black treesand AVL trees. To balance the tree after an insertion/update both algorithms use the notion of rotations where the nodes of the tree are rotated to perform the re-balancing.
可能最常见的两种自平衡树算法是红黑树和AVL 树。为了在插入/更新后平衡树,两种算法都使用旋转的概念,其中树的节点被旋转以执行重新平衡。
While in both algorithms the insert/delete operations are O(log n), in the case of Red-Black tree re-balancing rotation is an O(1)operation while with AVL this is a O(log n)operation, making the Red-Black tree more efficient in this aspect of the re-balancing stage and one of the possible reasons that it is more commonly used.
虽然在这两种算法中,插入/删除操作都是 O(log n),但在红黑树重新平衡旋转的情况下,这是一个O(1)操作,而对于 AVL,这是一个O(log n)操作,使得红黑树在重新平衡阶段的这方面效率更高,也是它更常用的可能原因之一。
Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.
大多数集合库都使用红黑树,包括来自 Java 和 Microsoft .NET Framework 的产品。
回答by webbertiger
It really depends on the usage. AVL tree usually has more rotations of rebalancing. So if your application doesn't have too many insertion and deletion operations, but weights heavily on searching, then AVL tree probably is a good choice.
这真的取决于用法。AVL 树通常具有更多的重新平衡旋转。因此,如果您的应用程序没有太多的插入和删除操作,但重在搜索,那么 AVL 树可能是一个不错的选择。
std::map
uses Red-Black tree as it gets a reasonable trade-off between the speed of node insertion/deletion and searching.
std::map
使用红黑树,因为它在节点插入/删除和搜索的速度之间获得了合理的权衡。
回答by user847376
AVL trees have a maximum height of 1.44logn, while RB trees have a maximum of 2logn. Inserting an element in a AVL may imply a rebalance at one point in the tree. The rebalancing finishes the insertion. After insertion of a new leaf, updating the ancestors of that leaf has to be done up to the root, or up to a point where the two subtrees are of equal depth. The probability of having to update k nodes is 1/3^k. Rebalancing is O(1). Removing an element may imply more than one rebalancing (up to half the depth of the tree).
AVL 树的最大高度为 1.44logn,而 RB 树的最大高度为 2logn。在 AVL 中插入元素可能意味着在树中的某一点重新平衡。重新平衡完成插入。插入新叶子后,必须更新该叶子的祖先直到根,或者直到两个子树的深度相等的点。必须更新 k 个节点的概率是 1/3^k。重新平衡是 O(1)。删除一个元素可能意味着不止一次重新平衡(最多为树的一半深度)。
RB-trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all the nodes of the tree are 2-nodes, with only one chain of 3-nodes down to a leaf. That leaf will be at a distance of 2logn from the root.
RB 树是表示为二叉搜索树的 4 阶 B 树。B 树中的 4 节点导致等效 BST 中的两个级别。在最坏的情况下,树的所有节点都是 2 个节点,只有一个 3 个节点的链直到一个叶子节点。那片叶子与根的距离为 2logn。
Going down from the root to the insertion point, one has to change 4-nodes into 2-nodes, to make sure any insertion will not saturate a leaf. Coming back from the insertion, all these nodes have to be analysed to make sure they correctly represent 4-nodes. This can also be done going down in the tree. The global cost will be the same. There is no free lunch! Removing an element from the tree is of the same order.
从根向下到插入点,必须将 4-nodes 更改为 2-nodes,以确保任何插入都不会使叶子饱和。从插入回来后,必须分析所有这些节点以确保它们正确表示 4 节点。这也可以在树中向下完成。全球成本将是相同的。天下没有免费的午餐!从树中删除元素的顺序相同。
All these trees require that nodes carry information on height, weight, color, etc. Only Splay trees are free from such additional info. But most people are afraid of Splay trees, because of the ramdomness of their structure!
所有这些树都要求节点携带高度、重量、颜色等信息。只有 Splay 树没有这些附加信息。但是大多数人都害怕 Splay 树,因为它们的结构过于随意!
Finally, trees can also carry weight information in the nodes, permitting weight balancing. Various schemes can be applied. One should rebalance when a subtree contains more than 3 times the number of elements of the other subtree. Rebalancing is again done either throuh a single or double rotation. This means a worst case of 2.4logn. One can get away with 2 times instead of 3, a much better ratio, but it may mean leaving a little less thant 1% of the subtrees unbalanced here and there. Tricky!
最后,树还可以在节点中携带权重信息,从而实现权重平衡。可以应用各种方案。当一棵子树包含的元素数量超过另一棵子树的 3 倍时,应该重新平衡。再平衡是通过单轮或双轮来完成的。这意味着 2.4logn 的最坏情况。可以使用 2 次而不是 3 次,这是一个更好的比率,但这可能意味着在这里和那里留下不到 1% 的子树不平衡。棘手!
Which type of tree is the best? AVL for sure. They are the simplest to code, and have their worst height nearest to logn. For a tree of 1000000 elements, an AVL will be at most of height 29, a RB 40, and a weight balanced 36 or 50 depending on the ratio.
哪种树最好?肯定是AVL。它们最容易编码,并且最接近 logn 的高度。对于包含 1000000 个元素的树,AVL 最多为 29,RB 为 40,权重平衡为 36 或 50,具体取决于比率。
There are a lot of other variables: randomness, ratio of adds, deletes, searches, etc.
还有很多其他变量:随机性、添加、删除、搜索的比率等。
回答by Justin Meiners
The previous answers only address tree alternatives and red black probably only remains for historical reasons.
先前的答案仅针对树替代方案,而红黑可能仅出于历史原因而保留。
Why not a hash table?
为什么不是哈希表?
A type only requires <
operator (comparison) to be used as a key in a tree. However, hash tables require that each key type has a hash
function defined. Keeping type requirements to a minimum is very important for generic programming so you can use it with a wide variety of types and algorithms.
类型只需要<
运算符(比较)用作树中的键。但是,哈希表要求每个键类型都hash
定义了一个函数。将类型要求保持在最低限度对于泛型编程非常重要,因此您可以将其与各种类型和算法一起使用。
Designing a good hash table requires intimate knowledge of the context it which it will be used. Should it use open addressing, or linked chaining? What levels of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?
设计一个好的哈希表需要深入了解它将被使用的上下文。它应该使用开放寻址还是链接链接?在调整大小之前它应该接受什么级别的负载?它应该使用避免冲突的昂贵散列,还是粗略快速的散列?
Since the STL can't anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.
由于 STL 无法预测哪个是您的应用程序的最佳选择,因此默认值需要更加灵活。树木“正常工作”并且可以很好地扩展。
(C++11 did add hash tables with unordered_map
. You can see from the documentationit requires setting policies to configure many of these options.)
(C++11 确实添加了哈希表unordered_map
。您可以从文档中看到它需要设置策略来配置许多这些选项。)
What about other trees?
其他树呢?
Red Black trees offer fast lookup and are self balancing, unlike BSTs. Another user pointed out its advantages over the self-balancing AVL tree.
与 BST 不同,红黑树提供快速查找和自我平衡。另一位用户指出了它相对于自平衡 AVL 树的优势。
Alexander Stepanov (The creator of STL) said that he would use a B* Tree instead of a Red-Black tree if he wrote std::map
again, because it is more friendly for modern memory caches.
Alexander Stepanov(STL的创始人)说如果再写的话他会用B*树而不是红黑树std::map
,因为它对现代内存缓存更友好。
One of the biggest changes since then has been the growth of caches. Cache misses are very costly, so locality of reference is much more important now. Node-based data structures, which have low locality of reference, make much less sense. If I were designing STL today, I would have a different set of containers. For example, an in-memory B*-tree is a far better choice than a red-black tree for implementing an associative container. - Alexander Stepanov
从那时起最大的变化之一就是缓存的增长。缓存未命中成本非常高,因此引用的局部性现在更为重要。基于节点的数据结构具有较低的引用局部性,意义不大。如果我今天设计 STL,我会有一组不同的容器。例如,在实现关联容器时,内存中的 B* 树是比红黑树更好的选择。——亚历山大·斯捷潘诺夫
Should maps always use trees?
地图应该总是使用树吗?
Another possible maps implementation would be a sorted vector (insertion sort) and binary search. This would work well
for containers which aren't modified often but are queried frequently.
I often do this in C as qsort
and bsearch
are built in.
另一种可能的映射实现是排序向量(插入排序)和二分搜索。这对于不经常修改但经常查询的容器很有效。我经常在 C 中这样做,qsort
并且bsearch
是内置的。
Do I even need to use map?
我什至需要使用地图吗?
Cache considerations mean it rarely makes sense to use std::list
or std::deque
over std:vector
even for those situations we were taught in school (such as removing an element from the middle of the list).
Applying that same reasoning, using a for loop to linear search a list is often more efficient and cleaner than building a map for a few lookups.
缓存考虑意味着即使对于我们在学校教过的那些情况(例如从列表中间删除元素),使用std::list
或std::deque
结束std:vector
也很少有意义。应用相同的推理,使用 for 循环线性搜索列表通常比为几次查找构建地图更有效和更清晰。
Of course choosing a readable container is usually more important than performance.
当然,选择一个可读的容器通常比性能更重要。
回答by Eric Ouellet
Update 2017-06-14: webbertiger edit its answer after I commented. I should point out that its answer is now a lot better to my eyes. But I kept my answer just as additional information...
2017 年 6 月 14 日更新:webbertiger 在我发表评论后编辑其答案。我应该指出,它的答案现在对我来说好多了。但我保留了我的答案作为附加信息......
Due to the fact that I think first answer is wrong (correction: not both anymore) and the third has a wrong affirmation. I feel I had to clarify things...
由于我认为第一个答案是错误的(更正:不再是两个答案),而第三个答案是错误的。我觉得我必须澄清一些事情......
The 2 most popular tree are AVL and Red Black (RB). The main difference lie in the utilization:
两种最受欢迎的树是 AVL 和红黑 (RB)。主要区别在于利用率:
- AVL : Better if ratio of consultation (read) is bigger than manipulation (modification). Memory foot print is a little less than RB (due to the bit required for coloring).
- RB : Better in general cases where there is a balance between consultation (read) and manipulation (modification) or more modification over consultation. A slightly bigger memory footprint due to the storing of red-black flag.
- AVL :如果咨询(读取)的比率大于操作(修改)的比率,则更好。内存占用量略小于 RB(由于着色所需的位)。
- RB :在咨询(阅读)和操作(修改)或更多修改而不是咨询之间存在平衡的一般情况下更好。由于存储红黑标志,内存占用略大。
The main difference come from the coloring. You do have less re-balance action in RB tree than AVL because the coloring enable you to sometimes skip or shorten re-balance actions which have a relative hi cost. Because of the coloring, RB tree also have higher level of nodes because it could accept red nodes between black ones (having the possibilities of ~2x more levels) making search (read) a little bit less efficient... but because it is a constant (2x), it stay in O(log n).
主要区别来自着色。RB 树中的重新平衡操作确实比 AVL 少,因为着色使您有时可以跳过或缩短成本相对较高的重新平衡操作。由于着色,RB 树还具有更高级别的节点,因为它可以接受黑色节点之间的红色节点(有可能增加约 2 倍的级别),从而使搜索(读取)效率稍低……但因为它是一个常数 (2x),它保持在 O(log n) 中。
If you consider the performance hit for a modification of a tree (significative) VS the performance hit of consultation of a tree (almost insignificant), it become natural to prefer RB over AVL for a general case.
如果您考虑修改树的性能损失(有意义的)与咨询树的性能损失(几乎无关紧要),那么在一般情况下更喜欢 RB 而不是 AVL 就变得很自然了。
回答by necromancer
It is just the choice of your implementation - they could be implemented as any balanced tree. The various choices are all comparable with minor differences. Therefore any is as good as any.
这只是您实现的选择 - 它们可以作为任何平衡树来实现。各种选择都具有可比性,但差异很小。因此,任何都和任何一样好。