C++ 中 STL 集的底层数据结构是什么?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2558153/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 23:54:29  来源:igfitidea点击:

What is the underlying data structure of a STL set in C++?

c++set

提问by zebraman

I would like to know how a set is implemented in C++. If I were to implement my own set container without using the STL provided container, what would be the best way to go about this task?

我想知道如何在 C++ 中实现集合。如果我要在不使用 STL 提供的容器的情况下实现自己的 set 容器,那么执行此任务的最佳方法是什么?

I understand STL sets are based on the abstract data structure of a binary search tree. So what is the underlying data structure? An array?

我了解 STL 集基于二叉搜索树的抽象数据结构。那么底层的数据结构是什么?数组?

Also, how does insert()work for a set? How does the set check whether an element already exists in it?

另外,如何insert()为一个集合工作?set 如何检查一个元素是否已经存在于其中?

I read on wikipedia that another way to implement a set is with a hash table. How would this work?

我在维基百科上读到实现集合的另一种方法是使用哈希表。这将如何运作?

采纳答案by Raul Agrait

You could implement a binary search tree by first defining a Nodestruct:

您可以通过首先定义一个Node结构来实现二叉搜索树:

struct Node
{
  void *nodeData;
  Node *leftChild;
  Node *rightChild;
}

Then, you could define a root of the tree with another Node *rootNode;

然后,你可以用另一个定义树的根 Node *rootNode;

The Wikipedia entry on Binary Search Treehas a pretty good example of how to implement an insert method, so I would also recommend checking that out.

Binary Search Tree的 Wikipedia 条目有一个很好的示例,说明如何实现插入方法,因此我也建议您查看一下。

In terms of duplicates, they are generally not allowed in sets, so you could either just discard that input, throw an exception, etc, depending on your specification.

就重复项而言,它们通常不允许出现在集合中,因此您可以根据您的规范丢弃该输入、抛出异常等。

回答by Toli

As KTC said, how std::setis implemented can vary -- the C++ standard simply specifies an abstract data type. In other words, the standard does not specify how a container should be implemented, just what operations it is required to support. However, most implementations of the STL do, as far as I am aware, use red-black treesor other balanced binary search trees of some kind (GNU libstdc++, for instance, uses red-black trees).

正如 KTC 所说,std::set实现方式可能会有所不同——C ++ 标准只是指定了一种抽象数据类型。换句话说,该标准没有指定容器应该如何实现,只是需要它支持哪些操作。但是,据我所知,STL 的大多数实现都使用红黑树或其他某种平衡二叉搜索树(例如,GNU libstdc++ 使用红黑树)。

While you could theoretically implement a set as a hash table and get faster asymptotic performance (amortized O(key length) versus O(log n) for lookup and insert), that would require having the user supply a hash function for whatever type they wanted to store (see Wikipedia's entry on hash tablesfor a good explanation of how they work). As for an implementation of a binary search tree, you wouldn't want to use an array -- as Raul mentioned, you would want some kind of Nodedata structure.

虽然理论上您可以将集合实现为哈希表并获得更快的渐近性能(用于查找和插入的分摊 O(key length) 与 O(log n)),但这需要让用户为他们想要的任何类型提供哈希函数存储(请参阅维基百科关于哈希表的条目以了解它们的工作原理)。至于二叉搜索树的实现,您不会想要使用数组——正如 Raul 所提到的,您会想要某种Node数据结构。

回答by jasonline

I understand STL sets are based on the abstract data structure of a binary search tree. So what is the underlying data structure? An array?

我了解 STL 集基于二叉搜索树的抽象数据结构。那么底层的数据结构是什么?数组?

As others have pointed out, it varies. A set is commonly implemented as a tree (red-black tree, balanced tree, etc.) though but there may be other implementations that exist.

正如其他人指出的那样,情况各不相同。集合通常实现为树(红黑树、平衡树等),但可能存在其他实现。

Also, how does insert() work for a set?

另外,insert() 如何为集合工作?

It depends on the underlying implementation of your set. If it is implemented as a binary tree, Wikipediahas a sample recursive implementation for the insert() function. You may want to check it out.

这取决于您的集合的底层实现。如果它被实现为二叉树,维基百科有一个 insert() 函数的示例递归实现。你可能想检查一下。

How does the set check whether an element already exists in it?

set 如何检查一个元素是否已经存在于其中?

If it is implemented as a tree, then it traverses the tree and check each element. However, sets do not allow duplicate elements to be stored though. If you want a set that allows duplicate elements, then you need a multiset.

如果它被实现为一棵树,那么它会遍历树并检查每个元素。但是,集合不允许存储重复的元素。如果你想要一个允许重复元素的集合,那么你需要一个多重集。

I read on wikipedia that another way to implement a set is with a hash table. How would this work?

我在维基百科上读到实现集合的另一种方法是使用哈希表。这将如何运作?

You may be referring to a hash_set, where the set is implemented using hash tables. You'll need to provide a hash function to know which location to store your element. This implementation is ideal when you want to be able to search for an element quickly. However, if it is important for your elements to be stored in particular order, then the tree implementation is more appropriate since you can traverse it preorder, inorder or postorder.

您可能指的是 hash_set,该集合是使用哈希表实现的。您需要提供一个散列函数来知道存储元素的位置。当您希望能够快速搜索元素时,此实现是理想的选择。但是,如果您的元素以特定顺序存储很重要,那么树实现更合适,因为您可以按前序、中序或后序遍历它。

回答by KTC

How a particular container is implemented in C++ is entirely implementation dependent. All that is required is for the result to meet the requirements set out in the standard, such as complexity requirement for the various methods, iterators requirements etc.

如何在 C++ 中实现特定容器完全取决于实现。所需要的只是结果满足标准中规定的要求,例如各种方法的复杂性要求、迭代器要求等。

回答by Timmmm

cppreference says:

cppreference 说

Sets are usually implemented as red-black trees.

集合通常被实现为红黑树。

I checked, and both libc++and libstdc++do use red-black trees for std::set.

我查了一下,都libc++libstdc++你使用红黑树的std::set

std::unordered_setwas implemented with a hash table in libc++and I presume the same for libstdc++but didn't check.

std::unordered_set是用哈希表实现的libc++,我认为相同libstdc++但没有检查。

Edit:Apparently my word is not good enough.

编辑:显然我的话不够好。

  • libc++: 12
  • libstdc++: 1
  • libc++: 1 2
  • libstdc++: 1

回答by jschmerge

Chiming in on this, because I did not see anyone explicitly mention it... The C++ standard does notspecify the data structure to use for std::set and std::map. What it does however specify is the run-time complexity of various operations. The requirements on computational complexity for the insert, delete and find operations more-or-less force an implementation to use a balanced tree algorithm.

补充一下,因为我没有看到任何人明确提到它...... C++ 标准没有指定用于 std::set 和 std::map 的数据结构。然而,它指定的是各种操作的运行时复杂性。对插入、删除和查找操作的计算复杂度的要求或多或少迫使实现使用平衡树算法。

There are two common algorithms for implementing balanced binary trees: Red-Black and AVL. Of the two, Red-Black is a little bit simpler of an implementation, requiring 1 less bit of storage per tree node (which hardly matters, since you're going to burn a byte on it in a simple implementation anyway), and is a little bit faster than AVL on node deletions (this is due to a more relaxed requirement on the balancing of the tree).

有两种常见的实现平衡二叉树的算法:Red-Black 和 AVL。在这两者中,Red-Black 的实现稍微简单一点,每个树节点需要少 1 位存储(这几乎无关紧要,因为无论如何您都将在一个简单的实现中在其上烧掉一个字节),并且是在节点删除上比 AVL 快一点(这是由于对树的平衡要求更宽松)。

All of this, combined with std::map's requirement that the key & datum are stored in an std::pair force this all upon you without explicitly naming the data structure you must use for the container.

所有这些,结合 std::map 的要求,即键和数据存储在 std::pair 中,这一切都强加给您,而无需明确命名您必须用于容器的数据结构。

This all, in turn is compounded by the c++14/17 supplemental features to the container that allow splicing of nodes from one tree into another.

这一切反过来又与容器的 c++14/17 补充特性相结合,这些特性允许将节点从一棵树拼接到另一棵树。