C++ list::size() 真的是 O(n) 吗?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/228908/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 13:53:31  来源:igfitidea点击:

Is list::size() really O(n)?

c++liststlcomplexity-theorybig-o

提问by foraidt

Recently, I noticed some people mentioning that std::list::size()has a linear complexity.
According to somesources, this is in fact implementation dependent as the standard doesn't say what the complexity has to be.
The comment in this blog entrysays:

最近,我注意到有人提到它std::list::size()具有线性复杂性。
根据一些消息来源,这实际上取决于实现,因为标准没有说明复杂性是什么。此博客条目中
的评论说:

Actually, it depends on which STL you are using. Microsoft Visual Studio V6 implements size() as {return (_Size); } whereas gcc (at least in versions 3.3.2 and 4.1.0) do it as { return std::distance(begin(), end()); } The first has constant speed, the second has o(N) speed

实际上,这取决于您使用的是哪种 STL。Microsoft Visual Studio V6 将 size() 实现为 {return (_Size); 而 gcc(至少在 3.3.2 和 4.1.0 版本中)将其作为 { return std::distance(begin(), end()); 第一个具有恒定速度,第二个具有 o(N) 速度

  1. So my guess is that for the VC++ crowd size()has constant complexity as Dinkumware probably won't have changed that fact since VC6. Am I right there?
  2. What does it look like currently in gcc? If it is really O(n), why did the developers choose to do so?
  1. 所以我的猜测是,对于 VC++ 人群来说,它size()具有持续的复杂性,因为 Dinkumware 自 VC6 以来可能不会改变这一事实。我在吗?
  2. 它现在是gcc什么样子?如果真的是O(n),为什么开发者会选择这样做?

采纳答案by Michael Burr

Pre-C++11 answer

Pre-C++11 答案

You are correct that the standard does not state what the complexity of list::size()must be - however, it does recommend that it "should have constant complexity" (Note A in Table 65).

您是正确的,标准没有说明list::size()必须是什么复杂性- 但是,它确实建议它“应该具有恒定的复杂性”(表 65 中的注释 A)。

Here's an interesting article by Howard Hinnantthat explains why some people think list::size()should have O(N) complexity (basically because they believe that O(1) list::size()makes list::splice()have O(N) complexity) and why an O(1) list::size()is be a good idea (in the author's opinion):

这是一个被霍华德Hinnant(欣南特)一个有趣的文章,解释为什么有些人认为list::size()应该有O(N)的复杂性(主要是因为他们认为,O(1)list::size()品牌list::splice()有O(N)的复杂性),为什么一个O(1)list::size()是个好主意(作者认为):

I think the main points in the paper are:

我认为论文的主要观点是:

  • there are few situations where maintaining an internal count so list::size()can be O(1) causes the splice operation to become linear
  • there are probably many more situations where someone might be unaware of the negative effects that might happen because they call an O(N) size()(such as his one example where list::size()is called while holding a lock).
  • that instead of permitting size()be O(N), in the interest of 'least surprise', the standard should require any container that implements size()to implement it in an O(1) fashion. If a container cannot do this, it should not implement size()at all. In this case, the user of the container will be made aware that size()is unavailable, and if they still want or need to get the number of elements in the container they can still use container::distance( begin(), end())to get that value - but they will be completely aware that it's an O(N) operation.
  • 很少有保持内部计数的情况,所以list::size()可以是 O(1) 导致拼接操作变为线性
  • 可能还有更多的情况,某人可能不知道可能发生的负面影响,因为他们调用了 O(N) size()(例如他的一个示例,其中list::size()在持有锁时调用)。
  • 为了size()“最小惊喜”,标准应该要求任何容器size()以 O(1) 方式实现它,而不是允许O(N) 。如果一个容器不能做到这一点,它根本就不应该实现size()。在这种情况下,容器的用户将知道这size()是不可用的,如果他们仍然想要或需要获取容器中的元素数量,他们仍然可以container::distance( begin(), end())用来获取该值 - 但他们将完全意识到这是O(N) 操作。

I think I tend to agree with most of his reasoning. However, I do not like his proposed addition to the splice()overloads. Having to pass in an nthat must be equal to distance( first, last)to get correct behavior seems like a recipe for hard to diagnose bugs.

我想我倾向于同意他的大部分推理。但是,我不喜欢他提出的对splice()重载的补充。必须传入n必须等于才能distance( first, last)获得正确行为的方法似乎是难以诊断错误的秘诀。

I'm not sure what should or could be done moving forward, as any change would have a significant impact on existing code. But as it stands, I think that existing code is already impacted - behavior might be rather significantly different from one implementation to another for something that should have been well-defined. Maybe onebyone's comment about having the size 'cached' and marked known/unknown might work well - you get amortized O(1) behavior - the only time you get O(N) behavior is when the list is modified by some splice() operations. The nice thing about this is that it can be done by implementors today without a change to the standard (unless I'm missing something).

我不确定应该或可以做什么,因为任何更改都会对现有代码产生重大影响。但就目前而言,我认为现有代码已经受到影响 - 对于本应明确定义的某些内容,从一种实现到另一种实现的行为可能会有相当大的不同。也许某人关于将大小“缓存”并标记为已知/未知的评论可能会很好地工作 - 您会得到分摊 O(1) 行为 - 唯一一次获得 O(N) 行为是当列表被某些 splice() 操作修改时. 这样做的好处是,它可以由今天的实现者完成,而无需更改标准(除非我遗漏了一些东西)。

As far as I know, C++0x is not changing anything in this area.

据我所知,C++0x 在这方面没有改变任何东西。

回答by kennytm

In C++11 it is required that for anystandard container the .size()operation must be complete in "constant" complexity (O(1)). (Table 96 — Container requirements). Previously in C++03 .size()shouldhave constant complexity, but is not required (see Is std::string size() a O(1) operation?).

在 C++11 中,要求对于任何标准容器,.size()操作必须以“恒定”复杂度 (O(1)) 完成。(表 96 — 集装箱要求)。以前在 C++03 中.size()应该具有恒定的复杂性,但不是必需的(请参阅Is std::string size() a O(1) operation?)。

The change in standard is introduced by n2923: Specifying the complexity of size() (Revision 1).

n2923引入了标准的变化:指定 size() 的复杂性(修订版 1)

However, the implementation of .size()in libstdc++ still uses an O(N) algorithm in gcc up to 4.8:

但是,.size()在 libstdc++ 中的实现仍然使用了 gcc 中高达 4.8 的 O(N) 算法:

  /**  Returns the number of elements in the %list.  */
  size_type
  size() const _GLIBCXX_NOEXCEPT
  { return std::distance(begin(), end()); }

See also Why is std::list bigger on c++11?for detail why it is kept this way.

另请参阅为什么 c++11 上的 std::list 更大?详细了解为什么保持这种方式。

Update: std::list::size()is properly O(1)when using gcc 5.0in C++11 mode (or above).

更新std::list::size()正确O(1)使用gcc时5.0在C ++ 11模式(或以上)。



By the way, the .size()in libc++ is correctly O(1):

顺便说一句,.size()在 libc++ 中正确的是 O(1):

_LIBCPP_INLINE_VISIBILITY
size_type size() const _NOEXCEPT     {return base::__sz();}

...

__compressed_pair<size_type, __node_allocator> __size_alloc_;

_LIBCPP_INLINE_VISIBILITY
const size_type& __sz() const _NOEXCEPT
    {return __size_alloc_.first();}

回答by introp

I've had to look into gcc 3.4's list::sizebefore, so I can say this:

我之前不得不研究 gcc 3.4 list::size,所以我可以这样说:

  1. It uses std::distance(head, tail).
  2. std::distancehas two implementations: for types that satisfy RandomAccessIterator, it uses "tail-head", and for types that merely satisfy InputIterator, it uses an O(n) algorithm relying on "iterator++", counting until it hits the given tail.
  3. std::listdoes not satisfy RandomAccessIterator, so size is O(n).
  1. 它使用std::distance(head, tail).
  2. std::distance有两种实现:对于满足RandomAccessIterator 的类型,它使用“tail-head”,对于仅满足InputIterator 的类型,它使用依赖于“iterator++”的 O(n) 算法,计数直到它碰到给定的尾部。
  3. std::list不满足RandomAccessIterator,所以大小是 O(n)。

As to the "why", I can only say that std::listis appropriate for problems that require sequential access. Storing the size as a class variable would introduce overhead on every insert, delete, etc., and that waste is a big no-no per the intent of the STL. If you really need a constant-time size(), use std::deque.

至于“为什么”,我只能说std::list适合需要顺序访问的问题。将大小存储为类变量会在每次插入、删除等时引入开销,并且根据 STL 的意图,这种浪费是一个很大的禁忌。如果您确实需要一个常量时间size(),请使用std::deque.

回答by Greg Rogers

I personally don't see the issue with splice being O(N) as the only reason why size is permitted to be O(N). You don't pay for what you don't useis an important C++ motto. In this case, maintaining the list size requires an extra increment/decrement on every insert/erase whether you check the list's size or not. This is a small fixed overhead, but its still important to consider.

我个人不认为拼接 O(N) 的问题是允许大小为 O(N) 的唯一原因。你不为你不使用的东西付费是一个重要的 C++ 座右铭。在这种情况下,无论您是否检查列表的大小,保持列表大小都需要在每次插入/擦除时进行额外的增量/减量。这是一个很小的固定开销,但仍然需要考虑。

Checking the size of a list is rarely needed. Iterating from begin to end without caring the total size is infinitely more common.

很少需要检查列表的大小。从头到尾迭代而不关心总大小是非常常见的。

回答by Yuval F

I would go to the source(archive). SGI's STL page says that it is permitted to have a linear complexity. I believe that the design guideline they followed was to allow the list implementation to be as general as possible, and thus to allow more flexibility in using lists.

我会去存档)。SGI 的 STL 页面说它允许具有线性复杂性。我相信他们遵循的设计准则是允许列表实现尽可能通用,从而在使用列表时提供更大的灵活性。

回答by nobar

This bug report: [C++0x] std::list::size complexity, captures in excruciating detail the fact that the implementation in GCC 4.x is linear time and how the transition to constant time for C++11 was slow in coming (available in 5.0) due to ABI compatibility concerns.

这个错误报告:[C++0x] std::list::size complex,捕捉到了一个事实,即 GCC 4.x 中的实现是线性时间以及 C++11 向恒定时间的转换是如何缓慢的由于 ABI 兼容性问题,即将推出(在 5.0 中可用)。

The manpage for the GCC 4.9 series still includes the following disclaimer:

GCC 4.9 系列的联机帮助页仍然包含以下免责声明:

Support for C++11 is still experimental, and may change in incompatible ways in future releases.

对 C++11 的支持仍处于试验阶段,可能会在未来版本中以不兼容的方式进行更改。



The same bug report is referenced here: Should std::list::size have constant complexity in C++11?

此处引用了相同的错误报告:应在 C++11 中 std::list::size 具有恒定的复杂性吗?

回答by Luke Givens

If you are correctly using lists you aren't probably noticing any difference.

如果您正确使用列表,您可能不会注意到任何区别。

Lists are good with big data structures that you want to rearrange without copying, of for data you want to keep valid pointers after insertion.

列表适用于您想在不复制的情况下重新排列的大数据结构,对于您想在插入后保留有效指针的数据。

In the first case it makes no difference, in the second i would prefer the old (smaller) size() implementation.

在第一种情况下它没有区别,在第二种情况下我更喜欢旧的(较小的) size() 实现。

Anyway std is more about correctness and standard behavious and 'user friendlyness' than raw speed.

无论如何,标准比原始速度更多地是关于正确性和标准行为以及“用户友好性”。