C++11 中 COW std::string 实现的合法性
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/12199710/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Legality of COW std::string implementation in C++11
提问by acm
It had been my understanding that copy-on-write is not a viable way to implement a conforming std::string
in C++11, but when it came up in discussion recently I found myself unable to directly support that statement.
我的理解是写时复制不是std::string
在 C++11 中实现合规的可行方法,但是当它最近在讨论中出现时,我发现自己无法直接支持该声明。
Am I correct that C++11 does not admit COW based implementations of std::string
?
我说 C++11 不承认基于 COW 的实现是正确的std::string
吗?
If so, is this restriction explicitly stated somewhere in the new standard (where)?
如果是这样,此限制是否在新标准(where)中的某处明确说明?
Or is this restriction implied, in the sense that it is the combined effect of the new requirements on std::string
that precludes a COW based implementation of std::string
. In this case, I'd be interested in a chapter and verse style derivation of 'C++11 effectively prohibits COW based std::string
implementations'.
或者这个限制是否隐含,从某种意义上说,新要求的综合影响std::string
排除了基于 COW 的std::string
. 在这种情况下,我会对“C++11 有效禁止基于 COW 的std::string
实现”的章节和诗句风格派生感兴趣。
回答by Dave S
It's not allowed, because as per the standard 21.4.1 p6, invalidation of iterators/references is only allowed for
这是不允许的,因为根据标准 21.4.1 p6,迭代器/引用的失效只允许
— as an argument to any standard library function taking a reference to non-const basic_string as an argument.
— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.
— 作为任何标准库函数的参数,将非常量 basic_string 的引用作为参数。
— 调用非常量成员函数,operator[]、at、front、back、begin、rbegin、end 和 rend 除外。
For a COW string, calling non-const operator[]
would require making a copy (and invalidating references), which is disallowed by the paragraph above. Hence, it's no longer legal to have a COW string in C++11.
对于 COW 字符串,调用非常量operator[]
将需要进行复制(并使引用无效),这是上一段所不允许的。因此,在 C++11 中使用 COW 字符串不再合法。
回答by Jonathan Wakely
The answers by Dave Sand gbjbaanbare correct. (And Luc Danton's is correct too, although it's more a side-effect of forbidding COW strings rather than the original rule that forbids it.)
通过这些问题的答案戴夫小号和 gbjbaanb是正确的。(而且 Luc Danton 的也是正确的,尽管它更多是禁止 COW 字符串的副作用,而不是禁止它的原始规则。)
But to clear up some confusion, I'm going to add some further exposition. Various comments link to a comment of mine on the GCC bugzillawhich gives the following example:
但是为了澄清一些混乱,我将添加一些进一步的说明。各种评论链接到我对 GCC bugzilla 的评论,其中给出了以下示例:
std::string s("str");
const char* p = s.data();
{
std::string s2(s);
(void) s[0];
}
std::cout << *p << '\n'; // p is dangling
The point of that example is to demonstrate why GCC's reference counted (COW) string is not valid in C++11. The C++11 standard requires this code to work correctly. Nothing in the code permits the p
to be invalidated in C++11.
该示例的重点是演示为什么 GCC 的引用计数 (COW) 字符串在 C++11 中无效。C++11 标准要求此代码正常工作。代码中的p
任何内容都不允许在 C++11 中无效。
Using GCC's old reference-counted std::string
implementation, that code has undefined behaviour, because p
isinvalidated, becoming a dangling pointer. (What happens is that when s2
is constructed it shares the data with s
, but obtaining a non-const reference via s[0]
requires the data to be unshared, so s
does a "copy on write" because the reference s[0]
could potentially be used to write into s
, then s2
goes out of scope, destroying the array pointed to by p
).
使用GCC的老引用计数的std::string
实现,该代码具有不确定的行为,因为p
是无效的,成为一个悬摆指针。(发生的情况是,在s2
构造时,它与 共享数据s
,但是通过获取非常量引用s[0]
需要取消共享数据,因此s
“写时复制”也是如此,因为该引用s[0]
可能用于写入s
,然后s2
转到超出范围,破坏由p
)指向的数组。
The C++03 standard explicitly permits that behaviourin 21.3 [lib.basic.string] p5 where it says that subsequent to a call to data()
the first call to operator[]()
may invalidate pointers, references and iterators. So GCC's COW string was a valid C++03 implementation.
C++03 标准在 21.3 [lib.basic.string] p5 中明确允许这种行为,它说在调用data()
第一次调用之后operator[]()
可能会使指针、引用和迭代器无效。所以 GCC 的 COW 字符串是一个有效的 C++03 实现。
The C++11 standard no longer permitsthat behaviour, because no call to operator[]()
may invalidate pointers, references or iterators, irrespective of whether they follow a call to data()
.
C++11 标准不再允许这种行为,因为没有调用operator[]()
可能会使指针、引用或迭代器无效,无论它们是否跟在对 的调用之后data()
。
So the example above mustwork in C++11, but does notwork with libstdc++'s kind of COW string, therefore that kind of COW string is not permitted in C++11.
所以上面的例子必须在 C++11中工作,但不适用于 libstdc++ 的那种 COW 字符串,因此这种 COW 字符串在 C++11 中是不允许的。
回答by gbjbaanb
It is, CoW is an acceptable mechanism for making faster strings... but...
确实,CoW 是一种可以接受的制作更快字符串的机制……但是……
it makes multithreading code slower (all that locking to check if you're the only one writing kills performance when using a lot of strings). This was the main reason CoW was killed off years ago.
它使多线程代码变慢(所有这些锁定来检查您是否是唯一一个在使用大量字符串时会降低性能的人)。这是 CoW 多年前被杀的主要原因。
The other reasons are that the []
operator will return you the string data, without any protection for you to overwrite a string someone else expects to be unchanging. The same applies to c_str()
and data()
.
另一个原因是[]
操作员将返回字符串数据,而没有任何保护措施让您覆盖其他人希望保持不变的字符串。这同样适用于c_str()
和data()
。
Quick google says that the multithreading is basically the reason it was effectively disallowed(not explicitly).
Quick google 说多线程基本上是它被有效禁止(未明确)的原因。
The proposal says :
提案说:
Proposal
We propose to make all iterator and element access operations safely concurrently executable.
We are increasing the stability of operations even in sequential code.
This change effectively disallows copy-on-write implementations.
提议
我们建议使所有迭代器和元素访问操作安全地并发执行。
即使在顺序代码中,我们也在提高操作的稳定性。
此更改有效地禁止了写时复制实现。
followed by
其次是
The largest potential loss in performance due to a switch away from copy-on-write implementations is the increased consumption of memory for applications with very large read-mostly strings. However, we believe that for those applications ropes are a better technical solution, and recommend a rope proposal be considered for inclusion in Library TR2.
由于从写时复制实现的切换而导致的最大潜在性能损失是具有非常大的主要读取字符串的应用程序的内存消耗增加。但是,我们认为对于这些应用,绳索是更好的技术解决方案,并建议考虑将绳索提案纳入库 TR2。
Ropesare part of STLPort and SGIs STL.
绳索是 STLPort 和 SGIs STL 的一部分。
回答by Luc Danton
From 21.4.2 basic_string constructors and assignment operators [string.cons]
来自 21.4.2 basic_string 构造函数和赋值运算符 [string.cons]
basic_string(const basic_string<charT,traits,Allocator>& str);
[...]
2 Effects: Constructs an object of class
basic_string
as indicated in Table 64. [...]
basic_string(const basic_string<charT,traits,Allocator>& str);
[...]
2效果:构造一个类的对象,
basic_string
如表 64 所示。 [...]
Table 64 helpfully documents that after construction of an object via this (copy) constructor, this->data()
has as value:
表 64 有用地记录了在通过此(复制)构造函数构造对象后,this->data()
具有以下值:
points at the ?rst element of an allocated copy of the array whose ?rst element is pointed at by str.data()
指向第一个元素由 str.data() 指向的数组的已分配副本的第一个元素
There are similar requirements for other similar constructors.
其他类似的构造函数也有类似的要求。
回答by Dirk Holsopple
Since it is now guaranteed that strings are stored contiguously and you are now allowed to take a pointer to the internal storage of a string, (i.e. &str[0] works like it would for an array), it's not possible to make a useful COW implementation. You would have to make a copy for way too many things. Even just using operator[]
or begin()
on a non-const string would require a copy.
由于现在可以保证字符串是连续存储的,并且您现在可以使用指向字符串内部存储的指针(即 &str[0] 的工作方式类似于数组),因此不可能制作有用的 COW执行。您将不得不为太多东西制作副本。即使只是在非常量字符串上使用operator[]
或begin()
也需要一个副本。
回答by Cheers and hth. - Alf
Is COW basic_string
prohibited in C++11 and later?
basic_string
C++11 及更高版本中是否禁止使用COW ?
Regarding
关于
”Am I correct that C++11 does not admit COW based implementations of
std::string
?
“我说 C++11 不承认基于 COW 的实现是正确的
std::string
吗?
Yes.
是的。
Regarding
关于
”If so, is this restriction explicitly stated somewhere in the new standard (where)?
”如果是这样,这个限制是否在新标准(where)的某处明确说明?
Almostdirectly, by requirements of constant complexity for a number of operations that would require O(n) physical copying of the string data in a COW implementation.
几乎直接地,由于需要对COW 实现中的字符串数据进行 O(n) 物理复制的许多操作的恒定复杂性要求。
For example, for the member functions
例如,对于成员函数
auto operator[](size_type pos) const -> const_reference;
auto operator[](size_type pos) -> reference;
… which in a COW implementation would 1both trigger string data copying to un-share the string value, the C++11 standard requires
...在 COW 实现中 1both 会触发字符串数据复制以取消共享字符串值,C++11 标准要求
C++11 §21.4.5/4C++11 §21.4.5/4:”Complexity:constant time.
”复杂性:恒定时间。
… which rules out such data copying, and hence, COW.
……排除了这种数据复制,因此排除了 COW。
C++03 supported COW implementations by nothaving these constant complexity requirements, and by, under certain restrictive conditions, allowing calls to operator[]()
, at()
, begin()
, rbegin()
, end()
, or rend()
to invalidate references, pointers and iterators referring to the string items, i.e. to possibly incur a COW data copying. This support was removed in C++11.
C++03 支持 COW 实现,因为没有这些恒定的复杂性要求,并且在某些限制条件下,允许调用operator[]()
, at()
, begin()
, rbegin()
, end()
, 或rend()
使引用字符串项的引用、指针和迭代器无效,即可能导致COW 数据复制。C++11 中删除了此支持。
Is COW also prohibited via the C++11 invalidation rules?
C++11 失效规则是否也禁止了 COW?
In another answer which at the time of writing is selected as solution, and which is heavily upvoted and therefore apparently believed, it's asserted that
在撰写本文时被选为解决方案的另一个答案中,该答案得到了大量支持,因此显然相信,它断言
”For a COW string, calling non-
const
operator[]
would require making a copy (and invalidating references), which is disallowed by the [quoted] paragraph above [C++11 §21.4.1/6]. Hence, it's no longer legal to have a COW string in C++11.
”对于 COW 字符串,调用 non-
const
operator[]
将需要进行复制(并使引用无效),这是上面 [C++11 §21.4.1/6] 上的 [quoted] 段落所不允许的。因此,在 C++11 中使用 COW 字符串不再合法。
That assertion is incorrect and misleading in two main ways:
该断言在两个主要方面是不正确和具有误导性的:
- It incorrectly indicates that only the non-
const
item accessors need to trigger a COW data copying.
But also theconst
item accessors need to trigger data copying, because they allow client code to form references or pointers that (in C++11) it's not permitted to invalidate later via the operations that can trigger COW data copying. - It incorrectly assumes that COW data copying can cause reference invalidation.
But in a correct implementation COW data copying, un-sharing the string value, is done at a point before there are any references that can be invalidated.
- 它错误地表明只有非
const
项目访问者需要触发 COW 数据复制。
但是const
项目访问器也需要触发数据复制,因为它们允许客户端代码形成引用或指针(在 C++11 中)它不允许稍后通过可以触发 COW 数据复制的操作无效。 - 它错误地假设 COW 数据复制会导致引用失效。
但是在正确的实现中,COW 数据复制、取消共享字符串值是在存在任何可能无效的引用之前的某个时间点完成的。
To see how a correct C++11 COW implementation of basic_string
would work, when the O(1) requirements that make this invalid are ignored, think of an implementation where a string can switch between ownership policies. A string instance starts out with policy Sharable. With this policy active there can be no external item references. The instance can transition to Unique policy, and it must do so when an item reference is potentially created such as with a call to .c_str()
(at least if that produces a pointer to the internal buffer). In the general case of multiple instances sharing ownership of the value, this entails copying the string data. After that transition to Unique policy the instance can only transition back to Sharable by an operation that invalidates all references, such as assignment.
要了解正确的 C++11 COW 实现如何basic_string
工作,当忽略使该无效的 O(1) 要求时,请考虑一个字符串可以在所有权策略之间切换的实现。字符串实例以策略 Shareable 开始。在此策略处于活动状态时,不能有外部项目引用。实例可以转换为 Unique 策略,并且它必须在可能创建项引用时这样做,例如调用.c_str()
(至少如果这会产生指向内部缓冲区的指针)。在多个实例共享值所有权的一般情况下,这需要复制字符串数据。在转换到唯一策略之后,实例只能通过使所有引用无效的操作(例如赋值)转换回可共享。
So, while that answer's conclusion, that COW strings are ruled out, is correct, the reasoning offered is incorrect and strongly misleading.
因此,虽然排除 COW 字符串的答案的结论是正确的,但提供的推理是不正确的并且具有很强的误导性。
I suspect the cause of this misunderstanding is a non-normative note in C++11's annex C:
我怀疑造成这种误解的原因是 C++11 的附录 C 中的一个非规范性注释:
C++11 §C.2.11 [diff.cpp03.strings], about §21.3:C++11 §C.2.11 [diff.cpp03.strings],关于 §21.3:Change:
basic_string
requirements no longer allow reference-counted strings
Rationale:Invalidation is subtly different with reference-counted strings. This change regularizes behavor (sic) for this International Standard.
Effect on original feature:Valid C ++ 2003 code may execute differently in this International Standard
更改:
basic_string
要求不再允许引用计数的字符串
理由:无效与引用计数的字符串略有不同。这一变化规范了本国际标准的行为(原文如此)。
对原始特性的影响:有效的 C++ 2003 代码在本国际标准中可能以不同的方式执行
Here the rationaleexplains the primary whyone decided to remove the C++03 special COW support. This rationale, the why, is not howthe standard effectively disallows COW implementation. The standard disallows COW via the O(1) requirements.
这里的理由解释了主为什么一个决定删除C ++ 03特牛的支持。这个基本原理,原因,并不是标准如何有效地禁止 COW 实施。该标准通过 O(1) 要求不允许 COW。
In short, the C++11 invalidation rules don't rule out a COW implementation of std::basic_string
. But they do rule out a reasonably efficient unrestricted C++03-style COW implementation like the one in at least one of g++'s standard library implementations. The special C++03 COW support allowed practical efficiency, in particular using const
item accessors, at the cost of subtle, complex rules for invalidation:
简而言之,C++11 失效规则不排除std::basic_string
. 但是他们确实排除了一种合理有效的不受限制的 C++03 风格的 COW 实现,就像在至少一个 g++ 标准库实现中的实现一样。特殊的 C++03 COW 支持允许实际效率,特别是使用const
项目访问器,以微妙的、复杂的无效规则为代价:
”References, pointers, and iterators referring to the elements of a
basic_string
sequence may be invalidated by the following uses of thatbasic_string
object:
— As an argument to non-member functionsswap()
(21.3.7.8),operator>>()
(21.3.7.9), andgetline()
(21.3.7.9).
— As an argument tobasic_string::swap()
.
— Callingdata()
andc_str()
member functions.
— Calling non-const
member functions, exceptoperator[]()
,at()
,begin()
,rbegin()
,end()
, andrend()
.
— Subsequent to any of the above uses except the forms ofinsert()
anderase()
which return iterators, the first call to non-const
member functionsoperator[]()
,at()
,begin()
,rbegin()
,end()
, orrend()
.
”引用
basic_string
序列元素的引用、指针和迭代器可能会因该basic_string
对象的以下用途而失效:
— 作为非成员函数的参数swap()
(21.3.7.8)、operator>>()
(21.3.7.9) 和getline()
(21.3. 7.9)。
— 作为 的论据basic_string::swap()
。
— 调用data()
和c_str()
成员函数。
-调用非const
成员函数,除了operator[]()
,at()
,begin()
,rbegin()
,end()
,和rend()
。
— 在上述任何一种使用之后,除了返回迭代器的insert()
和形式之外erase()
,第一次调用非const
成员函数operator[]()
,at()
,begin()
,rbegin()
,end()
,或rend()
。
These rules are so complex and subtle that I doubt many programmers, if any, could give a precise summary. I could not.
这些规则是如此复杂和微妙,以至于我怀疑许多程序员(如果有的话)能否给出准确的总结。我不能。
What if O(1) requirements are disregarded?
如果 O(1) 要求被忽视怎么办?
If the C++11 constant time requirements on e.g. operator[]
are disregarded, then COW for basic_string
could be technically feasible, but difficult to implement.
如果operator[]
忽略对 eg 的 C++11 常量时间要求,那么 COW forbasic_string
可能在技术上可行,但难以实现。
Operations which could access the contents of a string without incurring COW data copying include:
可以访问字符串内容而不引起 COW 数据复制的操作包括:
- Concatenation via
+
. - Output via
<<
. - Using a
basic_string
as argument to standard library functions.
- 通过连接
+
。 - 通过 输出
<<
。 - 使用 a
basic_string
作为标准库函数的参数。
The latter because the standard library is permitted to rely on implementation specific knowledge and constructs.
后者是因为标准库被允许依赖于实现特定的知识和构造。
Additionally an implementation could offer various non-standard functions for accessing string contents without triggering COW data copying.
此外,一个实现可以提供各种非标准函数来访问字符串内容而不触发 COW 数据复制。
A main complicating factor is that in C++11 basic_string
item access must trigger data copying (un-sharing the string data) but is required to not throw, e.g. C++11 §21.4.5/3 “Throws:Nothing.”. And so it can't use ordinary dynamic allocation to create a new buffer for COW data copying. One way around this is to use a special heap where memory can be reservedwithout being actually allocated, and then reserve the requisite amount for each logical reference to a string value. Reserving and un-reserving in such a heap can be constant time, O(1), and allocating the amount that one has already reserved, can be noexcept
. In order to comply with the standard's requirements, with this approach it seems there would need to be one such special reservation-based heap per distinct allocator.
一个主要的复杂因素是,在 C++11 中,basic_string
项访问必须触发数据复制(取消共享字符串数据)但要求不 throw,例如 C++11 §21.4.5/3 “ Throws:Nothing.”。所以它不能使用普通的动态分配来创建一个新的缓冲区用于 COW 数据复制。解决这个问题的一种方法是使用一个特殊的堆,可以在其中保留内存而无需实际分配,然后为每个对字符串值的逻辑引用保留必要的数量。在这样的堆中保留和取消保留可以是常数时间,O(1),并且分配已经保留的数量,可以是noexcept
. 为了符合标准的要求,使用这种方法似乎每个不同的分配器都需要一个这样的基于保留的特殊堆。
Notes:
1 The const
item accessor triggers a COW data copying because it allows the client code to obtain a reference or pointer to the data, which it's not permitted to invalidate by a later data copying triggered by e.g. the non-const
item accessor.
注意:
1const
项目访问器触发 COW 数据复制,因为它允许客户端代码获取数据的引用或指针,不允许由非const
项目访问器触发的稍后数据复制使其无效。
回答by zzz777
I was always wondering about immutable cows: once cow is created I could be changed only through assignment from another cow, hence it will be compliant with the standard.
我一直想知道不可变的母牛:一旦创建了母牛,我只能通过另一头母牛的分配来更改,因此它将符合标准。
I had time to try it today for a simple comparison test: a map of size N keyed by string/cow with every node holding a set of all strings in the map (we have NxN number of objects).
我今天有时间尝试一个简单的比较测试:一个大小为 N 的映射,由 string/cow 键控,每个节点都持有映射中所有字符串的集合(我们有 NxN 个对象)。
With strings sized ~300 bytes and N=2000 cows are slightly faster and use almost order of magnitude less memory. See below, sizes are in kbs, run b is with cows.
字符串大小约为 300 字节且 N=2000 奶牛会稍微快一些,并且使用的内存几乎要少一个数量级。见下文,大小以 kbs 为单位,运行 b 是奶牛。
~/icow$ ./tst 2000
preparation a
run
done a: time-delta=6 mem-delta=1563276
preparation b
run
done a: time-delta=3 mem-delta=186384