Original question: http://stackoverflow.com/questions/18195715/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
Asked by Anthony Vallée-Dubois
Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
However, both standards state that signed integer overflow is undefined behavior. Again, from the C99 standard (§3.4.3/1):
An example of undefined behavior is the behavior on integer overflow
Is there an historical or (even better!) a technical reason for this discrepancy?
Accepted answer by Pascal Cuoq
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation they used. C implementations usually used the same representation as the CPU, so the overflow behaviour followed from the integer representation used by the CPU.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
- the corresponding value with sign bit 0 is negated (sign and magnitude);
- the sign bit has the value -(2^N) (two's complement);
- the sign bit has the value -(2^N - 1) (one's complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
Answer by Mats Petersson
Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).
It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may vary depending on compiler version, optimisation settings, etc).
Answer by Lundin
First of all, please note that C11 3.4.3, like all examples and footnotes, is not normative text and therefore not relevant to cite!
The relevant text that states that overflow of integers and floats is undefined behavior is this:
C11 6.5/5
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
A clarification regarding the behavior of unsigned integer types specifically can be found here:
C11 6.2.5/9
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
This makes unsigned integer types a special case.
Also note that there is an exception if any type is converted to a signed type and the old value can no longer be represented. The behavior is then merely implementation-defined, although a signal may be raised.
C11 6.3.1.3
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Answer by supercat
In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted to larger signed types), then there wouldn't be as much need for defined wrapping behavior, but it's difficult to do calculations in a type which doesn't have e.g. an additive inverse.
This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.
Answer by yth
Perhaps another reason why unsigned arithmetic is defined is that unsigned numbers form the integers modulo 2^n, where n is the width of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modulus system is well understood.
The OP's quote refers to this fact, but also highlights the fact that there is only one, unambiguous, logical way to represent unsigned integers in binary. By contrast, signed numbers are most often represented using two's complement but other choices are possible as described in the standard (section 6.2.6.2).
Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers is the same as for positive numbers (except under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense: positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.