C++ 有效的无符号到有符号转换避免实现定义的行为
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13150449/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Efficient unsigned-to-signed cast avoiding implementation-defined behavior
提问by Nemo
I want to define a function that takes an unsigned int
as argument and returns an int
congruent modulo UINT_MAX+1 to the argument.
我想定义一个函数,该函数将一个unsigned int
作为参数并返回一个int
全等模 UINT_MAX+1 给该参数。
A first attempt might look like this:
第一次尝试可能如下所示:
int unsigned_to_signed(unsigned n)
{
return static_cast<int>(n);
}
But as any language lawyer knows, casting from unsigned to signed for values larger than INT_MAX is implementation-defined.
但是任何语言律师都知道,对于大于 INT_MAX 的值,从无符号转换为有符号是实现定义的。
I want to implement this such that (a) it only relies on behavior mandated by the spec; and (b) it compiles into a no-op on any modern machine and optimizing compiler.
我想实现这一点,以便(a)它只依赖于规范规定的行为;(b) 它在任何现代机器和优化编译器上编译为无操作。
As for bizarre machines... If there is no signed int congruent modulo UINT_MAX+1 to the unsigned int, let's say I want to throw an exception. If there is more than one (I am not sure this is possible), let's say I want the largest one.
至于奇怪的机器......如果没有签名的int全等模UINT_MAX+1到无符号的int,假设我想抛出一个异常。如果有多个(我不确定这是否可能),假设我想要最大的一个。
OK, second attempt:
好的,第二次尝试:
int unsigned_to_signed(unsigned n)
{
int int_n = static_cast<int>(n);
if (n == static_cast<unsigned>(int_n))
return int_n;
// else do something long and complicated
}
I do not much care about the efficiency when I am not on a typical twos-complement system, since in my humble opinion that is unlikely. And if my code becomes a bottleneck on the omnipresent sign-magnitude systems of 2050, well, I bet someone can figure that out and optimize it then.
当我不在典型的二进制补码系统上时,我不太关心效率,因为在我看来,这不太可能。如果我的代码成为 2050 年无所不在的符号幅度系统的瓶颈,那么,我敢打赌有人可以解决这个问题并对其进行优化。
Now, this second attempt is pretty close to what I want. Although the cast to int
is implementation-defined for some inputs, the cast back to unsigned
is guaranteed by the standard to preserve the value modulo UINT_MAX+1. So the conditional does check exactly what I want, and it will compile into nothing on any system I am likely to encounter.
现在,第二次尝试非常接近我想要的。尽管int
对于某些输入,转换为实现定义,但unsigned
标准保证转换回以保留模 UINT_MAX+1 的值。因此,条件确实会检查我想要的内容,并且在我可能遇到的任何系统上它都不会编译为任何内容。
However... I am still casting to int
without first checking whether it will invoke implementation-defined behavior. On some hypothetical system in 2050 it could do who-knows-what. So let's say I want to avoid that.
但是......我仍然在int
没有首先检查它是否会调用实现定义的行为的情况下进行转换。在 2050 年的某个假设系统中,它可以做谁知道呢。所以假设我想避免这种情况。
Question: What should my "third attempt" look like?
问题:我的“第三次尝试”应该是什么样的?
To recap, I want to:
回顾一下,我想:
- Cast from unsigned int to signed int
- Preserve the value mod UINT_MAX+1
- Invoke only standard-mandated behavior
- Compile into a no-op on a typical twos-complement machine with optimizing compiler
- 从无符号整数转换为有符号整数
- 保留值 mod UINT_MAX+1
- 仅调用标准规定的行为
- 使用优化编译器在典型的二进制补码机器上编译为无操作
[Update]
[更新]
Let me give an example to show why this is not a trivial question.
让我举一个例子来说明为什么这不是一个微不足道的问题。
Consider a hypothetical C++ implementation with the following properties:
考虑具有以下属性的假设 C++ 实现:
sizeof(int)
equals 4sizeof(unsigned)
equals 4INT_MAX
equals 32767INT_MIN
equals -232+ 32768UINT_MAX
equals 232- 1- Arithmetic on
int
is modulo 232(into the rangeINT_MIN
throughINT_MAX
) std::numeric_limits<int>::is_modulo
is true- Casting unsigned
n
to int preserves the value for 0 <= n <= 32767 and yields zerootherwise
sizeof(int)
等于 4sizeof(unsigned)
等于 4INT_MAX
等于 32767INT_MIN
等于 -2 32+ 32768UINT_MAX
等于 2 32- 1- 算术 on
int
是模 2 32(进入范围INT_MIN
通过INT_MAX
) std::numeric_limits<int>::is_modulo
是真的- 将 unsigned 转换
n
为 int 会保留 0 <= n <= 32767 的值,否则产生零
On this hypothetical implementation, there is exactly one int
value congruent (mod UINT_MAX+1) to each unsigned
value. So my question would be well-defined.
在这个假设的实现中,int
每个unsigned
值恰好有一个全等 (mod UINT_MAX+1)值。所以我的问题将是明确的。
I claim that this hypothetical C++ implementation fully conforms to the C++98, C++03, and C++11 specifications. I admit I have not memorized every word of all of them... But I believe I have read the relevant sections carefully. So if you want me to accept your answer, you either must (a) cite a spec that rules out this hypothetical implementation or (b) handle it correctly.
我声称这个假设的 C++ 实现完全符合 C++98、C++03 和 C++11 规范。我承认我没有记住所有的单词……但我相信我已经仔细阅读了相关部分。因此,如果您希望我接受您的答案,您要么必须 (a) 引用排除此假设实现的规范,要么 (b) 正确处理它。
Indeed, a correct answer must handle everyhypothetical implementation permitted by the standard. That is what "invoke only standard-mandated behavior" means, by definition.
事实上,正确的答案必须处理标准允许的每个假设实现。根据定义,这就是“仅调用标准规定的行为”的含义。
Incidentally, note that std::numeric_limits<int>::is_modulo
is utterly useless here for multiple reasons. For one thing, it can be true
even if unsigned-to-signed casts do not work for large unsigned values. For another, it can be true
even on one's-complement or sign-magnitude systems, if arithmetic is simply modulo the entire integer range. And so on. If your answer depends on is_modulo
, it's wrong.
顺便说一句,请注意,std::numeric_limits<int>::is_modulo
出于多种原因,这里完全没用。一方面,true
即使无符号到有符号的强制转换对大的无符号值也不起作用。另一方面,true
如果算术只是对整个整数范围取模,它甚至可以在一个补码或符号大小系统上。等等。如果你的答案取决于is_modulo
,那就错了。
[Update 2]
[更新2]
hvd's answertaught me something: My hypothetical C++ implementation for integers is notpermitted by modern C. The C99 and C11 standards are very specific about the representation of signed integers; indeed, they only permit twos-complement, ones-complement, and sign-magnitude (section 6.2.6.2 paragraph (2); ).
HVD的答案教给我的东西:我的假设C ++的整数实现不通过现代C.的C99和C11标准允许有非常具体符号整数的表示; 实际上,它们只允许二进制补码、一个补码和符号大小(第 6.2.6.2 节第 (2); 节)。
But C++ is not C. As it turns out, this fact lies at the very heart of my question.
但 C++ 不是 C。事实证明,这个事实是我问题的核心。
The original C++98 standard was based on the much older C89, which says (section 3.1.2.5):
最初的 C++98 标准基于更老的 C89,它说(第 3.1.2.5 节):
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.
对于每一种有符号整数类型,都有一个对应的(但不同的)无符号整数类型(用关键字 unsigned 指定),它使用相同的存储量(包括符号信息)并具有相同的对齐要求。有符号整数类型的非负值范围是对应的无符号整数类型的一个子范围,每个类型中相同值的表示是相同的。
C89 says nothing about only having one sign bit or only allowing twos-complement/ones-complement/sign-magnitude.
C89 没有说明只有一个符号位或只允许二进制补码/一个补码/符号大小。
The C++98 standard adopted this language nearly verbatim (section 3.9.1 paragraph (3)):
C++98 标准几乎逐字采用了这种语言(第 3.9.1 节第 (3) 段):
For each of the signed integer types, there exists a corresponding (but different) unsigned integer type: "
unsigned char
", "unsigned short int
", "unsigned int
", and "unsigned long int
", each of which occupies the same amount of storage and has the same alignment requirements (3.9) as the corresponding signed integer type ; that is, each signed integertype has the same object representation as its corresponding unsigned integertype. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
对于每一种有符号整数类型,都存在相应的(但不同的)无符号整数类型:“
unsigned char
”、“unsigned short int
”、“unsigned int
”和“unsigned long int
”,它们各自占用相同的存储量并具有相同的对齐要求(3.9 ) 作为对应的有符号整数类型;也就是说,每个有符号整数类型都具有与其对应的无符号整数类型相同的对象表示。有符号整数类型的非负值范围是对应的无符号整数类型的子范围,每个对应的有符号/无符号类型的值表示应该相同。
The C++03 standard uses essentially identical language, as does C++11.
C++03 标准使用基本相同的语言,C++11 也是如此。
No standard C++ spec constrains its signed integer representations to any C spec, as far as I can tell. And there is nothing mandating a single sign bit or anything of the kind. All it says is that non-negativesigned integers must be a subrange of the corresponding unsigned.
据我所知,没有标准的 C++ 规范将其有符号整数表示限制为任何 C 规范。并且没有强制要求单个符号位或任何类型的东西。它只是说非负有符号整数必须是相应无符号整数的子范围。
So, again I claim that INT_MAX=32767 with INT_MIN=-232+32768 is permitted. If your answer assumes otherwise, it is incorrect unless you cite a C++standard proving me wrong.
所以,我再次声明 INT_MAX=32767 和 INT_MIN=-2 32+32768 是允许的。如果您的答案另有假设,则除非您引用C++标准证明我是错误的,否则这是不正确的。
采纳答案by Nemo
Expanding on user71404's answer:
扩展 user71404 的回答:
int f(unsigned x)
{
if (x <= INT_MAX)
return static_cast<int>(x);
if (x >= INT_MIN)
return static_cast<int>(x - INT_MIN) + INT_MIN;
throw x; // Or whatever else you like
}
If x >= INT_MIN
(keep the promotion rules in mind, INT_MIN
gets converted to unsigned
), then x - INT_MIN <= INT_MAX
, so this won't have any overflow.
如果x >= INT_MIN
(记住促销规则,INT_MIN
转换为unsigned
),则x - INT_MIN <= INT_MAX
,所以这不会有任何溢出。
If that is not obvious, take a look at the claim "If x >= -4u
, then x + 4 <= 3
.", and keep in mind that INT_MAX
will be equal to at least the mathematical value of -INT_MIN - 1.
如果这不明显,请查看声明“If x >= -4u
, then x + 4 <= 3
.”,并记住它INT_MAX
至少等于 -INT_MIN - 1 的数学值。
On the most common systems, where !(x <= INT_MAX)
implies x >= INT_MIN
, the optimizer should be able (and on my system, is able) to remove the second check, determine that the two return
statements can be compiled to the same code, and remove the first check too. Generated assembly listing:
在最常见的系统上,其中!(x <= INT_MAX)
隐含x >= INT_MIN
,优化器应该能够(在我的系统上,能够)删除第二个检查,确定两个return
语句可以编译为相同的代码,并删除第一个检查。生成的程序集列表:
__Z1fj:
LFB6:
.cfi_startproc
movl 4(%esp), %eax
ret
.cfi_endproc
The hypothetical implementation in your question:
您问题中的假设实现:
- INT_MAX equals 32767
- INT_MIN equals -232+ 32768
- INT_MAX 等于 32767
- INT_MIN 等于 -2 32+ 32768
is not possible, so does not need special consideration. INT_MIN
will be equal to either -INT_MAX
, or to -INT_MAX - 1
. This follows from C's representation of integer types (6.2.6.2), which requires n
bits to be value bits, one bit to be a sign bit, and only allows one single trap representation (not including representations that are invalid because of padding bits), namely the one that would otherwise represent negative zero / -INT_MAX - 1
. C++ doesn't allow any integer representations beyond what C allows.
是不可能的,所以不需要特别考虑。INT_MIN
将等于-INT_MAX
, 或-INT_MAX - 1
。这遵循 C 对整数类型的表示 (6.2.6.2),它要求n
位是值位,一位是符号位,并且只允许一个陷阱表示(不包括由于填充位而无效的表示),即否则将表示负零 / 的那个-INT_MAX - 1
。C++ 不允许任何超出 C 允许的整数表示。
Update: Microsoft's compiler apparently does not notice that x > 10
and x >= 11
test the same thing. It only generates the desired code if x >= INT_MIN
is replaced with x > INT_MIN - 1u
, which it can detect as the negation of x <= INT_MAX
(on this platform).
更新:微软的编译器显然没有注意到这一点,x > 10
并x >= 11
测试了同样的事情。如果x >= INT_MIN
被替换为x > INT_MIN - 1u
,它只会生成所需的代码,它可以将其检测为x <= INT_MAX
(在此平台上)的否定。
[Update from questioner (Nemo), elaborating on our discussion below]
[来自提问者 (Nemo) 的更新,详细说明我们在下面的讨论]
I now believe this answer works in all cases, but for complicated reasons. I am likely to award the bounty to this solution, but I want to capture all the gory details in case anybody cares.
我现在相信这个答案适用于所有情况,但原因很复杂。我可能会为此解决方案提供赏金,但我想记录所有血腥细节,以防有人关心。
Let's start with C++11, section 18.3.3:
让我们从 C++11 的 18.3.3 节开始:
Table 31 describes the header
<climits>
....
The contents are the same as the Standard C library header
<limits.h>
.
表 31 描述了标题
<climits>
。...
内容与标准 C 库头文件相同
<limits.h>
。
Here, "Standard C" means C99, whose specification severely constrains the representation of signed integers. They are just like unsigned integers, but with one bit dedicated to "sign" and zero or more bits dedicated to "padding". The padding bits do not contribute to the value of the integer, and the sign bit contributes only as twos-complement, ones-complement, or sign-magnitude.
这里,“标准 C”是指 C99,它的规范严格限制了有符号整数的表示。它们就像无符号整数,但有一位专用于“符号”,而零位或多位专用于“填充”。填充位不影响整数值,符号位仅作为二进制补码、一补码或符号大小起作用。
Since C++11 inherits the <climits>
macros from C99, INT_MIN is either -INT_MAX or -INT_MAX-1, and hvd's code is guaranteed to work. (Note that, due to the padding, INT_MAX could be much less than UINT_MAX/2... But thanks to the way signed->unsigned casts work, this answer handles that fine.)
由于 C++11 继承了<climits>
C99的宏,INT_MIN 要么是 -INT_MAX 要么是 -INT_MAX-1,保证 hvd 的代码可以工作。(请注意,由于填充,INT_MAX 可能比 UINT_MAX/2 小得多......但由于有符号-> 无符号强制转换的工作方式,这个答案处理得很好。)
C++03/C++98 is trickier. It uses the same wording to inherit <climits>
from "Standard C", but now "Standard C" means C89/C90.
C++03/C++98 比较棘手。它使用相同的措辞继承<climits>
自“标准 C”,但现在“标准 C”意味着 C89/C90。
All of these -- C++98, C++03, C89/C90 -- have the wording I give in my question, but also include this (C++03 section 3.9.1 paragraph 7):
所有这些——C++98、C++03、C89/C90——都有我在我的问题中给出的措辞,但也包括这个(C++03 第 3.9.1 节第 7 段):
The representations of integral types shall define values by use of a pure binary numeration system.(44) [Example: this International Standard permits 2's complement, 1's complement and signed magnitude representations for integral types.]
整数类型的表示应使用纯二进制计数系统定义值。(44) [示例:本国际标准允许整数类型的 2 的补码、1 的补码和带符号的幅度表示。]
Footnote (44) defines "pure binary numeration system":
脚注 (44) 定义了“纯二进制数字系统”:
A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
整数的位置表示,使用二进制数字 0 和 1,其中连续位表示的值是相加的,从 1 开始,然后乘以 2 的连续整数幂,但最高位置的位除外。
What is interesting about this wording is that it contradicts itself, because the definition of "pure binary numeration system" does not permit a sign/magnitude representation! It does allow the high bit to have, say, the value -2n-1(twos complement) or -(2n-1-1) (ones complement). But there is no value for the high bit that results in sign/magnitude.
这个措辞的有趣之处在于它自相矛盾,因为“纯二进制计数系统”的定义不允许符号/大小表示!它确实允许高位具有值 -2 n-1(二进制补码)或 -(2 n-1-1)(一个补码)。但是导致符号/幅度的高位没有值。
Anyway, my "hypothetical implementation" does not qualify as "pure binary" under this definition, so it is ruled out.
无论如何,我的“假设实现”在此定义下不符合“纯二进制”的条件,因此排除在外。
However, the fact that the high bit is special means we can imagine it contributing any value at all: A small positive value, huge positive value, small negative value, or huge negative value. (If the sign bit can contribute -(2n-1-1), why not -(2n-1-2)? etc.)
然而,高位是特殊的这一事实意味着我们可以想象它贡献了任何值:一个小的正值、巨大的正值、小的负值或巨大的负值。(如果符号位可以贡献 -(2 n-1-1),为什么不 -(2 n-1-2)?等)
So, let's imagine a signed integer representation that assigns a wacky value to the "sign" bit.
所以,让我们想象一个有符号整数表示,它为“符号”位分配一个古怪的值。
A small positive value for the sign bit would result in a positive range for int
(possibly as large as unsigned
), and hvd's code handles that just fine.
符号位的一个小的正值会导致int
(可能和 一样大unsigned
)的正范围,并且 hvd 的代码处理得很好。
A huge positive value for the sign bit would result in int
having a maximum larger than unsigned
, which is is forbidden.
符号位的巨大正值将导致int
最大值大于unsigned
,这是被禁止的。
A huge negative value for the sign bit would result in int
representing a non-contiguous range of values, and other wording in the spec rules that out.
符号位的巨大负值将导致int
表示值的非连续范围,规范中的其他措辞将其排除在外。
Finally, how about a sign bit that contributes a small negative quantity? Could we have a 1 in the "sign bit" contribute, say, -37 to the value of the int? So then INT_MAX would be (say) 231-1 and INT_MIN would be -37?
最后,贡献一个小的负数的符号位怎么样?我们是否可以在“符号位”中有一个 1 对 int 的值有贡献,比如说 -37?那么 INT_MAX 将是(比如说)2 31-1 而 INT_MIN 将是 -37?
This would result in some numbers having two representations... But ones-complement gives two representations to zero, and that is allowed according to the "Example". Nowhere does the spec say that zero is the onlyinteger that might have two representations. So I think this new hypothetical is allowed by the spec.
这将导致某些数字具有两种表示形式......但是ones-complement 将两种表示形式归零,根据“示例”,这是允许的。规范中没有任何地方说零是唯一可能有两种表示形式的整数。所以我认为这个新的假设是规范允许的。
Indeed, any negative value from -1 down to -INT_MAX-1
appears to be permissible as a value for the "sign bit", but nothing smaller (lest the range be non-contiguous). In other words, INT_MIN
might be anything from -INT_MAX-1
to -1.
实际上,从 -1 到的任何负值-INT_MAX-1
似乎都可以作为“符号位”的值,但不能更小(以免范围不连续)。换句话说,INT_MIN
可能是从-INT_MAX-1
到 -1 的任何值。
Now, guess what? For the second cast in hvd's code to avoid implementation-defined behavior, we just need x - (unsigned)INT_MIN
less than or equal to INT_MAX
. We just showed INT_MIN
is at least -INT_MAX-1
. Obviously, x
is at most UINT_MAX
. Casting a negative number to unsigned is the same as adding UINT_MAX+1
. Put it all together:
现在,你猜怎么着?对于 hvd 代码中的第二个强制转换以避免实现定义的行为,我们只需要x - (unsigned)INT_MIN
小于或等于INT_MAX
。我们刚刚展示的INT_MIN
是至少-INT_MAX-1
。显然,x
最多是UINT_MAX
。将负数转换为 unsigned 与添加相同UINT_MAX+1
。把它们放在一起:
x - (unsigned)INT_MIN <= INT_MAX
if and only if
当且仅当
UINT_MAX - (INT_MIN + UINT_MAX + 1) <= INT_MAX
-INT_MIN-1 <= INT_MAX
-INT_MIN <= INT_MAX+1
INT_MIN >= -INT_MAX-1
That last is what we just showed, so even in this perverse case, the code actually works.
最后一个是我们刚刚展示的,所以即使在这种反常的情况下,代码实际上也能工作。
That exhausts all of the possibilities, thus ending this extremely academic exercise.
这用尽了所有的可能性,从而结束了这个极其学术的练习。
Bottom line: There is some seriously under-specified behavior for signed integers in C89/C90 that got inherited by C++98/C++03. It is fixed in C99, and C++11 indirectly inherits the fix by incorporating <limits.h>
from C99. But even C++11 retains the self-contradictory "pure binary representation" wording...
底线:C89/C90 中的有符号整数存在一些严重未指定的行为,这些行为被 C++98/C++03 继承。它在 C99 中得到了修复,而 C++11 通过<limits.h>
从 C99合并来间接继承了该修复。但即使是 C++11 也保留了自相矛盾的“纯二进制表示”措辞……
回答by Evgeny Kluev
This code relies only on behavior, mandated by the spec, so requirement (a) is easily satisfied:
此代码仅依赖于规范所要求的行为,因此很容易满足要求 (a):
int unsigned_to_signed(unsigned n)
{
int result = INT_MAX;
if (n > INT_MAX && n < INT_MIN)
throw runtime_error("no signed int for this number");
for (unsigned i = INT_MAX; i != n; --i)
--result;
return result;
}
It's not so easy with requirement (b). This compiles into a no-op with gcc 4.6.3 (-Os, -O2, -O3) and with clang 3.0 (-Os, -O, -O2, -O3). Intel 12.1.0 refuses to optimize this. And I have no info about Visual C.
要求(b)并不那么容易。这与 gcc 4.6.3 (-Os, -O2, -O3) 和 clang 3.0 (-Os, -O, -O2, -O3) 编译成无操作。Intel 12.1.0 拒绝对此进行优化。而且我没有关于 Visual C 的信息。
回答by user71404
You can explicitly tell the compiler what you want to do:
你可以明确地告诉编译器你想做什么:
int unsigned_to_signed(unsigned n) {
if (n > INT_MAX) {
if (n <= UINT_MAX + INT_MIN) {
throw "no result";
}
return static_cast<int>(n + INT_MIN) - (UINT_MAX + INT_MIN + 1);
} else {
return static_cast<int>(n);
}
}
Compiles with gcc 4.7.2
for x86_64-linux
(g++ -O -S test.cpp
) to
用gcc 4.7.2
for x86_64-linux
( g++ -O -S test.cpp
)编译为
_Z18unsigned_to_signedj:
movl %edi, %eax
ret
回答by David Stone
The original answer solved the problem only for unsigned
=> int
. What if we want to solve the general problem of "some unsigned type" to its corresponding signed type? Furthermore, the original answer was excellent at citing sections of the standard and analyzing some corner cases, but it did not really help me get a feel for why it worked, so this answer will try to give a strong conceptual basis. This answer will try to help explain "why", and use modern C++ features to try to simplify the code.
原始答案仅针对unsigned
=>解决了问题int
。如果我们想将“某些无符号类型”的一般问题解决为其对应的有符号类型怎么办?此外,原始答案在引用标准的各个部分和分析一些极端情况方面非常出色,但它并没有真正帮助我了解它的工作原理,因此该答案将尝试提供强大的概念基础。这个答案将尝试帮助解释“为什么”,并使用现代 C++ 特性来尝试简化代码。
C++20 answer
C++20 答案
The problem has simplified dramatically with P0907: Signed Integers are Two's Complementand the final wording P1236that was voted into the C++20 standard. Now, the answer is as simple as possible:
P0907大大简化了这个问题:有符号整数是二进制补码,最终的措辞 P1236被投票纳入 C++20 标准。现在,答案尽可能简单:
template<std::unsigned_integral T>
constexpr auto cast_to_signed_integer(T const value) {
return static_cast<std::make_signed_t<T>>(value);
}
That's it. A static_cast
(or C-style cast) is finally guaranteed to do the thing you need for this question, and the thing many programmers thought it always did.
就是这样。A static_cast
(或 C 风格的强制转换)最终保证可以完成这个问题所需的事情,而且许多程序员认为它一直在做的事情。
C++17 answer
C++17 答案
In C++17, things are much more complicated. We have to deal with three possible integer representations (two's complement, ones' complement, and sign-magnitude). Even in the case where we know it must be two's complement because we checked the range of possible values, the conversion of a value outside the range of the signed integer to that signed integer still gives us an implementation-defined result. We have to use tricks like we have seen in other answers.
在 C++17 中,事情要复杂得多。我们必须处理三种可能的整数表示(二的补码、一的补码和符号大小)。即使在我们知道它必须是二进制补码的情况下,因为我们检查了可能值的范围,将有符号整数范围之外的值转换为该有符号整数仍然会给我们一个实现定义的结果。我们必须使用我们在其他答案中看到的技巧。
First, here is the code for how to solve the problem generically:
首先,这里是如何一般解决问题的代码:
template<typename T, typename = std::enable_if_t<std::is_unsigned_v<T>>>
constexpr auto cast_to_signed_integer(T const value) {
using unsigned_t = std::conditional_t<sizeof(T) <= sizeof(unsigned), unsigned, T>;
using signed_t = std::make_signed_t<unsigned_t>;
using result_t = std::make_signed_t<T>;
using result_limits = std::numeric_limits<result_t>;
if constexpr (result_limits::min() + 1 != -result_limits::max()) {
if (value == static_cast<T>(result_limits::max()) + 1) {
throw std::runtime_error("Cannot convert the maximum possible unsigned to a signed value on this system");
}
}
if (value <= result_limits::max()) {
return static_cast<result_t>(value);
} else {
constexpr auto window = static_cast<signed_t>(result_limits::min());
return static_cast<result_t>( // cast to the type we want the result in
static_cast<signed_t>( // cast back to signed or we end up with our original value
static_cast<T>( // cast to unsigned to force modular reduction
static_cast<unsigned_t>(value) + // cast to avoid promotion to int
static_cast<unsigned_t>(window) // shift values to overlapping range, cast to silence warning
)
) + window // shift values to negative range
);
}
}
This has a few more casts than the accepted answer, and that is to ensure there are no signed / unsigned mismatch warnings from your compiler and to properly handle integer promotion rules.
这比接受的答案有更多的转换,这是为了确保没有来自编译器的有符号/无符号不匹配警告并正确处理整数提升规则。
We first have a special case for systems that are not two's complement (and thus we must handle the maximum possible value specially because it doesn't have anything to map to). After that, we get to the real algorithm.
对于不是二进制补码的系统,我们首先有一个特殊情况(因此我们必须特别处理最大可能值,因为它没有任何映射到)。之后,我们进入真正的算法。
The second top-level condition is straightforward: we know the value is less than or equal to the maximum value, so it fits in the result type. The third condition is a little more complicated even with the comments, so some examples would probably help understand why each statement is necessary.
第二个顶级条件很简单:我们知道该值小于或等于最大值,因此它适合结果类型。即使有注释,第三个条件也有点复杂,因此一些示例可能有助于理解为什么每个语句都是必要的。
Conceptual basis: the number line
概念基础:数轴
First, what is this window
concept? Consider the following number line:
首先,这是什么window
概念?考虑以下数轴:
| signed |
<.........................>
| unsigned |
It turns out that for two's complement integers, you can divide the subset of the number line that can be reached by either type into three equally sized categories:
事实证明,对于二进制补码整数,您可以将任一类型可以到达的数轴子集划分为三个大小相同的类别:
- => signed only
= => both
+ => unsigned only
<..-------=======+++++++..>
This can be easily proven by considering the representation. An unsigned integer starts at 0
and uses all of the bits to increase the value in powers of 2. A signed integer is exactly the same for all of the bits except the sign bit, which is worth -(2^position)
instead of 2^position
. This means that for all n - 1
bits, they represent the same values. Then, unsigned integers have one more normal bit, which doubles the total number of values (in other words, there are just as many values with that bit set as without it set). The same logic holds for signed integers, except that all the values with that bit set are negative.
通过考虑表示可以很容易地证明这一点。一个无符号整数从 开始0
并使用所有位来增加值的 2 的幂。一个有符号整数对于除符号位之外的所有位都是完全相同的,它是值-(2^position)
而不是2^position
。这意味着对于所有n - 1
位,它们代表相同的值。然后,无符号整数多了一个正常位,这使值的总数加倍(换句话说,设置了该位的值与未设置它的值一样多)。相同的逻辑适用于有符号整数,除了具有该位设置的所有值都是负数。
The other two legal integer representations, ones' complement and sign-magnitude, have all of the same values as two's complement integers except for one: the most negative value. C++ defines everything about integer types, except for reinterpret_cast
(and the C++20 std::bit_cast
), in terms of the range of representable values, not in terms of the bit representation. This means that our analysis will hold for each of these three representations as long as we do not ever try to create the trap representation. The unsigned value that would map to this missing value is a rather unfortunate one: the one right in the middle of the unsigned values. Fortunately, our first condition checks (at compile time) whether such a representation exists, and then handles it specially with a runtime check.
其他两个合法的整数表示形式,1 的补码和符号大小,除了一个:最负的值外,所有的值都与 2 的补码整数相同。C++根据可表示值的范围,而不是根据位表示,定义了关于整数类型的所有内容,除了reinterpret_cast
(和 C++20 std::bit_cast
)。这意味着只要我们不尝试创建陷阱表示,我们的分析将适用于这三种表示中的每一种。映射到这个缺失值的无符号值是一个相当不幸的值:位于无符号值中间的那个。幸运的是,我们的第一个条件(在编译时)检查这样的表示是否存在,然后通过运行时检查专门处理它。
The window
variable in the code is the size of each of these segments (we use the negative value so that it can be represented as a signed integer). The first condition handles the case where we are in the =
section, which means that we are in the overlapping region where the values in one can be represented in the other without change. If we are outside of that region (we are in the +
region), we need to jump down by one window size. This puts us in the overlapping range, which means we can safely convert from unsigned to signed because there is no change in value. However, we are not done yet because we have mapped two unsigned values to each signed value. Therefore, we need to shift down to the next window (the -
region) so that we have a unique mapping again.
window
代码中的变量是每个段的大小(我们使用负值以便它可以表示为有符号整数)。第一个条件处理我们在=
部分中的情况,这意味着我们处于重叠区域,其中一个值可以在另一个中表示而无需更改。如果我们在该区域之外(我们在该区+
域内),我们需要向下跳一个窗口大小。这使我们处于重叠范围内,这意味着我们可以安全地从无符号转换为有符号,因为值没有变化。然而,我们还没有完成,因为我们已经将两个无符号值映射到每个有符号值。因此,我们需要向下移动到下一个窗口(-
区域),以便我们再次拥有唯一的映射。
Now, does this give us a result congruent mod UINT_MAX + 1
, as requested in the question? UINT_MAX + 1
is equivalent to 2^n
, where n
is the number of bits in the value representation. The value we use for our window size is equal to 2^(n - 1)
(the final index in a sequence of values is one less than the size). We subtract that value twice, which means we subtract 2 * 2^(n - 1)
which is equal to 2^n
. Adding and subtracting x
is a no-op in arithmetic mod x
, so we have not affected the original value mod 2^n
.
现在,这是否UINT_MAX + 1
按照问题的要求为我们提供了结果一致的 mod ?UINT_MAX + 1
等价于2^n
,其中n
是值表示中的位数。我们用于窗口大小的值等于2^(n - 1)
(值序列中的最终索引比大小小 1)。我们减去该值两次,这意味着我们减去2 * 2^(n - 1)
等于2^n
。加法和减法x
在算术 mod 中是无操作的x
,所以我们没有影响原始值 mod 2^n
。
Properly handling integer promotions
正确处理整数促销
Because this is a generic function and not just int
and unsigned
, we also have to concern ourselves with integral promotion rules. There are two possibly interesting cases: one in which short
is smaller than int
and one in which short
is the same size as int
.
因为这是一个通用函数而不仅仅是int
and unsigned
,我们还必须关注积分提升规则。有两种可能有趣的情况:一种short
是小于int
,另一种short
是与 相同的大小int
。
Example: short
smaller than int
示例:short
小于int
If short
is smaller than int
(common on modern platforms) then we also know that unsigned short
can fit in an int
, which means that any operations on it will actually happen in int
, so we explicitly cast to the promoted type to avoid this. Our final statement is pretty abstract and becomes easier to understand if we substitute in real values. For our first interesting case, with no loss of generality let us consider a 16-bit short
and a 17-bit int
(which is still allowed under the new rules, and would just mean that at least one of those two integer types have some padding bits):
如果short
小于int
(在现代平台上很常见),那么我们也知道unsigned short
可以放入 an 中int
,这意味着对它的任何操作实际上都将在 中发生int
,因此我们显式转换为提升的类型以避免这种情况。我们的最终陈述非常抽象,如果我们替换为实际值,就会更容易理解。对于我们的第一个有趣的案例,不失一般性,让我们考虑 16 位short
和 17 位int
(在新规则下这仍然是允许的,这只是意味着这两种整数类型中至少有一个有一些填充位):
constexpr auto window = static_cast<int17_t>(result_limits::min());
return static_cast<int16_t>(
static_cast<int17_t>(
static_cast<uint16_t>(
static_cast<uint17_t>(value) +
static_cast<uint17_t>(window)
)
) + window
);
Solving for the greatest possible 16-bit unsigned value
求解最大可能的 16 位无符号值
constexpr auto window = int17_t(-32768);
return int16_t(
int17_t(
uint16_t(
uint17_t(65535) +
uint17_t(window)
)
) + window
);
Simplifies to
简化为
return int16_t(
int17_t(
uint16_t(
uint17_t(65535) +
uint17_t(32768)
)
) +
int17_t(-32768)
);
Simplifies to
简化为
return int16_t(
int17_t(
uint16_t(
uint17_t(98303)
)
) +
int17_t(-32768)
);
Simplifies to
简化为
return int16_t(
int17_t(
uint16_t(32767)
) +
int17_t(-32768)
);
Simplifies to
简化为
return int16_t(-1);
We put in the largest possible unsigned and get back -1
, success! We can see in each of these steps how something bad would happen if we did not have each cast in place.
We put in the largest possible unsigned and get back -1
, success! We can see in each of these steps how something bad would happen if we did not have each cast in place.
Example: short
same size as int
Example: short
same size as int
If short
is the same size as int
(uncommon on modern platforms), the integral promotion rule are slightly different. In this case, short
promotes to int
and unsigned short
promotes to unsigned
. Fortunately, we explicitly cast each result to the type we want to do the calculation in, so we end up with no problematic promotions. With no loss of generality let us consider a 16-bit short
and a 16-bit int
:
If short
is the same size as int
(uncommon on modern platforms), the integral promotion rule are slightly different. In this case, short
promotes to int
and unsigned short
promotes to unsigned
. Fortunately, we explicitly cast each result to the type we want to do the calculation in, so we end up with no problematic promotions. With no loss of generality let us consider a 16-bit short
and a 16-bit int
:
constexpr auto window = static_cast<int16_t>(result_limits::min());
return static_cast<int16_t>(
static_cast<int16_t>(
static_cast<uint16_t>(
static_cast<uint16_t>(value) +
static_cast<uint16_t>(window)
)
) + window
);
Solving for the greatest possible 16-bit unsigned value
Solving for the greatest possible 16-bit unsigned value
return int16_t(
int16_t(
uint16_t(
uint16_t(65535) +
uint16_t(32768)
)
) +
int16_t(-32768)
);
Simplifies to
Simplifies to
return int16_t(
int16_t(32767) + int16_t(-32768)
);
Simplifies to
Simplifies to
return int16_t(-1);
We put in the largest possible unsigned and get back -1
, success!
We put in the largest possible unsigned and get back -1
, success!
What if I just care about int
and unsigned
and don't care about warnings, like the original question?
What if I just care about int
and unsigned
and don't care about warnings, like the original question?
constexpr int cast_to_signed_integer(unsigned const value) {
using result_limits = std::numeric_limits<int>;
if constexpr (result_limits::min() + 1 != -result_limits::max()) {
if (value == static_cast<unsigned>(result_limits::max()) + 1) {
throw std::runtime_error("Cannot convert the maximum possible unsigned to a signed value on this system");
}
}
if (value <= result_limits::max()) {
return static_cast<int>(value);
} else {
constexpr int window = result_limits::min();
return static_cast<int>(value + window) + window;
}
}
See it live
See it live
Here we see that clang, gcc, and icc generate no code at -O2
and -O3
, and MSVC generates no code at /O2
, so the solution is optimal.
Here we see that clang, gcc, and icc generate no code at -O2
and -O3
, and MSVC generates no code at /O2
, so the solution is optimal.
回答by Yakk - Adam Nevraumont
If x
is our input...
If x
is our input...
If x > INT_MAX
, we want to find a constant k
such that 0
< x - k*INT_MAX
< INT_MAX
.
If x > INT_MAX
, we want to find a constant k
such that 0
< x - k*INT_MAX
< INT_MAX
.
This is easy -- unsigned int k = x / INT_MAX;
. Then, let unsigned int x2 = x - k*INT_MAX;
This is easy -- unsigned int k = x / INT_MAX;
. Then, let unsigned int x2 = x - k*INT_MAX;
We can now cast x2
to int
safely. Let int x3 = static_cast<int>(x2);
We can now cast x2
to int
safely. Let int x3 = static_cast<int>(x2);
We now want to subtract something like UINT_MAX - k * INT_MAX + 1
from x3
, if k > 0
.
We now want to subtract something like UINT_MAX - k * INT_MAX + 1
from x3
, if k > 0
.
Now, on a 2s complement system, so long as x > INT_MAX
, this works out to:
Now, on a 2s complement system, so long as x > INT_MAX
, this works out to:
unsigned int k = x / INT_MAX;
x -= k*INT_MAX;
int r = int(x);
r += k*INT_MAX;
r -= UINT_MAX+1;
Note that UINT_MAX+1
is zero in C++ guaranteed, the conversion to int was a noop, and we subtracted k*INT_MAX
then added it back on "the same value". So an acceptable optimizer should be able to erase all that tomfoolery!
Note that UINT_MAX+1
is zero in C++ guaranteed, the conversion to int was a noop, and we subtracted k*INT_MAX
then added it back on "the same value". So an acceptable optimizer should be able to erase all that tomfoolery!
That leaves the problem of x > INT_MAX
or not. Well, we create 2 branches, one with x > INT_MAX
, and one without. The one without does a strait cast, which the compiler optimizes to a noop. The one with ... does a noop after the optimizer is done. The smart optimizer realizes both branches to the same thing, and drops the branch.
That leaves the problem of x > INT_MAX
or not. Well, we create 2 branches, one with x > INT_MAX
, and one without. The one without does a strait cast, which the compiler optimizes to a noop. The one with ... does a noop after the optimizer is done. The smart optimizer realizes both branches to the same thing, and drops the branch.
Issues: if UINT_MAX
is really large relative to INT_MAX
, the above might not work. I am assuming that k*INT_MAX <= UINT_MAX+1
implicitly.
Issues: if UINT_MAX
is really large relative to INT_MAX
, the above might not work. I am assuming that k*INT_MAX <= UINT_MAX+1
implicitly.
We could probably attack this with some enums like:
We could probably attack this with some enums like:
enum { divisor = UINT_MAX/INT_MAX, remainder = UINT_MAX-divisor*INT_MAX };
which work out to 2 and 1 on a 2s complement system I believe (are we guaranteed for that math to work? That's tricky...), and do logic based on these that easily optimize away on non-2s complement systems...
which work out to 2 and 1 on a 2s complement system I believe (are we guaranteed for that math to work? That's tricky...), and do logic based on these that easily optimize away on non-2s complement systems...
This also opens up the exception case. It is only possible if UINT_MAX is much larger than (INT_MIN-INT_MAX), so you can put your exception code in an if block asking exactly that question somehow, and it won't slow you down on a traditional system.
This also opens up the exception case. It is only possible if UINT_MAX is much larger than (INT_MIN-INT_MAX), so you can put your exception code in an if block asking exactly that question somehow, and it won't slow you down on a traditional system.
I'm not exactly sure how to construct those compile-time constants to deal correctly with that.
I'm not exactly sure how to construct those compile-time constants to deal correctly with that.
回答by Cheers and hth. - Alf
std::numeric_limits<int>::is_modulo
is a compile time constant. so you can use it for template specialization. problem solved, at least if compiler plays along with inlining.
std::numeric_limits<int>::is_modulo
is a compile time constant. so you can use it for template specialization. problem solved, at least if compiler plays along with inlining.
#include <limits>
#include <stdexcept>
#include <string>
#ifdef TESTING_SF
bool const testing_sf = true;
#else
bool const testing_sf = false;
#endif
// C++ "extensions"
namespace cppx {
using std::runtime_error;
using std::string;
inline bool hopefully( bool const c ) { return c; }
inline bool throw_x( string const& s ) { throw runtime_error( s ); }
} // namespace cppx
// C++ "portability perversions"
namespace cppp {
using cppx::hopefully;
using cppx::throw_x;
using std::numeric_limits;
namespace detail {
template< bool isTwosComplement >
int signed_from( unsigned const n )
{
if( n <= unsigned( numeric_limits<int>::max() ) )
{
return static_cast<int>( n );
}
unsigned const u_max = unsigned( -1 );
unsigned const u_half = u_max/2 + 1;
if( n == u_half )
{
throw_x( "signed_from: unsupported value (negative max)" );
}
int const i_quarter = static_cast<int>( u_half/2 );
int const int_n1 = static_cast<int>( n - u_half );
int const int_n2 = int_n1 - i_quarter;
int const int_n3 = int_n2 - i_quarter;
hopefully( n == static_cast<unsigned>( int_n3 ) )
|| throw_x( "signed_from: range error" );
return int_n3;
}
template<>
inline int signed_from<true>( unsigned const n )
{
return static_cast<int>( n );
}
} // namespace detail
inline int signed_from( unsigned const n )
{
bool const is_modulo = numeric_limits< int >::is_modulo;
return detail::signed_from< is_modulo && !testing_sf >( n );
}
} // namespace cppp
#include <iostream>
using namespace std;
int main()
{
int const x = cppp::signed_from( -42u );
wcout << x << endl;
}
EDITEDIT: Fixed up code to avoid possible trap on non-modular-int machines (only one is known to exist, namely the archaically configured versions of the Unisys Clearpath). For simplicity this is done by not supporting the value -2n-1n-1where nnis the number of
int
int
value bits, on such machine (i.e., on the Clearpath). in practice this value will not be supported by the machine either (i.e., with sign-and-magnitude or 1’s complement representation).回答by Cheers and hth. - Alf
I think the int type is at least two bytes, so the INT_MIN and INT_MAX may change in different platforms.
I think the int type is at least two bytes, so the INT_MIN and INT_MAX may change in different platforms.
回答by Someone
My money is on using memcpy. Any decent compiler knows to optimise it away:
My money is on using memcpy. Any decent compiler knows to optimise it away:
#include <stdio.h>
#include <memory.h>
#include <limits.h>
static inline int unsigned_to_signed(unsigned n)
{
int result;
memcpy( &result, &n, sizeof(result));
return result;
}
int main(int argc, const char * argv[])
{
unsigned int x = UINT_MAX - 1;
int xx = unsigned_to_signed(x);
return xx;
}
For me (Xcode 8.3.2, Apple LLVM 8.1, -O3), that produces:
For me (Xcode 8.3.2, Apple LLVM 8.1, -O3), that produces:
_main: ## @main
Lfunc_begin0:
.loc 1 21 0 ## /Users/Someone/main.c:21:0
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
##DEBUG_VALUE: main:argc <- %EDI
##DEBUG_VALUE: main:argv <- %RSI
Ltmp3:
##DEBUG_VALUE: main:x <- 2147483646
##DEBUG_VALUE: main:xx <- 2147483646
.loc 1 24 5 prologue_end ## /Users/Someone/main.c:24:5
movl $-2, %eax
popq %rbp
retq
Ltmp4:
Lfunc_end0:
.cfi_endproc