Detecting signed overflow in C/C++

Question

Asked by Channel72

At first glance, this question may seem like a duplicate of "How to detect integer overflow?"; however, it is actually significantly different.

I've found that while detecting an unsigned integer overflow is pretty trivial, detecting a signed overflow in C/C++ is actually more difficult than most people think.

The most obvious, yet naive, way to do it would be something like:

int add(int lhs, int rhs)
{
 int sum = lhs + rhs;
 if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
  /* an overflow has occurred */
  abort();
 }
 return sum; 
}

The problem with this is that according to the C standard, signed integer overflow is undefined behavior. In other words, according to the standard, as soon as you even cause a signed overflow, your program is just as invalid as if you dereferenced a null pointer. So you can't cause undefined behavior and then try to detect the overflow after the fact, as in the above post-condition check example.

Even though the above check is likely to work on many compilers, you can't count on it. In fact, because the C standard says signed integer overflow is undefined, some compilers (like GCC) will optimize away the above check when optimization flags are set, because the compiler assumes a signed overflow is impossible. This totally breaks the attempt to check for overflow.

So, another possible way to check for overflow would be:

int add(int lhs, int rhs)
{
 if (lhs >= 0 && rhs >= 0) {
  if (INT_MAX - lhs <= rhs) {
   /* overflow has occurred */
   abort();
  }
 }
 else if (lhs < 0 && rhs < 0) {
  if (lhs <= INT_MIN - rhs) {
   /* overflow has occurred */
   abort();
  }
 }

 return lhs + rhs;
}

This seems more promising, since we don't actually add the two integers together until we make sure in advance that performing such an add will not result in overflow. Thus, we don't cause any undefined behavior.

However, this solution is unfortunately a lot less efficient than the initial solution, since you have to perform a subtract operation just to test if your addition operation will work. And even if you don't care about this (small) performance hit, I'm still not entirely convinced this solution is adequate. The expression lhs <= INT_MIN - rhs seems exactly like the sort of expression the compiler might optimize away, thinking that signed overflow is impossible.

So is there a better solution here? Something that is guaranteed to 1) not cause undefined behavior, and 2) not provide the compiler with an opportunity to optimize away overflow checks? I was thinking there might be some way to do it by casting both operands to unsigned, and performing checks by rolling your own two's-complement arithmetic, but I'm not really sure how to do that.

Answer 1

Accepted answer by R.. GitHub STOP HELPING ICE

Your approach with subtraction is correct and well-defined. A compiler cannot optimize it away.

Another correct approach, if you have a larger integer type available, is to perform the arithmetic in the larger type and then check that the result fits in the smaller type when converting it back:

int sum(int a, int b)
{
    long long c;
    assert(LLONG_MAX>INT_MAX);
    c = (long long)a + b;
    if (c < INT_MIN || c > INT_MAX) abort();
    return c;
}

A good compiler should convert the entire addition and if statement into an int-sized addition and a single conditional jump-on-overflow and never actually perform the larger addition.

Edit: As Stephen pointed out, I'm having trouble getting a (not-so-good) compiler, gcc, to generate the sane asm. The code it generates is not terribly slow, but certainly suboptimal. If anyone knows variants on this code that will get gcc to do the right thing, I'd love to see them.

Answer 2

Answered by Jens Gustedt

No, your 2nd code isn't correct, but you are close: if you set

int half = INT_MAX/2;
int half1 = half + 1;

the result of an addition is INT_MAX. (INT_MAX is always an odd number.) So this is valid input. But in your routine you will have INT_MAX - half == half1 and you would abort. A false positive.

This error can be repaired by putting < instead of <= in both checks.

But then also your code isn't optimal. The following would do:

int add(int lhs, int rhs)
{
 if (lhs >= 0) {
  if (INT_MAX - lhs < rhs) {
   /* would overflow */
   abort();
  }
 }
 else {
  if (rhs < INT_MIN - lhs) {
   /* would overflow */
   abort();
  }
 }
 return lhs + rhs;
}

To see that this is valid, you have to symbolically add lhs on both sides of the inequalities, and this gives you exactly the arithmetical conditions that your result is out of bounds.
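
The corrected precondition test can be sketched as a function that reports overflow instead of aborting, which makes the boundary cases easy to exercise. The name `checked_add` and the bool/out-parameter interface are assumptions made for illustration; they are not part of the answer.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores lhs + rhs in *out and returns true,
   or returns false (leaving *out untouched) if the sum would overflow. */
static bool checked_add(int lhs, int rhs, int *out)
{
    if (lhs >= 0) {
        /* INT_MAX - lhs cannot overflow because lhs is non-negative. */
        if (INT_MAX - lhs < rhs)
            return false;
    } else {
        /* INT_MIN - lhs cannot overflow because lhs is negative. */
        if (rhs < INT_MIN - lhs)
            return false;
    }
    *out = lhs + rhs;   /* known to be in range at this point */
    return true;
}
```

With half = INT_MAX/2, a call such as checked_add(half, half + 1, &r) succeeds and yields INT_MAX, while checked_add(INT_MAX, 1, &r) is rejected.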

Answer 3

Answered by JaredPar

IMHO, the easiest way to deal with overflow-sensitive C++ code is to use SafeInt<T>. This is a cross-platform C++ template hosted on CodePlex which provides the safety guarantees that you desire here.

http://safeint.codeplex.com/

I find it very intuitive to use, as it provides many of the same usage patterns as normal numerical operations and reports overflows and underflows via exceptions.

Answer 4

Answered by Shafik Yaghmour

For the gcc case, from the gcc 5.0 release notes we can see it now provides a __builtin_add_overflow for checking overflow in addition:

A new set of built-in functions for arithmetics with overflow checking has been added: __builtin_add_overflow, __builtin_sub_overflow and __builtin_mul_overflow and for compatibility with clang also other variants. These builtins have two integral arguments (which don't need to have the same type), the arguments are extended to infinite precision signed type, +, - or * is performed on those, and the result is stored in an integer variable pointed to by the last argument. If the stored value is equal to the infinite precision result, the built-in functions return false, otherwise true. The type of the integer variable that will hold the result can be different from the types of the first two arguments.

For example:

__builtin_add_overflow( rhs, lhs, &result )

We can see from the gcc document Built-in Functions to Perform Arithmetic with Overflow Checking that:

[...]these built-in functions have fully defined behavior for all argument values.

clang also provides a set of checked arithmetic builtins:

Clang provides a set of builtins that implement checked arithmetic for security critical applications in a manner that is fast and easily expressable in C.

In this case the builtin would be:

__builtin_sadd_overflow( rhs, lhs, &result )

Answer 5

Answered by rook

If you use inline assembler you can check the overflow flag. Another possibility is that you can use a safeint datatype. I recommend that you read this paper on Integer Security.

Answer 6

Answered by tbodt

The fastest possible way is to use the GCC builtin:

int add(int lhs, int rhs) {
    int sum;
    if (__builtin_add_overflow(lhs, rhs, &sum))
        abort();
    return sum;
}

On x86, GCC compiles this into:

    mov %edi, %eax
    add %esi, %eax
    jo call_abort 
    ret
call_abort:
    call abort

which uses the processor's built-in overflow detection.

If you're not OK with using GCC builtins, the next fastest way is to use bit operations on the sign bits. Signed overflow in addition occurs when:

the two operands have the same sign, and
the result has a different sign than the operands.

The sign bit of ~(lhs ^ rhs) is on iff the operands have the same sign, and the sign bit of lhs ^ sum is on iff the result has a different sign than the operands. So you can do the addition in unsigned form to avoid undefined behavior, and then use the sign bit of ~(lhs ^ rhs) & (lhs ^ sum):

int add(int lhs, int rhs) {
    unsigned sum = (unsigned) lhs + (unsigned) rhs;
    if ((~(lhs ^ rhs) & (lhs ^ sum)) & 0x80000000)
        abort();
    return (int) sum;
}

This compiles into:

    lea (%rsi,%rdi), %eax
    xor %edi, %esi
    not %esi
    xor %eax, %edi
    test %edi, %esi
    js call_abort
    ret
call_abort:
    call abort

which is quite a lot faster than casting to a 64-bit type on a 32-bit machine (with gcc):

    push %ebx
    mov 12(%esp), %ecx
    mov 8(%esp), %eax
    mov %ecx, %ebx
    sar $31, %ebx
    clt
    add %ecx, %eax
    adc %ebx, %edx
    mov %eax, %ecx
    add $-2147483648, %ecx
    mov %edx, %ebx
    adc $0, %ebx
    cmp $0, %ebx
    ja call_abort
    pop %ebx
    ret
call_abort:
    call abort
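
The sign-bit test earlier in this answer hard-codes 0x80000000, which assumes a 32-bit int. A minimal sketch that derives the mask from UINT_MAX instead is shown below. It assumes unsigned int has the same width as int and no padding bits (true on mainstream platforms), and the name `add_overflows` is invented here. It only reports whether the addition would overflow, which sidesteps the implementation-defined conversion of the unsigned sum back to int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if lhs + rhs would overflow int.
   Assumes int and unsigned int have the same width and no padding bits. */
static bool add_overflows(int lhs, int rhs)
{
    const unsigned sign = ~(UINT_MAX >> 1);  /* only the top bit set */
    unsigned ul = (unsigned)lhs;             /* signed-to-unsigned is well defined */
    unsigned ur = (unsigned)rhs;
    unsigned sum = ul + ur;                  /* unsigned wraparound is well defined */

    /* Overflow: operands share a sign, and the result's sign differs. */
    return (~(ul ^ ur) & (ul ^ sum) & sign) != 0;
}
```

A caller would test add_overflows(lhs, rhs) first and perform the plain signed addition only when it returns false.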

Answer 7

Answered by Jonathan

You may have better luck converting to 64-bit integers and testing similar conditions like that. For example:

#include <limits.h>
#include <stdint.h>

...

int64_t sum = (int64_t)lhs + (int64_t)rhs;
if (sum < INT_MIN || sum > INT_MAX) {
    // Overflow occurred!
}
else {
    return sum;
}

You may want to take a closer look at how sign extension works here, but I think it is correct.
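
As a self-contained sketch of the widening idea, the function below does the addition in int64_t and range-checks the result. It assumes int is narrower than 64 bits (checked at compile time with C11 `_Static_assert`); the name `add_wide` and the bool/out-parameter shape are illustrative choices, not part of the answer.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* The widening trick only works if int64_t is strictly wider than int. */
_Static_assert(INT_MAX < INT64_MAX, "int must be narrower than 64 bits");

/* Hypothetical helper: stores lhs + rhs in *out and returns true,
   or returns false if the mathematical sum does not fit in an int. */
static bool add_wide(int lhs, int rhs, int *out)
{
    int64_t sum = (int64_t)lhs + (int64_t)rhs;  /* cannot overflow in 64 bits */
    if (sum < INT_MIN || sum > INT_MAX)
        return false;
    *out = (int)sum;                            /* in range, so the conversion is exact */
    return true;
}
```

Both operands are converted before the addition, so sign extension happens on each value separately and the 64-bit sum is exact.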

Answer 8

Answered by supercat

How about:

int sum(int n1, int n2)
{
  int result;
  if (n1 >= 0)
  {
    result = (n1 - INT_MAX)+n2; /* Can't overflow */
    if (result > 0) return INT_MAX; else return (result + INT_MAX);
  }
  else
  {
    result = (n1 - INT_MIN)+n2; /* Can't overflow */
    if (0 > result) return INT_MIN; else return (result + INT_MIN);
  }
}

I think that should work for any legitimate INT_MIN and INT_MAX (symmetrical or not); the function as shown clips, but it should be obvious how to get other behaviors.
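
A clipping (saturating) add can also be sketched with the precondition style used elsewhere in this thread, so that no intermediate expression can overflow. This is an illustrative variant rather than the code from this answer, and the name `add_saturating` is made up.

```c
#include <limits.h>

/* Hypothetical helper: returns n1 + n2, clipped to [INT_MIN, INT_MAX]. */
static int add_saturating(int n1, int n2)
{
    if (n2 > 0 && n1 > INT_MAX - n2)   /* INT_MAX - n2 is safe because n2 > 0 */
        return INT_MAX;
    if (n2 < 0 && n1 < INT_MIN - n2)   /* INT_MIN - n2 is safe because n2 < 0 */
        return INT_MIN;
    return n1 + n2;                    /* now known to be in range */
}
```

Neither subtraction depends on INT_MIN and INT_MAX being symmetrical, so the sketch does not rely on a particular integer representation.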

Answer 9

Answered by Chris Dodd

The obvious solution is to convert to unsigned, to get the well-defined unsigned overflow behavior:

int add(int lhs, int rhs)
{
   int sum = (unsigned)lhs + (unsigned)rhs;
   if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
      /* an overflow has occurred */
      abort();
   }
   return sum;
}

This replaces the undefined signed overflow behavior with the implementation-defined conversion of out-of-range values between signed and unsigned, so you need to check your compiler's documentation to know exactly what will happen, but it should at least be well defined, and should do the right thing on any twos-complement machine that doesn't raise signals on conversions, which is pretty much every machine and C compiler built in the last 20 years.

Answer 10

Answered by atomsymbol

In case of adding two long values, portable code can split the long value into low and high int parts (or into short parts in case long has the same size as int):

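static_assert(sizeof(long) == 2*sizeof(int), "");
long a, b;
int ai[2] = {int(a), int(a >> (8*sizeof(int)))};
int bi[2] = {int(b), int(b >> (8*sizeof(int)))};
// ... use the 'long' type to add the elements of 'ai' and 'bi'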

Using inline assembly is the fastest way if targeting a particular CPU:

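long a, b;
bool overflow;
#ifdef __amd64__
    asm (
        "addq %2, %0; seto %1"
        : "+r" (a), "=ro" (overflow)
        : "ro" (b)
    );
#else
    #error "unsupported CPU"
#endif
if(overflow) ...
// The result is stored in variable 'a'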
