C++ 无符号与有符号整数的性能

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4712315/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-28 16:15:04  来源:igfitidea点击:

performance of unsigned vs signed integers

c++cintegerintunsigned

提问by Flexo

Is there any performance gain/loss by using unsigned integers over signed integers?

在有符号整数上使用无符号整数是否有任何性能增益/损失?

If so, does this goes for short and long as well?

如果是这样,这是否也适用于短期和长期?

回答by fredoverflow

Division by powers of 2 is faster with unsigned int, because it can be optimized into a single shift instruction. With signed int, it usually requires more machine instructions, because division rounds towards zero, but shifting to the right rounds down. Example:

使用 2 的幂除法更快unsigned int,因为它可以优化为单个移位指令。使用signed int,它通常需要更多的机器指令,因为除法向零舍入,但向右移动向下舍入。例子:

int foo(int x, unsigned y)
{
    x /= 8;
    y /= 8;
    return x + y;
}

Here is the relevant xpart (signed division):

这是相关x部分(签名部门):

movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl , %eax

And here is the relevant ypart (unsigned division):

这是相关y部分(无符号除法):

movl 12(%ebp), %edx
shrl , %edx

回答by kbjorklu

In C++ (and C), signed integer overflow is undefined, whereas unsigned integer overflow is defined to wrap around. Notice that e.g. in gcc, you can use the -fwrapv flag to make signed overflow defined (to wrap around).

在 C++(和 C)中,有符号整数溢出是未定义的,而无符号整数溢出被定义为环绕。请注意,例如在 gcc 中,您可以使用 -fwrapv 标志来定义有符号溢出(环绕)。

Undefined signed integer overflow allows the compiler to assume that overflows don't happen, which may introduce optimization opportunities. See e.g. this blog postfor discussion.

未定义的有符号整数溢出允许编译器假设溢出不会发生,这可能会引入优化机会。请参阅例如此博客文章进行讨论。

回答by anatolyg

unsignedleads to the same or better performance than signed. Some examples:

unsigned导致与 相同或更好的性能signed。一些例子:

  • Division by a constant which is a power of 2 (see also the answer from FredOverflow)
  • Division by a constant number (for example, my compiler implements division by 13 using 2 asm instructions for unsigned, and 6 instructions for signed)
  • Checking whether a number is even (i have no idea why my MS Visual Studio compiler implements it with 4 instructions for signednumbers; gcc does it with 1 instruction, just like in the unsignedcase)
  • 除以 2 的幂的常数(另请参阅FredOverflow的答案)
  • 除以一个常数(例如,我的编译器使用 2 条 asm 指令用于无符号数和 6 条指令用于有符号数来实现除以 13)
  • 检查数字是否为偶数(我不知道为什么我的 MS Visual Studio 编译器用 4 条signed数字指令来实现它;gcc 用 1 条指令来实现,就像在这种unsigned情况下一样)

shortusually leads to the same or worse performance than int(assuming sizeof(short) < sizeof(int)). Performance degradation happens when you assign a result of an arithmetic operation (which is usually int, never short) to a variable of type short, which is stored in the processor's register (which is also of type int). All the conversions from shortto inttake time and are annoying.

short通常会导致与int(假设sizeof(short) < sizeof(int))相同或更差的性能。当您将算术运算的结果(通常是int,从不short)分配给类型为 的变量时,会发生性能下降,该变量short存储在处理器的寄存器(也是 类型int)中。从short到 的所有转换都int需要时间并且很烦人。

Note: some DSPs have fast multiplication instructions for the signed shorttype; in this specific case shortis faster than int.

注意:有些DSP对signed short类型有快速乘法指令;在这种特定情况下shortint.

As for the difference between intand long, i can only guess (i am not familiar with 64-bit architectures). Of course, if intand longhave the same size (on 32-bit platforms), their performance is also the same.

至于int和之间的区别long,我只能猜测(我不熟悉 64 位架构)。当然,如果intlong具有相同的大小(在 32 位平台上),它们的性能也是相同的。



A very important addition, pointed out by several people:

几个人指出的一个非常重要的补充:

What really matters for most applications is the memory footprint and utilized bandwidth. You should use the smallest necessary integers (short, maybe even signed/unsigned char) for large arrays.

对于大多数应用程序而言,真正重要的是内存占用和已用带宽。对于大型数组short,您应该使用最小的必要整数(,甚至可能signed/unsigned char)。

This will give better performance, but the gain is nonlinear (i.e. not by a factor of 2 or 4) and somewhat unpredictable - it depends on cache size and the relationship between calculations and memory transfers in your application.

这将提供更好的性能,但增益是非线性的(即不是 2 或 4 倍)并且有些不可预测 - 这取决于缓存大小以及应用程序中计算和内存传输之间的关系。

回答by sharptooth

This will depend on exact implementation. In most cases there will be no difference however. If you really care you have to try all the variants you consider and measure performance.

这将取决于具体的实施。但是,在大多数情况下,不会有任何区别。如果您真的很在意,则必须尝试您考虑的所有变体并衡量性能。

回答by Sebastian Paaske T?rholm

This is pretty much dependent on the specific processor.

这在很大程度上取决于特定的处理器。

On most processors, there are instructions for both signed and unsigned arithmetic, so the difference between using signed and unsigned integers comes down to which one the compiler uses.

在大多数处理器上,都有用于有符号和无符号算术的指令,因此使用有符号和无符号整数之间的区别归结为编译器使用的是哪一种。

If any of the two is faster, it's completely processor specific, and most likely the difference is miniscule, if it exists at all.

如果两者中的任何一个更快,那么它完全是特定于处理器的,并且很可能差异很小,如果它存在的话。

回答by David Stone

The performance difference between signed and unsigned integers is actually more general than the acceptance answer suggests. Division of an unsigned integer by any constant can be made faster than division of a signed integer by a constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

有符号和无符号整数之间的性能差异实际上比接受答案所暗示的更普遍。无论常数是否为 2 的幂,将无符号整数除以任何常数都比将有符号整数除以常数更快。见http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

At the end of his post, he includes the following section:

在他的帖子末尾,他包括以下部分:

A natural question is whether the same optimization could improve signed division; unfortunately it appears that it does not, for two reasons:

The increment of the dividend must become an increase in the magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.

The penalty for an uncooperative divisor is only about half as much in signed division, leaving a smaller window for improvements.

Thus it appears that the round-down algorithm could be made to work in signed division, but will underperform the standard round-up algorithm.

一个自然的问题是相同的优化是否可以改进有符号除法;不幸的是,它似乎没有,原因有二:

红利的增加必须变成幅度的增加,即如果 n > 0 则增加,如果 n < 0 则减少。这会引入额外的费用。

对不合作除数的惩罚只有在有符号除法中的一半左右,留下了一个更小的改进窗口。

因此,似乎可以使向下舍入算法在有符号除法中工作,但其性能将低于标准向上舍入算法。

回答by phuclv

Not only division by powers of 2 are faster with unsigned type, division by any other values are also faster with unsigned type. If you look at Agner Fog's Instruction tablesyou'll see that unsigned divisions have similar or better performance than signed versions

不仅无符号类型的 2 次幂除法速度更快,无符号类型的任何其他值的除法也更快。如果您查看Agner Fog 的指令表,您会发现未签名除法的性能与签名版本相似或更好

For example with the AMD K7

例如使用 AMD K7

╔═════════════╤══════════╤═════╤═════════╤═══════════════════════╗
║ Instruction │ Operands │ Ops │ Latency │ Reciprocal throughput ║
╠═════════════╪══════════╪═════╪═════════╪═══════════════════════╣
║ DIV         │ r8/m8    │ 32  │ 24      │ 23                    ║
║ DIV         │ r16/m16  │ 47  │ 24      │ 23                    ║
║ DIV         │ r32/m32  │ 79  │ 40      │ 40                    ║
║ IDIV        │ r8       │ 41  │ 17      │ 17                    ║
║ IDIV        │ r16      │ 56  │ 25      │ 25                    ║
║ IDIV        │ r32      │ 88  │ 41      │ 41                    ║
║ IDIV        │ m8       │ 42  │ 17      │ 17                    ║
║ IDIV        │ m16      │ 57  │ 25      │ 25                    ║
║ IDIV        │ m32      │ 89  │ 41      │ 41                    ║
╚═════════════╧══════════╧═════╧═════════╧═══════════════════════╝

The same thing applies to Intel Pentium

同样的事情适用于英特尔奔腾

╔═════════════╤══════════╤══════════════╗
║ Instruction │ Operands │ Clock cycles ║
╠═════════════╪══════════╪══════════════╣
║ DIV         │ r8/m8    │ 17           ║
║ DIV         │ r16/m16  │ 25           ║
║ DIV         │ r32/m32  │ 41           ║
║ IDIV        │ r8/m8    │ 22           ║
║ IDIV        │ r16/m16  │ 30           ║
║ IDIV        │ r32/m32  │ 46           ║
╚═════════════╧══════════╧══════════════╝

Of course those are quite ancient. Newer architectures with more transistors might close the gap but the basic things apply: generally you need more macro ops, more logic, more latency to do a signed division

当然,这些都是很古老的。具有更多晶体管的较新架构可能会缩小差距,但基本情况适用:通常您需要更多宏操作、更多逻辑、更多延迟来进行有符号除法

回答by íhor Mé

In short, don't bother before the fact. But do bother after.

简而言之,在事实面前不要打扰。但事后麻烦。

If you want to have performance you have to use performance optimizations of a compilerwhich may work against common sense. One thing to remember is that different compilers can compile code differently and they themselves have different sorts of optimizations. If we're talking about a g++compiler and talking about maxing out it's optimization level by using -Ofast, or at least an -O3flag, in my experience it can compile longtype into code with even better performance than any unsignedtype, or even just int.

如果您想获得性能,则必须使用可能违反常识的编译器的性能优化。要记住的一件事是,不同的编译器可以以不同的方式编译代码,并且它们本身具有不同类型的优化。如果我们在谈论g++编译器并谈论通过使用-Ofast或至少一个-O3标志来最大化它的优化级别,根据我的经验,它可以将long类型编译成代码,其性能甚至比任何unsigned类型都要好,甚至只是int.

This is from my own experience and I recommend you to first write your full program and care about such things only after that, when you have your actual code on your hands and you can compile it with optimizations to try and pick the types that actually perform best. This is also a good very general suggestion about code optimization for performance, write quickly first, try compiling with optimizations, tweak things to see what works best. And you should also try using different compilers to compile your program and choosing the one that outputs the most performant machine code.

这是根据我自己的经验,我建议您首先编写完整的程序,然后再关心这些事情,当您拥有实际代码并且可以通过优化编译它以尝试选择实际执行的类型时最好的事物。这也是关于代码优化性能的一个很好的非常普遍的建议,首先快速编写,尝试使用优化进行编译,调整一些东西以查看最有效的方法。并且您还应该尝试使用不同的编译器来编译您的程序并选择输出性能最高的机器代码的编译器。

An optimized multi-threaded linear algebra calculation program can easily have a >10x performance differencefinely optimized vs unoptimized. So this does matter.

一个优化的多线程线性代数计算程序很容易在精细优化与未优化之间产生 >10 倍的性能差异。所以这很重要。

Optimizer output contradicts logic in plenty of cases. For example, I had a case when a difference between a[x]+=band a[x]=bchanged program execution time almost 2x. And no, a[x]=bwasn't the faster one.

在很多情况下,优化器输出与逻辑相矛盾。例如,我有一个案例,其中的差异a[x]+=ba[x]=b更改的程序执行时间几乎是 2 倍。不,a[x]=b不是更快的。

Here's for example NVidia statingthat for programming their GPUs:

例如,NVidia 声明要对其 GPU 进行编程:

Note: As was already the recommended best practice, signed arithmetic should be preferred over unsigned arithmetic wherever possible for best throughput on SMM. The C language standard places more restrictions on overflow behavior for unsigned math, limiting compiler optimization opportunities.

注意:正如已经推荐的最佳实践,为了在 SMM 上获得最佳吞吐量,应尽可能首选有符号算术而不是无符号算术。C 语言标准对无符号数学的溢出行为设置了更多限制,从而限制了编译器优化的机会。

回答by íhor Mé

Signed and unsigned integers will always both operate as single clock instructions and have the same read-write performance but according to Dr Andrei Alexandrescuunsigned is preferred over signed. The reason for this is you can fit twice the amount of numbers in the same number of bits because you're not wasting the sign bit and you will use fewer instructions checking for negative numbers yielding performance increases from the decreased ROM. In my experience with the Kabuki VM, which features an ultra-high-performance ScriptImplementation, it is rare that you actually require a signed number when working with memory. I've spend may years doing pointer arithmetic with signed and unsigned numbers and I've found no benefit to the signed when no sign bit is needed.

有符号和无符号整数将始终作为单时钟指令运行并具有相同的读写性能,但根据Andrei Alexandrescu 博士的说法,无符号整数比有符号整数更受欢迎。这样做的原因是您可以在相同数量的位中容纳两倍数量的数字,因为您不会浪费符号位,并且您将使用更少的指令检查负数,从而通过减少的 ROM 提高性能。根据我对Kabuki VM 的体验,它具有超高性能的脚本实现,在使用内存时,您实际上很少需要带符号的数字。我花了很多年的时间用有符号数和无符号数做指针运算,当不需要符号位时,我发现有符号数没有任何好处。

Where signed may be preferred is when using bit shifting to perform multiplication and division of powers of 2 because you may perform negative powers of 2 division with signed 2's complement integers. Please see some more YouTube videos from Andreifor more optimization techniques. You can also find some good info in my article about the the world's fastest Integer-to-String conversion algorithm.

在使用位移执行 2 的幂的乘法和除法时,有符号可能是首选的地方,因为您可以使用有符号 2 的补码整数执行 2 除法的负幂。请观看更多来自 Andrei 的 YouTube 视频以了解更多优化技术。您还可以在我关于世界上最快的整数到字符串转换算法的文章中找到一些很好的信息。

回答by CAFxX

IIRC, on x86 signed/unsigned shouldn't make any difference. Short/long, on the other hand, is a different story, since the amount of data that has to be moved to/from RAM is bigger for longs (other reasons may include cast operations like extending a short to long).

IIRC,在 x86 上签名/未签名应该没有任何区别。另一方面,短/长则是另一回事,因为对于长而言,必须移入/移出 RAM 的数据量更大(其他原因可能包括转换操作,例如将短扩展为长)。