C++ < 比 <= 快吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/12135518/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is < faster than <=?
提问by snoopy
Is if( a < 901 ) faster than if( a <= 900 )?
if( a < 901 ) 是否比 if( a <= 900 ) 更快?
Not exactly as in this simple example, but there are slight performance changes on loop complex code. I suppose this has something to do with the generated machine code, in case it's even true.
不完全像这个简单示例这样,而是在复杂的循环代码中会有细微的性能变化。我猜想,如果这种说法成立,应该与生成的机器代码有关。
回答by Jonathon Reinhart
No, it will not be faster on most architectures. You didn't specify, but on x86, all of the integral comparisons will be typically implemented in two machine instructions:
不,在大多数架构上它不会更快。您没有具体说明,但在 x86 上,所有整数比较通常都用两条机器指令实现:
- A test or cmp instruction, which sets EFLAGS
- And a Jcc (jump) instruction, depending on the comparison type (and code layout):
  - jne - Jump if not equal --> ZF = 0
  - jz - Jump if zero (equal) --> ZF = 1
  - jg - Jump if greater --> ZF = 0 and SF = OF
  - (etc...)
- 一条 test 或 cmp 指令,用于设置 EFLAGS
- 以及一条 Jcc(跳转)指令,具体取决于比较类型(和代码布局):
  - jne - 不相等则跳转 --> ZF = 0
  - jz - 为零(相等)则跳转 --> ZF = 1
  - jg - 大于则跳转 --> ZF = 0 且 SF = OF
  - (等等...)
Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c
示例(为简洁起见有所删节),使用 $ gcc -m32 -S -masm=intel test.c 编译
if (a < b) {
// Do something 1
}
Compiles to:
编译为:
mov eax, DWORD PTR [esp+24] ; a
cmp eax, DWORD PTR [esp+28] ; b
jge .L2 ; jump if a is >= b
; Do something 1
.L2:
And
和
if (a <= b) {
// Do something 2
}
Compiles to:
编译为:
mov eax, DWORD PTR [esp+24] ; a
cmp eax, DWORD PTR [esp+28] ; b
jg .L5 ; jump if a is > b
; Do something 2
.L5:
So the only difference between the two is a jg versus a jge instruction. The two will take the same amount of time.
因此,两者之间的唯一区别是一条 jg 指令与一条 jge 指令。两者所花费的时间相同。
I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time. This one is a little tricky to answer, but here's what I can give: In the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met). The same grouping is made in the Optimization Reference Manual, in Appendix C, "Latency and Throughput."
我想回应这样一条评论:没有任何资料表明不同的跳转指令耗时相同。这个问题回答起来有点棘手,但我能给出以下内容:在英特尔指令集参考中,它们都被归入一条通用指令 Jcc(条件满足则跳转)之下。在优化参考手册的附录 C"延迟和吞吐量"中也采用了同样的分组。
Latency— The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.
Throughput— The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency.
延迟— 执行内核完成构成一条指令的所有 μop 的执行所需的时钟周期数。
吞吐量— 在发射端口可以再次接受同一条指令之前需要等待的时钟周期数。对于许多指令,其吞吐量可能显著小于其延迟。
The values for Jcc are:
Jcc 的值为:
Latency Throughput
Jcc N/A 0.5
with the following footnote on Jcc:
并带有以下关于 Jcc 的脚注:
7) Selection of conditional jump instructions should be based on the recommendation of Section 3.4.1, "Branch Prediction Optimization," to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero.
7) 条件跳转指令的选择应基于第 3.4.1 节"分支预测优化"的建议,以提高分支的可预测性。当分支预测成功时,jcc 的延迟实际上为零。
So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.
因此,英特尔文档中没有任何内容将某一条 Jcc 指令与其他 Jcc 指令区别对待。
If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS, to determine whether the conditions are met. There is then no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).
如果考虑实现这些指令的实际电路,可以假设 EFLAGS 中不同的位上会有简单的与/或门,用来判断条件是否满足。那么,测试两个位的指令没有理由比只测试一个位的指令花费更多或更少的时间(忽略门传播延迟,它远小于时钟周期)。
Edit: Floating Point
编辑:浮点数
This holds true for x87 floating point as well: (Pretty much the same code as above, but with double instead of int.)
这同样适用于 x87 浮点数:(与上面的代码几乎相同,只是用 double 代替 int。)
fld QWORD PTR [esp+32]
fld QWORD PTR [esp+40]
fucomip st, st(1) ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
fstp st(0)
seta al ; Set al if above (CF=0 and ZF=0).
test al, al
je .L2
; Do something 1
.L2:
fld QWORD PTR [esp+32]
fld QWORD PTR [esp+40]
fucomip st, st(1) ; (same thing as above)
fstp st(0)
setae al ; Set al if above or equal (CF=0).
test al, al
je .L5
; Do something 2
.L5:
leave
ret
回答by Lucas
Historically (we're talking the 1980s and early 1990s), there were some architectures in which this was true. The root issue is that integer comparison is inherently implemented via integer subtraction. This gives rise to the following cases.
从历史上看(我们说的是 1980 年代和 1990 年代初期),在某些架构中确实如此。根本原因在于,整数比较本质上是通过整数减法实现的。这导致了以下几种情况。
Comparison Subtraction
---------- -----------
A < B --> A - B < 0
A = B --> A - B = 0
A > B --> A - B > 0
Now, when A < B the subtraction has to borrow a high-bit for the subtraction to be correct, just like you carry and borrow when adding and subtracting by hand. This "borrowed" bit was usually referred to as the carry bit and would be testable by a branch instruction. A second bit called the zero bit would be set if the subtraction were identically zero, which implied equality.
现在,当 A < B 时,减法必须向高位借位才能得到正确的结果,就像手算加减法时的进位和借位一样。这个"借来的"位通常被称为进位位,可以由分支指令检测。如果减法结果恰好为零(意味着相等),则会设置第二个位,称为零位。
There were usually at least two conditional branch instructions, one to branch on the carry bit and one on the zero bit.
通常至少有两条条件分支指令,一条分支到进位位,一条分支到零位。
Now, to get at the heart of the matter, let's expand the previous table to include the carry and zero bit results.
现在,为了了解问题的核心,让我们扩展前面的表格以包括进位和零位结果。
Comparison Subtraction Carry Bit Zero Bit
---------- ----------- --------- --------
A < B --> A - B < 0 0 0
A = B --> A - B = 0 1 1
A > B --> A - B > 0 1 0
So, implementing a branch for A < B can be done in one instruction, because the carry bit is clear only in this case, that is,
所以,A < B 的分支可以用一条指令实现,因为只有在这种情况下进位位才是清零的,即:
;; Implementation of "if (A < B) goto address;"
cmp A, B ;; compare A to B
bcz address ;; Branch if Carry is Zero to the new address
But, if we want to do a less-than-or-equal comparison, we need to do an additional check of the zero flag to catch the case of equality.
但是,如果我们想要做一个小于或等于的比较,我们需要对零标志做一个额外的检查来捕捉相等的情况。
;; Implementation of "if (A <= B) goto address;"
cmp A, B ;; compare A to B
bcz address ;; branch if A < B
bzs address ;; also, Branch if the Zero bit is Set
So, on some machines, using a "less than" comparison might save one machine instruction. This was relevant in the era of sub-megahertz processor speeds and 1:1 CPU-to-memory speed ratios, but it is almost totally irrelevant today.
因此,在某些机器上,使用"小于"比较可能会省下一条机器指令。这在亚兆赫处理器速度和 CPU 与内存速度比为 1:1 的时代是有意义的,但在今天几乎完全无关紧要。
回答by David Schwartz
Assuming we're talking about internal integer types, there's no possible way one could be faster than the other. They're obviously semantically identical. They both ask the compiler to do precisely the same thing. Only a horribly broken compiler would generate inferior code for one of these.
假设我们讨论的是内置整数类型,其中一个不可能比另一个更快。它们在语义上显然完全相同,都要求编译器做完全相同的事情。只有严重损坏的编译器才会为其中之一生成较差的代码。
If there was some platform where < was faster than <= for simple integer types, the compiler should always convert <= to < for constants. Any compiler that didn't would just be a bad compiler (for that platform).
如果在某个平台上,对简单整数类型来说 < 比 <= 更快,那么对于常量,编译器就应该总是把 <= 转换成 <。任何不这样做的编译器都是(对该平台而言)糟糕的编译器。
回答by Adrian Cornish
I see that neither is faster. The compiler generates the same machine code in each condition, with a different value.
我发现两者都不会更快。编译器在两种情况下生成相同的机器代码,只是比较的常量值不同。
if(a < 901)
cmpl  $900, -4(%rbp)
jg  .L2

if(a <= 901)
cmpl  $901, -4(%rbp)
jg  .L3
My example if is from GCC on the x86_64 platform on Linux.
我的 if 示例来自 Linux 上 x86_64 平台的 GCC。
Compiler writers are pretty smart people, and they think of these things and many others most of us take for granted.
编译器编写者是非常聪明的人,他们会想到这些以及我们大多数人认为理所当然的许多其他事情。
I noticed that if it is not a constant, then the same machine code is generated in either case.
我注意到如果它不是常量,那么在任何一种情况下都会生成相同的机器代码。
int b;
if(a < b)
cmpl -4(%rbp), %eax
jge .L2
if(a <= b)
cmpl -4(%rbp), %eax
jg .L3
回答by ridiculous_fish
For floating point code, the <= comparison may indeed be slower (by one instruction) even on modern architectures. Here's the first function:
对于浮点代码,即使在现代架构上,<= 比较也确实可能更慢(慢一条指令)。这是第一个函数:
int compare_strict(double a, double b) { return a < b; }
On PowerPC, first this performs a floating point comparison (which updates cr, the condition register), then moves the condition register to a GPR, shifts the "compared less than" bit into place, and then returns. It takes four instructions.
在 PowerPC 上,这首先执行一次浮点比较(更新条件寄存器 cr),然后把条件寄存器移到通用寄存器(GPR),再把"比较结果为小于"那一位移到合适的位置,最后返回。总共需要四条指令。
Now consider this function instead:
现在考虑这个函数:
int compare_loose(double a, double b) { return a <= b; }
This requires the same work as compare_strict above, but now there are two bits of interest: "was less than" and "was equal to." This requires an extra instruction (cror - condition register bitwise OR) to combine these two bits into one. So compare_loose requires five instructions, while compare_strict requires four.
这需要与上面的 compare_strict 相同的工作,但现在关心两个位:"小于"和"等于"。这需要一条额外的指令(cror,条件寄存器按位或)把这两位合并为一位。因此 compare_loose 需要五条指令,而 compare_strict 只需要四条。
You might think that the compiler could optimize the second function like so:
您可能认为编译器可以像这样优化第二个函数:
int compare_loose(double a, double b) { return ! (a > b); }
However this will incorrectly handle NaNs. NaN1 <= NaN2 and NaN1 > NaN2 need to both evaluate to false.
但是,这会错误地处理 NaN。NaN1 <= NaN2 和 NaN1 > NaN2 都必须求值为 false。
回答by glglgl
Maybe the author of that unnamed book has read that a > 0 runs faster than a >= 1 and thinks that is true universally.
也许那本未具名的书的作者读到过 a > 0 比 a >= 1 运行得更快,并认为这是普遍成立的。
But it is because a 0 is involved (because CMP can, depending on the architecture, be replaced e.g. with OR) and not because of the <.
但那是因为其中涉及 0(因为视架构而定,CMP 可以被替换为例如 OR),而不是因为用了 <。
回答by Eliot Ball
At the very least, if this were true a compiler could trivially optimise a <= b to !(a > b), and so even if the comparison itself were actually slower, with all but the most naive compiler you would not notice a difference.
至少,如果这是真的,编译器可以轻而易举地把 a <= b 优化为 !(a > b)。因此,即使比较本身真的更慢,除了最幼稚的编译器之外,你都不会注意到任何差异。
回答by Masoud
They have the same speed. Maybe on some special architecture what he/she said is right, but in the x86 family at least I know they are the same. Because for doing this the CPU will do a subtraction (a - b) and then check the flags of the flag register. Two bits of that register are called ZF (zero flag) and SF (sign flag), and it is done in one cycle, because it will do it with one mask operation.
它们的速度相同。也许在某些特殊架构上他/她说的是对的,但至少在 x86 系列中我知道它们是一样的。因为为此,CPU 会执行一次减法 (a - b),然后检查标志寄存器中的标志位。该寄存器中有两位分别称为 ZF(零标志)和 SF(符号标志),而这在一个周期内完成,因为它会通过一次掩码操作来完成。
回答by Telgin
This would be highly dependent on the underlying architecture that the C is compiled to. Some processors and architectures might have explicit instructions for equal to, or less than and equal to, which execute in different numbers of cycles.
这将高度依赖于 C 被编译到的底层架构。某些处理器和体系结构可能具有等于或小于等于的显式指令,它们以不同的周期数执行。
That would be pretty unusual though, as the compiler could work around it, making it irrelevant.
不过这很不寻常,因为编译器可以解决它,使其无关紧要。
回答by Mark Booth
TL;DR answer
TL;DR 答案
For most combinations of architecture, compiler and language it will not be quicker.
对于架构、编译器和语言的大多数组合,它不会更快。
Full answer
完整答案
Other answers have concentrated on the x86 architecture, and I don't know the ARM architecture (which your example assembler seems to be) well enough to comment specifically on the code generated, but this is an example of a micro-optimisation which is very architecture specific, and is as likely to be an anti-optimisation as it is to be an optimisation.
其他答案都集中在 x86 架构上,而我对 ARM 架构(你的示例汇编似乎就是 ARM)了解不够,无法具体评论生成的代码。但这是一个高度依赖具体架构的微优化的例子,它既可能是优化,也同样可能是反优化。
As such, I would suggest that this sort of micro-optimisation is an example of cargo cult programming rather than best software engineering practice.
因此,我认为这种微优化是"货物崇拜编程"的一个例子,而不是最佳软件工程实践。
There are probably some architectures where this is an optimisation, but I know of at least one architecture where the opposite may be true. The venerable Transputer architecture only had machine code instructions for equal to and greater than or equal to, so all comparisons had to be built from these primitives.
可能在某些架构上这确实是一种优化,但我知道至少有一种架构情况可能恰恰相反。古老的 Transputer 架构只有等于和大于或等于的机器码指令,因此所有比较都必须由这些原语构建。
Even then, in almost all cases, the compiler could order the evaluation instructions in such a way that in practice, no comparison had any advantage over any other. Worst case though, it might need to add a reverse instruction (REV) to swap the top two items on the operand stack. This was a single byte instruction which took a single cycle to run, so had the smallest overhead possible.
即便如此,在几乎所有情况下,编译器都可以以这样一种方式对求值指令进行排序,即在实践中,没有任何比较比任何其他比较有任何优势。最坏的情况是,它可能需要添加一个反向指令 (REV) 来交换操作数堆栈上的前两项。这是一条单字节指令,需要一个周期才能运行,因此开销尽可能小。
Whether or not a micro-optimisation like this is an optimisation or an anti-optimisation depends on the specific architecture you are using, so it is usually a bad idea to get into the habit of using architecture specific micro-optimisations, otherwise you might instinctively use one when it is inappropriate to do so, and it looks like this is exactly what the book you are reading is advocating.
像这样的微优化究竟是优化还是反优化,取决于你所使用的具体架构。因此,养成使用特定于架构的微优化的习惯通常不是好主意,否则你可能会在不合适的场合下本能地使用它。看起来这正是你正在读的那本书所提倡的。