C语言尽可能避免使用 mod 运算符更好吗？

Question

提问by limp_chimp

I assume that calculating the modulus of a number is a somewhat expensive operation, at least compared to simple arithmetic tests (such as seeing if a number exceeds the length of an array). If this is indeed the case, is it more efficient to replace, for example, the following code:

我认为计算一个数字的模数是一个有点昂贵的操作，至少与简单的算术测试相比（例如查看一个数字是否超过数组的长度）。如果确实是这种情况，是否更有效地替换，例如，以下代码：

res = array[(i + 1) % len];

with the following? :

与以下？：

res = array[(i + 1 == len) ? 0 : i + 1];

The first one is easier on the eyes, but I wonder if the second might be more efficient. If so, might I expect an optimizing compiler to replace the first snippet with the second, when a compiled language is used?

第一个看起来更容易，但我想知道第二个是否更有效。如果是这样，当使用编译语言时，我是否希望优化编译器将第一个片段替换为第二个片段？

Of course, this "optimization" (if it is indeed an optimization) doesn't work in all cases (in this case, it only works if i+1is never more than len).

当然，这种“优化”（如果它确实是优化）并非在所有情况下都有效（在这种情况下，它仅在i+1不超过时才有效len）。

Answer 1

回答by NPE

My general advice is as follows. Use whichever version you think is easier on the eye, and then profile your entire system. Only optimize those parts of the code that the profiler flags up as bottlenecks. I'll bet my bottom dollar that the modulo operator isn't going to be among them.

我的一般建议如下。使用您认为更容易理解的任何版本，然后分析您的整个系统。仅优化分析器标记为瓶颈的代码部分。我敢打赌，模运算符不会在其中。

As far as the specific example goes, only benchmarking can tell which is faster on your specific architecture using your specific compiler. You are potentially replacing modulo with branching, and it's anything but obvious which would be faster.

就具体示例而言，只有基准测试才能判断使用特定编译器在特定架构上哪个更快。您可能会用branching替换 modulo ，而且显然哪个会更快。

Answer 2

回答by dpi

Some simple measurement:

一些简单的测量：

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int test = atoi(argv[1]);
    int divisor = atoi(argv[2]);
    int iterations = atoi(argv[3]);

    int a = 0;

    if (test == 0) {
        for (int i = 0; i < iterations; i++)
            a = (a + 1) % divisor;
    } else if (test == 1) {
        for (int i = 0; i < iterations; i++)
            a = a + 1 == divisor ? 0 : a + 1;
    }

    printf("%d\n", a);
}

Compiling with either gcc or clang with -O3, and running time ./a.out 0 42 1000000000(modulo version) or time ./a.out 1 42 1000000000(comparison version) results in

使用 gcc 或 clang 进行编译-O3，并运行time ./a.out 0 42 1000000000(modulo version) 或time ./a.out 1 42 1000000000(comparison version) 结果

6.25 secondsuser runtime for the modulo version,
1.03 secondsfor the comparison version.

模数版本的用户运行时间为6.25 秒，
比较版本为1.03 秒。

(using gcc 5.2.1 or clang 3.6.2; Intel Core i5-4690K @ 3.50GHz; 64-bit Linux)

（使用 gcc 5.2.1 或 clang 3.6.2；英特尔酷睿 i5-4690K @ 3.50GHz；64 位 Linux）

This means that it is probably a good idea to use the comparison version.

这意味着使用比较版本可能是个好主意。

Answer 3

回答by Michel Billaud

Well, have a look at 2 ways to get the next value of a "modulo 3" cyclic counter.

好吧，看一下获取“模 3”循环计数器的下一个值的 2 种方法。

int next1(int n) {
    return (n + 1) % 3;
}

int next2(int n) {
    return n == 2 ? 0 : n + 1;
}

I've compiled it with gcc -O3 option (for the common x64 architecture), and -s to get the assembly code.

我已经用 gcc -O3 选项（对于常见的 x64 架构）和 -s 编译它以获取汇编代码。

The code for the first function does some unexplainable magic (*) to avoid a division, using a multiplication anyway:

第一个函数的代码做了一些无法解释的魔法 (*) 来避免除法，无论如何都使用乘法：

addl    , %edi
movl    31655766, %edx
movl    %edi, %eax
imull   %edx
movl    %edi, %eax
sarl    , %eax
subl    %eax, %edx
leal    (%rdx,%rdx,2), %eax
subl    %eax, %edi
movl    %edi, %eax
ret

And is much longer (and I bet slower) than the second function:

并且比第二个函数长得多（我敢打赌更慢）：

leal    1(%rdi), %eax
cmpl    , %edi
movl    addl    , %edi
movl    %edi, %edx
sarl    , %edx
shrl    , %edx
leal    (%rdi,%rdx), %eax
andl    , %eax
subl    %edx, %eax
ret
, %edx
cmove   %edx, %eax
ret

So it is not always true that "the (modern) compiler does a better job than you anyway".

因此，“（现代）编译器无论如何都比您做得更好”并不总是正确的。

Interestingly, the same experiment with 4 instead of 3 leads to a and-masking for the first function

有趣的是，同样的实验用 4 而不是 3 导致第一个函数的 and-masking

int next3(int n) {
    return (n + 1) & 3;;
}

but it is still, and by large, inferior to the second version.

但总的来说，它仍然不如第二个版本。

Being more explicit about proper ways to do the things

更明确地说明做事的正确方法

leal    1(%rdi), %eax
andl    , %eax
ret

yields much better results :

产生更好的结果：

#include <stdio.h>
#include <stdlib.h>

#define modulo

int main()
{
    int iterations = 1000000000;
    int size = 16;
    int a[size];
    unsigned long long res = 0;
    int i, j;

    for (i=0;i<size;i++)
        a[i] = i;

    for (i=0,j=0;i<iterations;i++)
    {
        j++;
        #ifdef modulo
            j %= size;
        #else
            if (j >= size)
                j = 0;
        #endif
        res += a[j];
    }

    printf("%llu\n", res);
}

(*) well, not that complicated. Multiplication by reciprocical. Compute the integer constant K = (2^N)/3, for some large enough value of N. Now, when you want the value of X/3, instead of a division by 3, compute X*K, and shift it N positions to the right.

(*) 好吧，没那么复杂。乘以倒数。计算整数常数 K = (2^N)/3，对于一些足够大的 N 值。现在，当你想要 X/3 的值时，而不是除以 3，计算 X*K，并将其移位 N位置在右边。

Answer 4

回答by J. Doe

If 'len' in your code is big enough, then the conditional will be faster, as the branch predictors will nearly always guess correctly.

如果代码中的“len”足够大，那么条件会更快，因为分支预测器几乎总是能正确猜测。

If not, then I believe this is closely connected to circular queues, where it is often the case that the length is a power of 2. This will enable the compiler to substitute modulo with a simple AND.

如果不是，那么我相信这与循环队列密切相关，在这种情况下，长度通常是 2 的幂。这将使编译器能够用简单的 AND 代替模数。

The code is the following:

代码如下：

##代码##

size=15:

大小=15：

modulo: 4,868s
cond: 1,291s

模数：4,868s
条件：1,291 秒

size=16:

大小=16：

modulo: 1,067s
cond: 1,599s

模数：1,067s
条件：1,599 秒

Compiled in gcc 7.3.0 , with -O3 optimization. The machine is an i7 920.

在 gcc 7.3.0 中编译，使用 -O3 优化。机器是i7 920。

Answer 5

回答by seand

Modulo can be done with a single processor instruction on most architectures (ex. DIV on x86). However it's likely a premature optimization for what you need.

在大多数体系结构（例如 x86 上的 DIV）上，模数可以通过单个处理器指令完成。但是，这可能是您需要的过早优化。

C语言尽可能避免使用 mod 运算符更好吗？

提问by limp_chimp

回答by NPE

回答by dpi

回答by Michel Billaud

回答by J. Doe

回答by seand

相关推荐

最近更新

标签

C语言 尽可能避免使用 mod 运算符更好吗？

提问by limp_chimp

回答by NPE

回答by dpi

回答by Michel Billaud

回答by J. Doe

回答by seand

相关推荐

C语言 使用 fgets 达到 EOF

C语言 在 C 中声明和初始化数组

C语言 如何检查C字符串是否为空

C语言 在 C 上正确使用 Stat

相关推荐

最近更新

标签

C语言尽可能避免使用 mod 运算符更好吗？

C语言使用 fgets 达到 EOF

C语言在 C 中声明和初始化数组

C语言如何检查C字符串是否为空

C语言在 C 上正确使用 Stat