Most optimized way to calculate modulus in C

Question

Asked by hasanatkazmi

I have to minimize the cost of calculating a modulus in C. Say I have a number x, and n is the number that x is divided by.

When n == 65536 (which happens to be 2^16):

mod = x % n (11 assembly instructions as produced by GCC), or
mod = x & 0xffff, which is the same as mod = x & 65535 (4 assembly instructions)

So GCC doesn't optimize it to this extent.

In my case n is not a power of two (2^k for some integer k) but the largest prime less than 2^16, which is 65521.

As I showed for n == 2^16, bitwise operations can optimize the computation. What bitwise operations can I perform to calculate the modulus when n == 65521?

Answer 1

Answered by Michael Burr

First, make sure you're looking at optimized code before drawing conclusions about what GCC is producing (and make sure this particular expression really needs to be optimized). Finally, don't count instructions to draw your conclusions; an 11-instruction sequence may well perform better than a shorter sequence that includes a div instruction.

Also, the fact that x mod 65536 can be calculated with a simple bit mask does not mean that any mod operation can be implemented that way. Consider how easy dividing by 10 is in decimal, as opposed to dividing by an arbitrary number.

With all that out of the way, you may be able to use some of the 'magic number' techniques from Henry Warren's Hacker's Delight book:

There's an added chapter on the website that contains "two methods of computing the remainder of division without computing the quotient!", which you may find of some use. The first technique applies only to a limited set of divisors, so it won't work for your particular instance. I haven't actually read the online chapter, so I don't know exactly how applicable the other technique might be for you.

Answer 2

Answered by caf

x mod 65536 is only equivalent to x & 0xffff if x is unsigned; for signed x, the mask gives the wrong result for negative numbers. For unsigned x, gcc does indeed optimise x % 65536 to a bitwise AND with 65535 (even at -O0, in my tests).
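
The signed/unsigned difference can be made concrete with a small sketch. The function names below are hypothetical, and the signed mask assumes a two's-complement representation (which is what mainstream compilers use):

```c
#include <stdint.h>

/* Remainder as C99 and later define it for signed operands:
   the result takes the sign of the dividend. */
int32_t rem_signed(int32_t x) { return x % 65536; }

/* Bit mask on a signed value: always lands in 0..65535
   (assuming two's complement), so it differs for negative x. */
int32_t mask_signed(int32_t x) { return x & 0xffff; }

/* For unsigned operands the two forms agree for every input. */
uint32_t rem_unsigned(uint32_t x) { return x % 65536u; }
uint32_t mask_unsigned(uint32_t x) { return x & 0xffffu; }
```

For example, rem_signed(-1) is -1 while mask_signed(-1) is 65535, which is why a compiler cannot replace the signed % with a bare mask.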

Because 65521 is not a power of 2, x mod 65521 can't be calculated so simply. gcc 4.3.2 at -O3 calculates it as x - (x / 65521) * 65521; the integer division by a constant is done using an integer multiplication by a related constant.
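
A rough sketch of that idea for 32-bit unsigned x follows. The constant and shift here are my own choice for illustration, not necessarily what gcc emits: with M = ceil(2^47 / 65521), M * 65521 exceeds 2^47 by 31073, which is no more than 2^15, and that is small enough for (x * M) >> 47 to equal x / 65521 for every 32-bit x. The names MOD_N, MAGIC, div65521 and mod65521_mul are hypothetical.

```c
#include <stdint.h>

#define MOD_N 65521u
/* ceil(2^47 / 65521); 2^47 is not a multiple of 65521, so floor + 1.
   The value fits in 32 bits, so x * MAGIC fits in 64 bits. */
#define MAGIC ((((uint64_t)1) << 47) / MOD_N + 1)

/* Quotient via multiply and shift instead of a div instruction. */
uint32_t div65521(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * MAGIC) >> 47);
}

/* Remainder as x - (x / 65521) * 65521. */
uint32_t mod65521_mul(uint32_t x)
{
    return x - div65521(x) * MOD_N;
}
```

In practice, writing x % 65521u with optimization enabled lets the compiler pick such constants itself; the sketch only shows what the generated code is doing.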

Answer 3

Answered by Accipitridae

If you don't have to fully reduce your integers modulo 65521, then you can use the fact that 65521 is close to 2**16. That is, if x is an unsigned int you want to reduce, you can do the following:

unsigned int low = x & 0xffff;
unsigned int hi = x >> 16;
x = low + 15 * hi;

This uses the fact that 2**16 % 65521 == 15. Note that this is not a full reduction: starting with a 32-bit input, you are only guaranteed that the result has at most 20 bits and that it is congruent to the input modulo 65521.
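
If a fully reduced value is needed, one possible extension (my own sketch, not part of the original answer) is to apply the same fold twice and finish with a single conditional subtraction. After the first fold the value is at most 16 * 65535 (20 bits); after the second it is at most 65745, which is below 2 * 65521. The names fold65521 and mod65521_fold are hypothetical, and the input is assumed to be a 32-bit unsigned value.

```c
#include <stdint.h>

/* One folding step: uses 2^16 == 15 (mod 65521). */
uint32_t fold65521(uint32_t x)
{
    return (x & 0xffffu) + 15u * (x >> 16);
}

/* Full reduction of a 32-bit value without a division. */
uint32_t mod65521_fold(uint32_t x)
{
    x = fold65521(x);          /* at most 20 bits              */
    x = fold65521(x);          /* now strictly below 2 * 65521 */
    if (x >= 65521u)
        x -= 65521u;           /* final correction             */
    return x;
}
```

Whether this beats the compiler's multiply-based code for x % 65521u depends on the target, so it is worth measuring before adopting it.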

This trick can be used in applications where many operations have to be reduced modulo the same constant, and where intermediate results do not have to be the smallest element of their residue class.

For example, one application is the implementation of Adler-32, which uses the modulus 65521. This hash function performs many operations modulo 65521. To implement it efficiently, one would only do modular reductions after a carefully computed number of additions. A reduction as shown above is enough for the intermediate steps, and only the final computation of the hash needs a full modulo operation.
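
The deferred-reduction idea can be sketched as follows. This is a minimal illustration rather than a tuned implementation: it uses a plain % at the end of each block instead of the folding step, and the block size 5552 is the value zlib uses as the largest number of bytes that can be summed before the 32-bit accumulators could overflow. The function name adler32_sketch is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE  65521u  /* largest prime below 2^16 */
#define ADLER_BLOCK 5552u   /* bytes per block before reducing (zlib's NMAX) */

uint32_t adler32_sketch(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    while (len > 0) {
        size_t n = len < ADLER_BLOCK ? len : ADLER_BLOCK;
        len -= n;
        /* Accumulate without any reduction inside the block. */
        while (n--) {
            a += *buf++;
            b += a;
        }
        /* Reduce once per block instead of once per byte. */
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

For the nine bytes of the ASCII string "Wikipedia" this returns 0x11E60398.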

Answer 4

Answered by Danvil

The bitwise operation only works well if the divisor is of the form 2^n. In the general case, there is no such bit-wise operation.

Answer 5

Answered by blondiepassesby

If the constant with which you want to take the modulo is known at compile time and you have a decent compiler (e.g. gcc), it is usually best to let the compiler work its magic. Just declare the modulus const.

If you don't know the constant at compile time, but you are going to take, say, a billion modulos with the same number, then use libdivide: http://libdivide.com/

Answer 6

Answered by gabi tomuta

When dealing with powers of 2, the following approach can be considered (mostly C-flavored):

#include <stdint.h>

/* ... */

#define THE_DIVISOR    0x8U  /* The modulo value (must be a POWER OF 2); no trailing semicolon. */

/* ... */

uint8_t CheckIfModulo(const int32_t TheDividend)
{
    uint8_t RetVal = 1; /* TheDividend is not a multiple of THE_DIVISOR. */

    /* The low bits are all zero exactly when the value is a multiple of a power of 2. */
    if (0 == ((uint32_t)TheDividend & (THE_DIVISOR - 1U)))
    {
        /* code if modulo is satisfied */
        RetVal = 0; /* TheDividend IS a multiple of THE_DIVISOR. */
    }
    else
    {
        /* code if modulo is NOT satisfied */
    }
    return RetVal;
}

Answer 7

Answered by David

If x is an increasing index, and the increment i is known to be less than n (e.g. when iterating over a circular array of length n), avoid the modulus completely. A loop going

x += i; if (x >= n) x -= n;

is way faster than

x = (x + i) % n;

which you unfortunately find in many textbooks...

If you really need an expression (e.g. because you are using it in a for statement), you can use the ugly but efficient

x = x + (x+i < n ? i : i-n)
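
As a small self-contained sketch of the same idea, the wrap-around step can be put in a helper. The name advance_wrap is hypothetical, and the code assumes 0 <= x < n, 0 <= i < n, and that x + i does not overflow size_t:

```c
#include <stddef.h>

/* Advance a circular index by i without using %.
   Because x < n and i < n, the sum is below 2 * n,
   so one conditional subtraction is enough. */
size_t advance_wrap(size_t x, size_t i, size_t n)
{
    x += i;
    if (x >= n)
        x -= n;
    return x;
}
```

With n = 8, advancing index 5 by 3 wraps to 0, the same result as (5 + 3) % 8.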

Answer 8

Answered by Krystian

idiv — Integer Division

The idiv instruction divides the contents of the 64-bit integer EDX:EAX (constructed by viewing EDX as the most significant four bytes and EAX as the least significant four bytes) by the specified operand value. The quotient of the division is stored in EAX, while the remainder is placed in EDX.

source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
