除了在 C/C++ 中使用 %(模数),还有其他选择吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/48053/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is there any alternative to using % (modulus) in C/C++?
提问by JeffV
I read somewhere once that the modulus operator is inefficient on small embedded devices like 8 bit micro-controllers that do not have integer division instruction. Perhaps someone can confirm this but I thought the difference is 5-10 time slower than with an integer division operation.
我曾经在某处读到模数运算符在小型嵌入式设备上效率低下,例如没有整数除法指令的 8 位微控制器。也许有人可以确认这一点,但我认为差异比整数除法运算慢 5-10 倍。
Is there another way to do this other than keeping a counter variable and manually overflowing to 0 at the mod point?
除了保持计数器变量并在 mod 点手动溢出到 0 之外,还有其他方法可以做到这一点吗?
const int FIZZ = 6;
for(int x = 0; x < MAXCOUNT; x++)
{
if(!(x % FIZZ)) print("Fizz\n"); // slow on some systems
}
vs:
对比:
The way I am currently doing it:
我目前的做法是:
const int FIZZ = 6;
int fizzcount = 1;
for(int x = 1; x < MAXCOUNT; x++)
{
if(fizzcount >= FIZZ)
{
print("Fizz\n");
fizzcount = 0;
}
}
回答by Adam Davis
Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus - so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from division result to modulus with just another two steps (multiply and subtract) so it's still comparable. If the processor has a built in division routine you'll likely see it also provides the remainder.
啊,按位算术的乐趣。许多除法例程的副作用是模数 - 所以在少数情况下,除法实际上应该比模数快。我有兴趣查看您从中获取此信息的来源。带有乘法器的处理器有使用乘法器的有趣的除法例程,但是您可以通过另外两个步骤(乘法和减法)从除法结果获得模数,因此它仍然具有可比性。如果处理器具有内置的除法例程,您可能会看到它也提供余数。
Still, there is a small branch of number theory devoted to Modular Arithmeticwhich requires study if you really want to understand how to optimize a modulus operation. Modular arithmatic, for instance, is very handy for generating magic squares.
尽管如此,如果您真的想了解如何优化模数运算,则数论的一个小分支专门用于模算术。例如,模块化算术对于生成幻方非常方便。
So, in that vein, here's a very low level lookat the math of modulus for an example of x, which should show you how simple it can be compared to division:
因此,在这种情况下,这里有一个非常低层次的关于 x 示例的模数数学,它应该向您展示它与除法相比是多么简单:
Maybe a better way to think about the problem is in terms of number bases and modulo arithmetic. For example, your goal is to compute DOW mod 7 where DOW is the 16-bit representation of the day of the week. You can write this as:
也许考虑这个问题的更好方法是根据数基和模算术。例如,您的目标是计算 DOW mod 7,其中 DOW 是星期几的 16 位表示。你可以这样写:
DOW = DOW_HI*256 + DOW_LO
DOW%7 = (DOW_HI*256 + DOW_LO) % 7
= ((DOW_HI*256)%7 + (DOW_LO % 7)) %7
= ((DOW_HI%7 * 256%7) + (DOW_LO%7)) %7
= ((DOW_HI%7 * 4) + (DOW_LO%7)) %7
Expressed in this manner, you can separately compute the modulo 7 result for the high and low bytes. Multiply the result for the high by 4 and add it to the low and then finally compute result modulo 7.
以这种方式表示,您可以分别计算高字节和低字节的模 7 结果。将高的结果乘以 4 并将其添加到低,然后最后计算结果模 7。
Computing the mod 7 result of an 8-bit number can be performed in a similar fashion. You can write an 8-bit number in octal like so:
可以以类似的方式计算 8 位数字的 mod 7 结果。你可以用八进制写一个 8 位数字,如下所示:
X = a*64 + b*8 + c
Where a, b, and c are 3-bit numbers.
其中 a、b 和 c 是 3 位数字。
X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
= (a%7 + b%7 + c%7) % 7
= (a + b + c) % 7
since 64%7 = 8%7 = 1
自从 64%7 = 8%7 = 1
Of course, a, b, and c are
当然,a、b、c 是
c = X & 7
b = (X>>3) & 7
a = (X>>6) & 7 // (actually, a is only 2-bits).
The largest possible value for a+b+c
is 7+7+3 = 17
. So, you'll need
one more octal step. The complete (untested) C version could be
written like:
为最大可能值a+b+c
是7+7+3 = 17
。因此,您还需要一个八进制步骤。完整的(未经测试的)C 版本可以这样写:
unsigned char Mod7Byte(unsigned char X)
{
X = (X&7) + ((X>>3)&7) + (X>>6);
X = (X&7) + (X>>3);
return X==7 ? 0 : X;
}
I spent a few moments writing a PIC version. The actual implementation is slightly different than described above
我花了一些时间写了一个 PIC 版本。实际实现与上面描述的略有不同
Mod7Byte:
movwf temp1 ;
andlw 7 ;W=c
movwf temp2 ;temp2=c
rlncf temp1,F ;
swapf temp1,W ;W= a*8+b
andlw 0x1F
addwf temp2,W ;W= a*8+b+c
movwf temp2 ;temp2 is now a 6-bit number
andlw 0x38 ;get the high 3 bits == a'
xorwf temp2,F ;temp2 now has the 3 low bits == b'
rlncf WREG,F ;shift the high bits right 4
swapf WREG,F ;
addwf temp2,W ;W = a' + b'
; at this point, W is between 0 and 10
addlw -7
bc Mod7Byte_L2
Mod7Byte_L1:
addlw 7
Mod7Byte_L2:
return
Here's a liitle routine to test the algorithm
这是测试算法的小程序
clrf x
clrf count
TestLoop:
movf x,W
RCALL Mod7Byte
cpfseq count
bra fail
incf count,W
xorlw 7
skpz
xorlw 7
movwf count
incfsz x,F
bra TestLoop
passed:
Finally, for the 16-bit result (which I have not tested), you could write:
最后,对于 16 位结果(我没有测试过),你可以这样写:
uint16 Mod7Word(uint16 X)
{
return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}
Scott
斯科特
回答by Matthew Crumley
If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:
如果您正在计算一个数模 2 的幂,您可以使用按位和运算符。只需从第二个数字中减去一个。例如:
x % 8 == x & 7
x % 256 == x & 255
A few caveats:
一些注意事项:
- This only worksif the second number is a power of two.
- It's only equivalent if the modulus is always positive. The C and C++ standards don't specify the sign of the modulus when the first number is negative (until C++11, which doesguarantee it will be negative, which is what most compilers were already doing). A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyway though.
- Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.
- 这仅在第二个数字是 2 的幂时才有效。
- 仅当模数始终为正时才等效。当第一个数字为负时,C 和 C++ 标准不指定模数的符号(直到 C++11,它确实保证它是负数,这是大多数编译器已经在做的事情)。按位并去掉符号位,所以它总是正的(即它是一个真正的模数,而不是余数)。听起来这就是你想要的。
- 您的编译器可能已经在可能的情况下执行此操作,因此在大多数情况下不值得手动执行此操作。
回答by itj
There is an overhead most of the time in using modulo that are not powers of 2. This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.
大多数情况下,使用不是 2 的幂的模数会产生开销。这与处理器无关,因为(AFAIK)即使具有模数运算符的处理器在除法方面比掩码操作慢几个周期。
For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).
在大多数情况下,这不是值得考虑的优化,当然也不值得计算您自己的快捷操作(特别是如果它仍然涉及除法或乘法)。
However, one rule of thumb is to select array sizes etc. to be powers of 2.
但是,一个经验法则是选择数组大小等为 2 的幂。
so if calculating day of week, may as well use %7 regardless if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F
因此,如果计算星期几,无论是否设置大约 100 个条目的循环缓冲区,都可以使用 %7 ......为什么不将其设为 128。然后您可以编写 % 128 并且大多数(所有)编译器都会将其设为 & 0x7F
回答by shmuelp
Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!
除非您真的需要在多个嵌入式平台上获得高性能,否则在您分析之前不要出于性能原因更改您的编码方式!
Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.
为优化性能而笨拙地编写的代码很难调试和维护。编写一个测试用例,并在您的目标上对其进行分析。一旦知道了模数的实际成本,就可以决定替代解决方案是否值得编码。
回答by hoyhoy
@Matthew is right. Try this:
@马修是对的。尝试这个:
int main() {
int i;
for(i = 0; i<=1024; i++) {
if (!(i & 0xFF)) printf("& i = %d\n", i);
if (!(i % 0x100)) printf("mod i = %d\n", i);
}
}
回答by Zeeshan
x%y == (x-(x/y)*y)
Hope this helps.
希望这可以帮助。
回答by Rob Rolnick
Do you have access to any programmable hardware on the embedded device? Like counters and such? If so, you might be able to write a hardware based mod unit, instead of using the simulated %. (I did that once in VHDL. Not sure if I still have the code though.)
您是否可以访问嵌入式设备上的任何可编程硬件?像柜台之类的?如果是这样,您也许可以编写一个基于硬件的模块单元,而不是使用模拟的 %。(我在 VHDL 中做过一次。但不确定我是否还有代码。)
Mind you, you did say that division was 5-10 times faster. Have you considered doing a division, multiplication, and subtraction to simulated the mod? (Edit: Misunderstood the original post. I did think it was odd that division was faster than mod, they are the same operation.)
请注意,您确实说过除法要快 5-10 倍。您是否考虑过进行除法、乘法和减法来模拟 mod?(编辑:误解了原帖。我确实认为除法比 mod 快很奇怪,它们是相同的操作。)
In your specific case, though, you are checking for a mod of 6. 6 = 2*3. So you could MAYBE get some small gains if you first checked if the least significant bit was a 0. Something like:
但是,在您的特定情况下,您正在检查 6 的模数。6 = 2*3。因此,如果您首先检查最低有效位是否为 0,您可能会获得一些小收益。例如:
if((!(x & 1)) && (x % 3))
{
print("Fizz\n");
}
If you do that, though, I'd recommend confirming that you get any gains, yay for profilers. And doing some commenting. I'd feel bad for the next guy who has to look at the code otherwise.
但是,如果您这样做,我建议您确认您获得了任何收益,对分析人员来说是的。并做一些评论。我会为下一个不得不查看代码的人感到难过。
回答by Vincent Robert
You should really check the embedded device you need. All the assembly language I have seen (x86, 68000) implement the modulus using a division.
你真的应该检查你需要的嵌入式设备。我见过的所有汇编语言(x86、68000)都使用除法实现模数。
Actually, the division assembly operation returns the result of the division and the remaining in two different registers.
实际上,除法汇编操作将除法的结果和剩余的结果返回到两个不同的寄存器中。
回答by Paul Tomblin
In the embedded world, the "modulus" operations you need to do are often the ones that break down nicely into bit operations that you can do with '&' and '|' and sometimes '>>'.
在嵌入式世界中,您需要执行的“模数”运算通常可以很好地分解为您可以使用 '&' 和 '|' 进行的位运算。有时是“>>”。
回答by Matt Sheppard
Not that this is necessarily better, but you could have an inner loop which always goes up to FIZZ, and an outer loop which repeats it all some certain number of times. You've then perhaps got to special case the final few steps if MAXCOUNT is not evenly divisible by FIZZ.
并不是说这一定更好,但是您可以有一个始终上升到 FIZZ 的内循环和一个重复一定次数的外循环。如果 MAXCOUNT 不能被 FIZZ 整除,那么您可能会在最后几步中遇到特殊情况。
That said, I'd suggest doing some research and performance profiling on your intended platforms to get a clear idea of the performance constraints you're under. There may be much more productive places to spend your optimisation effort.
也就是说,我建议在您想要的平台上进行一些研究和性能分析,以清楚地了解您所面临的性能限制。可能有更高效的地方可以花费您的优化工作。