Fastest way to determine if an integer is between two integers (inclusive) with known sets of values
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/17095324/
Asked by jjxtra
Is there a faster way than x >= start && x <= end in C or C++ to test whether an integer is between two integers?
UPDATE: My specific platform is iOS. This is part of a box blur function that restricts pixels to a circle in a given square.
UPDATE: After trying the accepted answer, I got an order of magnitude speedup on that one line of code compared with doing it the normal x >= start && x <= end way.
UPDATE: Here is the code after and before the change, with the assembly generated by Xcode:
NEW WAY
// diff = (end - start) + 1
#define POINT_IN_RANGE_AND_INCREMENT(p, range) ((p++ - range.start) < range.diff)
Ltmp1313:
ldr r0, [sp, #176] @ 4-byte Reload
ldr r1, [sp, #164] @ 4-byte Reload
ldr r0, [r0]
ldr r1, [r1]
sub.w r0, r9, r0
cmp r0, r1
blo LBB44_30
OLD WAY
#define POINT_IN_RANGE_AND_INCREMENT(p, range) (p <= range.end && p++ >= range.start)
Ltmp1301:
ldr r1, [sp, #172] @ 4-byte Reload
ldr r1, [r1]
cmp r0, r1
bls LBB44_32
mov r6, r0
b LBB44_33
LBB44_32:
ldr r1, [sp, #188] @ 4-byte Reload
adds r6, r0, #1
Ltmp1302:
ldr r1, [r1]
cmp r0, r1
bhs LBB44_36
Pretty amazing how reducing or eliminating branching can provide such a dramatic speed up.
Answered by Jerry Coffin
There's an old trick to do this with only one comparison/branch. Whether it'll really improve speed may be open to question, and even if it does, it's probably too little to notice or care about, but when you're only starting with two comparisons, the chances of a huge improvement are pretty remote. The code looks like:
// use a < for an inclusive lower bound and exclusive upper bound
// use <= for an inclusive lower bound and inclusive upper bound
// alternatively, if the upper bound is inclusive and you can pre-calculate
// upper-lower, simply add + 1 to upper-lower and use the < operator.
if ((unsigned)(number-lower) <= (upper-lower))
    in_range(number);
With a typical, modern computer (i.e., anything using two's complement), the conversion to unsigned is really a nop -- just a change in how the same bits are viewed.
Note that in a typical case, you can pre-compute upper-lower outside a (presumed) loop, so that doesn't normally contribute any significant time. Along with reducing the number of branch instructions, this also (generally) improves branch prediction. In this case, the same branch is taken whether the number is below the bottom end or above the top end of the range.
As to how this works, the basic idea is pretty simple: a negative number, when viewed as an unsigned number, will be larger than anything that started out as a positive number.
In practice this method translates number and the interval to the origin and checks whether number is in the interval [0, D], where D = upper - lower. If number is below the lower bound, the difference is negative, so it becomes a very large value when viewed as unsigned; if number is above the upper bound, the difference is larger than D. Either way the single comparison fails.
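To make the idea concrete, here is a minimal sketch of the single-comparison check. The helper name in_range and the fixed 32-bit types are assumptions for illustration, not part of the answer above. Unlike the snippet in the answer, it performs both subtractions in unsigned arithmetic, so wraparound is well defined even for extreme inputs. It requires lower <= upper.

```cpp
#include <cstdint>

// Hypothetical helper (name and types are assumptions for illustration).
// Returns true when lower <= x <= upper; requires lower <= upper.
constexpr bool in_range(std::int32_t x, std::int32_t lower, std::int32_t upper) {
    // Both subtractions are done on unsigned values, where wraparound is
    // well defined. A value below lower wraps to a huge number and fails
    // the single comparison, just like a value above upper.
    return static_cast<std::uint32_t>(x) - static_cast<std::uint32_t>(lower)
        <= static_cast<std::uint32_t>(upper) - static_cast<std::uint32_t>(lower);
}

static_assert(in_range(10, 10, 20), "lower bound is inclusive");
static_assert(in_range(20, 10, 20), "upper bound is inclusive");
static_assert(!in_range(9, 10, 20), "9 - 10 wraps to 4294967295, which exceeds 10");
static_assert(!in_range(21, 10, 20), "21 - 10 = 11 exceeds 10");
static_assert(in_range(-5, -10, 0), "negative bounds work as well");
```

In a loop, the right-hand side (upper - lower) would normally be computed once before the loop, which leaves one subtraction and one comparison per element.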
Answered by Andrew Prock
It depends on how many times you want to perform the test over the same data.
If you are performing the test a single time, there probably isn't a meaningful way to speed up the algorithm.
If you are doing this for a very finite set of values, then you could create a lookup table. Performing the indexing might be more expensive, but if you can fit the entire table in cache, then you can remove all branching from the code, which should speed things up.
For your data the lookup table would be 128^3 = 2,097,152. If you can control one of the three variables so you consider all instances where start = N at one time, then the size of the working set drops down to 128^2 = 16,384 bytes, which should fit well in most modern caches.
You would still have to benchmark the actual code to see if a branchless lookup table is sufficiently faster than the obvious comparisons.
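As a rough sketch of the reduced table for one fixed start, something like the following could work. It assumes every value lies in 0..127 and stores one byte per entry; the names RangeTable, build_table and in_range_lut are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kValues = 128;  // assumption: all inputs lie in 0..127

// Hypothetical table for one fixed start: the entry for (end, x) is 1 when
// start <= x <= end. One byte per entry gives 128 * 128 = 16,384 bytes.
using RangeTable = std::array<std::uint8_t, kValues * kValues>;

RangeTable build_table(std::size_t start) {
    RangeTable table{};
    for (std::size_t end = 0; end < kValues; ++end) {
        for (std::size_t x = 0; x < kValues; ++x) {
            table[end * kValues + x] = (start <= x && x <= end) ? 1 : 0;
        }
    }
    return table;
}

// The lookup is a single indexed load with no comparison against the bounds.
// Both end and x must be below kValues.
inline bool in_range_lut(const RangeTable& table, std::size_t end, std::size_t x) {
    return table[end * kValues + x] != 0;
}
```

The table is built once per start value and then reused for every (end, x) pair, so the cost of building it only pays off when many tests share the same start.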
Answered by Ben Jackson
It's rare to be able to do significant optimizations to code on such a small scale. Big performance gains come from observing and modifying the code from a higher level. You may be able to eliminate the need for the range test altogether, or only do O(n) of them instead of O(n^2). You may be able to re-order the tests so that one side of the inequality is always implied. Even if the algorithm is ideal, gains are more likely to come when you see how this code does the range test 10 million times and you find a way to batch them up and use SSE to do many tests in parallel.
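One way to batch the tests is sketched below. It does not use SSE intrinsics; it is a plain, branch-free loop that a compiler may or may not turn into SIMD code, so whether it helps has to be measured. The function name count_in_range and the 32-bit element type are assumptions for illustration, and lower <= upper is required.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical batch helper: counts how many of the n values lie in
// [lower, upper]. Requires lower <= upper.
std::size_t count_in_range(const std::int32_t* values, std::size_t n,
                           std::int32_t lower, std::int32_t upper) {
    // The width of the range is computed once, outside the loop.
    const std::uint32_t span =
        static_cast<std::uint32_t>(upper) - static_cast<std::uint32_t>(lower);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t offset =
            static_cast<std::uint32_t>(values[i]) - static_cast<std::uint32_t>(lower);
        // The comparison result (0 or 1) is added directly, so the loop
        // body contains no data-dependent branch.
        count += (offset <= span) ? 1u : 0u;
    }
    return count;
}
```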
Answered by rezeli
This answer reports on a test of the accepted answer. I performed a closed-range test on a large vector of sorted random integers, and to my surprise the basic method of (low <= num && num <= high) is in fact faster than the accepted answer above! The test was done on an HP Pavilion g6 (AMD A6-3400APU with 6GB RAM). Here's the core code used for testing:
int num = rand(); // value to test against each consecutive pair of entries
auto start = chrono::system_clock::now();
int inBetween1{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
    if (randVec[i - 1] <= num && num <= randVec[i])
        ++inBetween1;
}
auto end = chrono::system_clock::now();
chrono::duration<double> elapsed_s1 = end - start;
This was compared with the following, which is the accepted answer above:
int inBetween2{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
    if (static_cast<unsigned>(num - randVec[i - 1]) <= (randVec[i] - randVec[i - 1]))
        ++inBetween2;
}
Note that randVec is a sorted vector. For any size of MaxNum, the first method beats the second one on my machine!
Answered by icedwater
Is it not possible to just perform a bitwise operation on the integer?
Since it has to be between 0 and 128, if the 8th bit is set (2^7) it is 128 or more. The edge case will be a pain, though, since you want an inclusive comparison.
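A small sketch of what such a bitwise test might look like, assuming the valid values really are 0..127 (or 0..128 for the inclusive case). The function names are hypothetical, and the second function shows why the edge case is awkward: it needs an extra comparison.

```cpp
// Hypothetical helpers (names are assumptions for illustration).

// True when 0 <= x <= 127: no bit above the low seven may be set.
// Negative values have high bits set after conversion to unsigned,
// so they fail the test as well.
constexpr bool in_0_to_127(int x) {
    return (static_cast<unsigned>(x) & ~127u) == 0;
}

// Making the upper bound 128 inclusive needs a separate check,
// which gives back the second comparison the mask was meant to avoid.
constexpr bool in_0_to_128(int x) {
    return in_0_to_127(x) || x == 128;
}

static_assert(in_0_to_127(0) && in_0_to_127(127), "both ends of 0..127 pass");
static_assert(!in_0_to_127(128) && !in_0_to_127(-1), "values outside 0..127 fail");
static_assert(in_0_to_128(128) && !in_0_to_128(129), "128 passes only with the extra check");
```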