Original question: http://stackoverflow.com/questions/7152145/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Fastest way to find if int array contains a number
Asked by Kyle Emmerich
This is an odd question. I have an integer array in Java, where each int represents a color. They will either be 0xFFFFFFFF or 0x0. What would be the FASTEST way to find if this array contains ANY values equal to 0xFFFFFFFF?
This is my current code:
int length = w * h;
for (int i = 0; i < length; i++) {
    if (pixels[i] == 0xFFFFFFFF) {
        return true;
    }
}
I have no clue if there is a faster way to do this or not. I imagine you vets could have a trick or two though.
EDIT: Seeing as it is just a dumb array of pixels from Bitmap.getPixels(), there's no way it would be sorted or transformed to another storage structure. Thanks for the input, everyone, it seems like looping through is the best way in this case.
Answered by Paul
No, there is no faster way unless the array of integers is already sorted, which I doubt given it's an array of colours.
To scan through an unsorted array takes linear time "O(n)". That's what you do, and you exit the method as soon as a match is found which is good too.
Answered by templatetypedef
Without switching to some other data structure, no, there is no better way to find whether the array contains that value. You have to look at all the array elements to see if it's there, since if you don't check some particular location you might miss the one copy of that pixel color.
That said, there are alternative ways that you could solve this problem. Here are a few thoughts on how to speed this up:
If every value is guaranteed to be either white or black, you could store two extra boolean values alongside the array representing whether there are white or black pixels. That way, once you've run the scan once, you could just read the booleans back. You could also store a count of the number of white and black pixels along with the array, and then whenever you write a pixel update the count by decrementing the number of pixels of the original color and incrementing the number of pixels of the new color. This would then give you the ability to check if a pixel of a given color exists in O(1) by just seeing if the correct counter is nonzero.
Alternatively, if you happen to know something about the image (perhaps where the white and black pixels ought to be), you could consider doing the iteration in a different order. For example, if the pixels you're looking for tend to be clustered in the center of the image, rewriting the loop to check there first might be a good idea since if there are any pixels of that type you'll find them more rapidly. This still has the same worst-case behavior, but for "realistic" images might be much faster.
If you have multiple threads available and the array is really huge (millions of elements), you could consider having multiple threads each search a part of the array for the value. This would only be feasible if you had a reason to suspect that most of the image was not white.
Since in most realistic images you might assume that the image is a mixture of colors and you're just looking for something of one color, then you might want to consider storing the image as a sparse array, where you store a list of the pixels that happen to be of one color (say, white) and then assume everything else is black. If you expect most images to be a solid color with a few outliers, this might be a very good representation. Additionally, it would give you constant-time lookup of whether any black or white pixels exist - just check if the list of set pixels is empty or consists of the entire image.
If the order doesn't matter, you could also store the elements in some container like a hash table, which could give you O(1) lookup of whether or not the element is there. You could also sort the array and then just check the endpoints.
As a micro-optimization, you could consider always appending two values to the real image, one white pixel and one black pixel, so that you could always iterate until you find the value. This eliminates one of the comparisons from the loop (the check to see if you're in bounds) and is recommended by some authors for very large arrays.
If you assume that most images are a nice mixture of white and black and are okay with getting the wrong answer a small fraction of the time, you could consider probing a few random locations and checking if any of them are the right color. If so, then clearly a pixel of the correct color exists and you're done. Otherwise, run the full linear scan. For images that are a nice blend of colors, this could save you an enormous amount of time, since you could probe some small number of locations (say, O(log n) of them) and end up avoiding a huge linear scan in many cases. This is exponentially faster than before.
If every value is either white or black, you could also consider storing the image in a bitvector. Since each 32-bit int collapses to a single bit, this would shrink the storage by a factor of 32. You could then iterate across the compressed array a machine word at a time and check whether any word is nonzero to see if any of the pixels are white. This also saves a huge amount of space, and I'd actually suggest doing this since it makes a lot of other operations easy as well.
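As a rough illustration of the counter and bit-vector ideas in the list above, here is a minimal sketch. The class name `MonoImage` and its methods are hypothetical, and the approach assumes every write goes through `setPixel`, which is not true for a raw array handed back by Bitmap.getPixels().

```java
import java.util.BitSet;

/** Hypothetical wrapper: one bit per pixel plus a running count of white pixels. */
final class MonoImage {
    private static final int WHITE = 0xFFFFFFFF;

    private final BitSet bits;  // bit set = white, bit clear = black
    private final int size;     // number of pixels
    private int whiteCount;     // kept in sync on every write

    /** Builds the compact form with one linear pass over the original pixels. */
    MonoImage(int[] pixels) {
        size = pixels.length;
        bits = new BitSet(size);
        for (int i = 0; i < size; i++) {
            if (pixels[i] == WHITE) {
                bits.set(i);
                whiteCount++;
            }
        }
    }

    /** Writes one pixel and adjusts the counter only if the color changed. */
    void setPixel(int index, boolean white) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        boolean wasWhite = bits.get(index);
        if (wasWhite != white) {
            bits.set(index, white);
            whiteCount += white ? 1 : -1;
        }
    }

    /** O(1): no scan needed once the counter is maintained. */
    boolean containsWhite() {
        return whiteCount > 0;
    }

    /** O(1): some pixel is black whenever not every pixel is white. */
    boolean containsBlack() {
        return whiteCount < size;
    }
}
```

The one-time construction is still a linear scan; the payoff only appears if the same image is queried many times between writes.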
Hope this helps!
Answered by Ernest Friedman-Hill
It doesn't matter at the bytecode level, but at the native-code level,
if (pixels[i] != 0)
is likely to be a bit faster, given that you're sure only these two values can appear.
Answered by Connor Doyle
If your array is really big, it might be worth it to divide and conquer. That is, assign segments of the data to multiple threads (probably t threads, where t is the number of available processor cores). With a sufficiently large data set, the parallelism may amortize the thread startup cost.
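One low-effort way to try this in Java 8 or later is a parallel stream, which splits the array across the common fork-join pool instead of managing threads by hand. This is only a sketch: the class and method names are made up for illustration, and whether it beats the plain loop depends on the array size and hardware, so it would need to be measured.

```java
import java.util.Arrays;

final class ParallelScan {
    private static final int WHITE = 0xFFFFFFFF;

    /** Returns true if any element equals WHITE, scanning chunks in parallel. */
    static boolean containsWhite(int[] pixels) {
        // anyMatch is short-circuiting, so it may stop early once a match is found
        return Arrays.stream(pixels).parallel().anyMatch(p -> p == WHITE);
    }
}
```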
Answered by Zoran Horvat
Here is a simple optimization that helps on large arrays: put the requested value at the end of the array and thus eliminate the index-bounds comparison from the loop. (templatetypedef has already mentioned this optimization.) This solution saves 25% of loop running time and is good for large arrays:
tmp = a[n - 1]
a[n - 1] = 0xFFFFFFFF
pos = 0
while a[pos] != 0xFFFFFFFF
    pos = pos + 1
a[n - 1] = tmp
if a[pos] = 0xFFFFFFFF then
    return pos
return -1
There is a C# implementation with running-time analysis at this address.
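For readers following along in Java, the pseudocode above might translate roughly as follows. The class and method names are hypothetical. Two caveats: the method temporarily overwrites the last element, so it is not safe if another thread reads the array at the same time, and the JVM still performs its own array bounds checks unless the JIT compiler removes them, so the gain reported for C# may not carry over.

```java
final class SentinelSearch {
    private static final int WHITE = 0xFFFFFFFF;

    /** Returns the index of the first WHITE element, or -1 if there is none. */
    static int indexOfWhite(int[] a) {
        int n = a.length;
        if (n == 0) {
            return -1;              // nothing to search, and no slot for a sentinel
        }
        int last = a[n - 1];        // remember the real last element
        a[n - 1] = WHITE;           // plant the sentinel so the loop must stop
        int pos = 0;
        while (a[pos] != WHITE) {   // no "pos < n" test inside the loop
            pos++;
        }
        a[n - 1] = last;            // restore the original contents
        // A hit before the last slot is real; a hit on the last slot is real
        // only if the original last element was WHITE.
        return (pos < n - 1 || last == WHITE) ? pos : -1;
    }
}
```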
Answered by Ronnie
The only scope for improving the performance is the comparison. I feel a bitwise operator would be a bit faster than the equality comparison.
You could do this:
int length = w * h;
for (int i = 0; i < length; i++) {
    // Java does not accept an int as a condition, so compare the result with 0
    if ((pixels[i] & 0xFFFFFFFF) != 0) {
        return true;
    }
}
Answered by Alesqui
Can't you check when you insert the color into the array? If so, you could store the index of the array's element which contains the 0xFFFFFFFF color. Since you want "ANY" entry that has such value, this should do the trick :D
If not, your answer has the complexity of O(n) which is the best it could be, since the array isn't (and cannot be, as you say) ordered.
Answered by Diego Duarte
Arrays.asList(...).contains(...)
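One caveat with this one-liner: when it is given an int[], `Arrays.asList` produces a `List<int[]>` holding the array itself as its single element, so `contains` with an int value never matches. A short sketch of the pitfall and one alternative using `IntStream` (Java 8 or later) follows; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class ContainsDemo {
    private static final int WHITE = 0xFFFFFFFF;

    /** Pitfall: the list has one element (the int[] itself), so this is always false. */
    static boolean viaAsList(int[] pixels) {
        List<int[]> wrapped = Arrays.asList(pixels);
        return wrapped.contains(WHITE);
    }

    /** Alternative: stream over the primitive values; still a linear scan. */
    static boolean viaStream(int[] pixels) {
        return IntStream.of(pixels).anyMatch(p -> p == WHITE);
    }
}
```

The stream version is convenient rather than faster; it does the same linear scan as the original loop.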
Answered by ratchet freak
Using the built-in foreach is a tad faster than the indexed for, as it eliminates a bounds check:
for (int pix : pixels) {
    if (pix != 0)
        return true;
}