How do I detect an object in images with Python?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/14767594/
How to detect object on images?
Asked by Alex
I need a Python solution.
I have 40-60 images (Happy Holiday set). I need to detect an object in all of these images.
I don't know the object's size, shape, or location in the image, and I don't have an object template. I know only one thing: this object is present in almost all of the images. I call it the UFO.
Example:




As the example shows, everything changes from image to image except the UFO. After detection I need to get:
- X coordinate of the top-left corner
- Y coordinate of the top-left corner
- width of the blue object region (I marked the region in the example with a red rectangle)
- height of the blue object region
Accepted answer by Thorsten Kranz
When you have the image data as an array, you can use built-in NumPy functions to do this easily and quickly:
import numpy as np
from PIL import Image

image = Image.open("14767594_in.png")
image_data = np.asarray(image)
image_data_blue = image_data[:, :, 2]  # blue channel
median_blue = np.median(image_data_blue)
# columns and rows that contain at least one pixel bluer than the median
non_empty_columns = np.where(image_data_blue.max(axis=0) > median_blue)[0]
non_empty_rows = np.where(image_data_blue.max(axis=1) > median_blue)[0]
# (row_min, row_max, col_min, col_max), converted to plain ints for printing
boundingBox = (int(min(non_empty_rows)), int(max(non_empty_rows)),
               int(min(non_empty_columns)), int(max(non_empty_columns)))
print(boundingBox)
For the first image, this prints:
(78, 156, 27, 166)
So your desired data are:
- top-left corner (x, y): (27, 78)
- width: 166 - 27 = 139
- height: 156 - 78 = 78
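The arithmetic above can be wrapped in a small helper. This is only a sketch: the name bbox_to_xywh is made up for illustration, and it assumes the tuple order (row_min, row_max, col_min, col_max) used by the snippet above.

```python
def bbox_to_xywh(bounding_box):
    """Convert (row_min, row_max, col_min, col_max) to (x, y, width, height)."""
    row_min, row_max, col_min, col_max = bounding_box
    x, y = col_min, row_min        # columns run along x, rows along y
    width = col_max - col_min      # same convention as above: 166 - 27
    height = row_max - row_min     # same convention as above: 156 - 78
    return x, y, width, height


print(bbox_to_xywh((78, 156, 27, 166)))  # (27, 78, 139, 78)
```

If you want the size as a pixel count that includes both border pixels, add 1 to the width and the height.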
I decided that every pixel with a blue value larger than the median of all blue values belongs to your object. I expect this to work for you; if not, try something else or provide some examples where this doesn't work.
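To see why the median works as a threshold, here is a tiny synthetic check. The 6x8 array and its values are invented for illustration; the point is that when the object covers less than half of the image, the median equals the background value.

```python
import numpy as np

# Hypothetical 6x8 blue channel: dark background (10) with a bright 2x3 patch (200).
blue = np.full((6, 8), 10, dtype=np.uint8)
blue[2:4, 3:6] = 200

median_blue = np.median(blue)  # 10.0, because most pixels are background
cols = np.where(blue.max(axis=0) > median_blue)[0]
rows = np.where(blue.max(axis=1) > median_blue)[0]
bounding_box = (int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max()))
print(bounding_box)  # (2, 3, 3, 5)
```

This only holds when the object is bluer than the background and smaller than half of the image; otherwise the median is no longer a background value.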
EDIT: I reworked my code to be more general. Since two images with the same shape color are not general enough (as your comment indicates), I create more samples synthetically.
def create_sample_set(mask, N=36, shape_color=[0,0,1.,1.]):
    # N RGBA images, initialised to 1.0 everywhere (alpha stays 1.0)
    rv = np.ones((N, mask.shape[0], mask.shape[1], 4), dtype=float)
    mask = mask.astype(bool)
    for i in range(N):
        for j in range(3):
            current_color_layer = rv[i,:,:,j]
            # random background intensity for this image and channel
            current_color_layer[:,:] *= np.random.random()
            # the shape always gets the same color
            current_color_layer[mask] = np.ones((mask.sum())) * shape_color[j]
    return rv
Here, the color of the shape is adjustable. For each of the N=36 images, a random background color is chosen. It would also be possible to add noise to the background; this wouldn't change the result.
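As a rough illustration of the remark about noise, the sketch below adds per-pixel noise outside the shape. The helper add_background_noise and the tiny 4x5 demo data are assumptions made up for this example, not part of the answer's code.

```python
import numpy as np


def add_background_noise(samples, mask, scale=0.1, seed=0):
    """Hypothetical helper: add uniform noise to the RGB channels outside the mask."""
    rng = np.random.default_rng(seed)
    background = ~np.asarray(mask, dtype=bool)
    noisy = samples.copy()
    noise = rng.uniform(-scale, scale, size=samples.shape[:3] + (3,))
    noisy[:, background, :3] += noise[:, background, :]
    return np.clip(noisy, 0.0, 1.0)


# Tiny demo: 3 RGBA images of 4x5 pixels, grey background, blue 2x3 shape.
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True
samples = np.full((3, 4, 5, 4), 0.5)
samples[..., 3] = 1.0                    # opaque alpha channel
samples[:, mask, :3] = [0.0, 0.0, 1.0]   # the shape is identical in every image

noisy = add_background_noise(samples, mask)
variability = noisy.std(axis=0).sum(axis=2)
print(variability[mask].max())   # shape pixels do not vary between images
print(variability[~mask].min())  # background pixels now vary between images
```

The shape pixels are untouched, so their variability across images stays at zero while the background variability only grows.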
Then I read your sample image, create a shape mask from it, and use it to create sample images. I plot them on a grid.
# create a set of sample images and plot them
import matplotlib.pyplot as plt
from PIL import Image

image = Image.open("14767594_in.png")
image_data = np.asarray(image)
image_data_blue = image_data[:,:,2]
median_blue = np.median(image_data_blue)
sample_images = create_sample_set(image_data_blue>median_blue)
plt.figure(1)
for i in range(36):
    plt.subplot(6,6,i+1)
    plt.imshow(sample_images[i,...])
    plt.axis("off")
# remove all margins and spacing between the subplots
plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)


For another value of shape_color (a parameter of create_sample_set(...)), this might look like:


Next, I determine the per-pixel variability using the standard deviation. As you said, the object is at the same position in (almost) all images. So the variability at those pixels will be low, while for the other pixels it will be significantly higher.
# determine per-pixel variability: std() over all images, summed over the color channels
variability = sample_images.std(axis=0).sum(axis=2)
# show an image of these variabilities
plt.figure(2)
plt.imshow(variability, cmap=plt.cm.gray, interpolation="nearest", origin="lower")
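The variability idea can be checked on a small synthetic stack without any image files. The sketch below assumes 200 random 5x5 RGB images with one constant 2x2 patch; the sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical stack of 200 RGB images, 5x5 pixels, with random content everywhere...
stack = rng.random((200, 5, 5, 3))
# ...except a 2x2 patch that is identical in every image (the "object").
stack[:, 1:3, 1:3, :] = [0.0, 0.0, 1.0]

# Standard deviation across images, summed over the color channels.
variability = stack.std(axis=0).sum(axis=2)  # shape (5, 5)
object_mask = variability < variability.mean()
print(object_mask.astype(int))
```

Pixels inside the constant patch have zero variability, so they fall below the mean while the random background pixels stay above it.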
Finally, as in my first code snippet, I determine the bounding box. This time I also plot it.
# determine bounding box: rows/columns containing a pixel with below-average variability
mean_variability = variability.mean()
non_empty_columns = np.where(variability.min(axis=0)<mean_variability)[0]
non_empty_rows = np.where(variability.min(axis=1)<mean_variability)[0]
boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))
# plot and print boundingBox
bb = boundingBox
plt.plot([bb[2], bb[3], bb[3], bb[2], bb[2]],
         [bb[0], bb[0],bb[1], bb[1], bb[0]],
         "r-")
plt.xlim(0,variability.shape[1])
plt.ylim(variability.shape[0],0)
print(boundingBox)
plt.show()


That's it. I hope it is general enough this time.
Complete script for copy and paste:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def create_sample_set(mask, N=36, shape_color=[0,0,1.,1.]):
    # N RGBA images, initialised to 1.0 everywhere (alpha stays 1.0)
    rv = np.ones((N, mask.shape[0], mask.shape[1], 4), dtype=float)
    mask = mask.astype(bool)
    for i in range(N):
        for j in range(3):
            current_color_layer = rv[i,:,:,j]
            # random background intensity for this image and channel
            current_color_layer[:,:] *= np.random.random()
            # the shape always gets the same color
            current_color_layer[mask] = np.ones((mask.sum())) * shape_color[j]
    return rv

# create a set of sample images and plot them
image = Image.open("14767594_in.png")
image_data = np.asarray(image)
image_data_blue = image_data[:,:,2]
median_blue = np.median(image_data_blue)
sample_images = create_sample_set(image_data_blue>median_blue)
plt.figure(1)
for i in range(36):
    plt.subplot(6,6,i+1)
    plt.imshow(sample_images[i,...])
    plt.axis("off")
plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)

# determine per-pixel variability: std() over all images, summed over the color channels
variability = sample_images.std(axis=0).sum(axis=2)

# show an image of these variabilities
plt.figure(2)
plt.imshow(variability, cmap=plt.cm.gray, interpolation="nearest", origin="lower")

# determine bounding box: rows/columns containing a pixel with below-average variability
mean_variability = variability.mean()
non_empty_columns = np.where(variability.min(axis=0)<mean_variability)[0]
non_empty_rows = np.where(variability.min(axis=1)<mean_variability)[0]
boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))

# plot and print boundingBox
bb = boundingBox
plt.plot([bb[2], bb[3], bb[3], bb[2], bb[2]],
         [bb[0], bb[0],bb[1], bb[1], bb[0]],
         "r-")
plt.xlim(0,variability.shape[1])
plt.ylim(variability.shape[0],0)
print(boundingBox)
plt.show()
Answer by Thorsten Kranz
I am writing a second answer instead of extending my first answer even further. I use the same approach, but on your new examples. The only difference is that I use a set of fixed thresholds instead of determining the threshold automatically. If you can play around with the values, this should suffice.
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import glob

filenames = glob.glob("14767594/*.jpg")
images = [np.asarray(Image.open(fn)) for fn in filenames]
# stack into one array of shape (n_images, height, width, channels);
# all images must have the same size
sample_images = np.stack(images, axis=0)

plt.figure(1)
for i in range(sample_images.shape[0]):
    plt.subplot(2,2,i+1)  # 2x2 grid: assumes at most four input images
    plt.imshow(sample_images[i,...])
    plt.axis("off")
plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)

# determine per-pixel variability: std() over all images, summed over the color channels
variability = sample_images.std(axis=0).sum(axis=2)

# show an image of these variabilities
plt.figure(2)
plt.imshow(variability, cmap=plt.cm.gray, interpolation="nearest", origin="lower")

# determine one bounding box per fixed threshold
thresholds = [5,10,20]
colors = ["r","b","g"]
for threshold, color in zip(thresholds, colors): #variability.mean()
    non_empty_columns = np.where(variability.min(axis=0)<threshold)[0]
    non_empty_rows = np.where(variability.min(axis=1)<threshold)[0]
    boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))
    # plot and print boundingBox
    bb = boundingBox
    plt.plot([bb[2], bb[3], bb[3], bb[2], bb[2]],
             [bb[0], bb[0],bb[1], bb[1], bb[0]],
             "%s-" % color,
             label="threshold %s" % threshold)
    print(boundingBox)
plt.xlim(0,variability.shape[1])
plt.ylim(variability.shape[0],0)
plt.legend()
plt.show()
Produced plots:




Your requirements are closely related to ERP (event-related potentials) in cognitive neuroscience. The more input images you have, the better this approach will work, because the signal-to-noise ratio increases.
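The effect of the number of images can be sketched on synthetic data. Everything here is an assumption made for illustration: the function name separation_margin, the 32x32 image size, the object region, and the noise level standing in for compression artifacts.

```python
import numpy as np


def separation_margin(n_images, size=32, noise=0.02, seed=0):
    """Gap between the least variable background pixel and the most variable object pixel."""
    rng = np.random.default_rng(seed)
    stack = rng.random((n_images, size, size, 3))   # random background
    obj = np.zeros((size, size), dtype=bool)
    obj[8:16, 8:16] = True                          # hypothetical object region
    n_obj = int(obj.sum())
    # The object has a fixed color plus a little per-image noise.
    base_color = np.array([0.0, 0.0, 0.5])
    stack[:, obj, :] = base_color + rng.normal(0.0, noise, (n_images, n_obj, 3))
    variability = stack.std(axis=0).sum(axis=2)
    return variability[~obj].min() - variability[obj].max()


for n in (5, 50, 500):
    print(n, round(float(separation_margin(n)), 3))
```

A larger margin means a wider range of thresholds separates the object from the background, which is one way to read the signal-to-noise remark above.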

