Python: How to detect an object on images?

Question

Asked by Alex

I need a Python solution.

I have 40-60 images (Happy Holiday set). I need to detect an object on all of these images.

I don't know the object's size, shape, or location in the image, and I don't have any object template. I know only one thing: this object is present in almost all images. I call it the UFO.

Example: [image]

As seen in the example, everything changes from image to image except the UFO. After detection I need to get:

X coordinate of the top-left corner

Y coordinate of the top-left corner

width of the blue object region (I marked the region in the example with a red rectangle)

height of the blue object region

Answer 1

Accepted answer by Thorsten Kranz

When you have the image data as an array, you can use built-in NumPy functions to do this easily and quickly:

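import numpy as np
import PIL.Image

image = PIL.Image.open("14767594_in.png")

# the image as a (height, width, channels) array; channel 2 is blue
image_data = np.asarray(image)
image_data_blue = image_data[:,:,2]

# threshold: pixels bluer than the median count as part of the object
median_blue = np.median(image_data_blue)

# columns and rows containing at least one pixel above the threshold
non_empty_columns = np.where(image_data_blue.max(axis=0)>median_blue)[0]
non_empty_rows = np.where(image_data_blue.max(axis=1)>median_blue)[0]

# (top, bottom, left, right)
boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))

print(boundingBox)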

For the first image, this gives:

(78, 156, 27, 166)

So your desired data are:

top-left corner (x, y): (27, 78)
width: 166 - 27 = 139
height: 156 - 78 = 78

I chose the rule that every pixel with a blue value larger than the median of all blue values belongs to your object. I expect this to work for you; if not, try a different threshold or provide some examples where this doesn't work.
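The median rule and the conversion to the four requested values can be wrapped into one helper. The following is a minimal sketch, not part of the original answer: the function name `bounding_box_xywh` and the synthetic `demo` array are assumptions made for illustration, and width and height follow the same "max minus min" convention as the numbers above.

```python
import numpy as np

def bounding_box_xywh(channel):
    """Return (x, y, width, height) of the pixels brighter than the channel median.

    Hypothetical helper; returns None if no pixel exceeds the median.
    """
    threshold = np.median(channel)
    cols = np.where(channel.max(axis=0) > threshold)[0]
    rows = np.where(channel.max(axis=1) > threshold)[0]
    if cols.size == 0 or rows.size == 0:
        return None
    x, y = int(cols[0]), int(rows[0])
    # same convention as above: last index minus first index
    width = int(cols[-1]) - x
    height = int(rows[-1]) - y
    return x, y, width, height

# synthetic stand-in for a blue channel: dark background, bright rectangle
demo = np.zeros((20, 30), dtype=np.uint8)
demo[5:12, 8:25] = 255
print(bounding_box_xywh(demo))  # (8, 5, 16, 6)
```

With a real image, `image_data[:,:,2]` would take the place of `demo`.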

EDIT: I reworked my code to be more general. Since two images with the same shape color are not general enough (as your comment indicates), I created more samples synthetically.

def create_sample_set(mask, N=36, shape_color=(0,0,1.,1.)):
    # N RGBA images of the mask's size, initialised to opaque white
    rv = np.ones((N, mask.shape[0], mask.shape[1], 4),dtype=float)
    mask = mask.astype(bool)
    for i in range(N):
        for j in range(3):
            current_color_layer = rv[i,:,:,j]
            # random background intensity for this color channel
            current_color_layer[:,:] *= np.random.random()
            # paint the shape in its fixed color
            current_color_layer[mask] = np.ones((mask.sum())) * shape_color[j]
    return rv

Here, the color of the shape is adjustable. For each of the N=36 images, a random background color is chosen. It would also be possible to put noise in the background; this wouldn't change the result.

Then, I read your sample image, create a shape mask from it, and use it to create sample images. I plot them on a grid.

import matplotlib.pyplot as plt

# create a set of sample images and plot them
image = PIL.Image.open("14767594_in.png")
image_data = np.asarray(image)
image_data_blue = image_data[:,:,2]
median_blue = np.median(image_data_blue)
sample_images = create_sample_set(image_data_blue>median_blue)
plt.figure(1)
for i in range(36):
    plt.subplot(6,6,i+1)
    plt.imshow(sample_images[i,...])
    plt.axis("off")
# remove all margins and spacing between the subplots
plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)

[image: blue shapes]

For another value of shape_color (a parameter of create_sample_set(...)), this might look like:

[image: green shapes]

Next, I'll determine the per-pixel variability using the standard deviation. As you said, the object is at the same position in (almost) all images. So the variability at the object's pixels will be low, while for the other pixels it will be significantly higher.

# determine per-pixel variability, std() over all images
variability = sample_images.std(axis=0).sum(axis=2)

# show image of these variabilities
plt.figure(2)
plt.imshow(variability, cmap=plt.cm.gray, interpolation="nearest", origin="lower")
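To see why the standard deviation separates the object from the background, here is a small self-contained sketch on synthetic data. It is an illustration under assumptions, not the original code: the rectangular `mask`, the image size, and the uniformly random noisy background are all made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, height, width = 36, 40, 60

# hypothetical object mask: a rectangle at a fixed position
mask = np.zeros((height, width), dtype=bool)
mask[10:25, 15:45] = True

# noisy RGB backgrounds: every pixel is random in every image
stack = rng.random((n_images, height, width, 3))
# paint the same blue object into every image
stack[:, mask] = (0.0, 0.0, 1.0)

# std over the image axis, summed over the color channels
variability = stack.std(axis=0).sum(axis=2)

print(variability[mask].max())        # 0.0, the object never changes
print(variability[~mask].min() > 0)   # background pixels always vary
```

Because the object pixels are identical in every image, their standard deviation is exactly zero even though the background here is pure noise rather than a flat color.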

Finally, as in my first code snippet, I determine the bounding box. This time I also plot it.

# determine bounding box
mean_variability = variability.mean()
non_empty_columns = np.where(variability.min(axis=0)<mean_variability)[0]
non_empty_rows = np.where(variability.min(axis=1)<mean_variability)[0]
boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))

# plot and print boundingBox
bb = boundingBox
plt.plot([bb[2], bb[3], bb[3], bb[2], bb[2]],
         [bb[0], bb[0],bb[1], bb[1], bb[0]],
         "r-")
plt.xlim(0,variability.shape[1])
plt.ylim(variability.shape[0],0)

print(boundingBox)
plt.show()

[image: bounding box and extracted shape]
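The bounding-box step can also be written as a reusable function. This is a hedged sketch rather than the answer's own code: the name `low_variability_bbox` and the optional `threshold` parameter are assumptions, and the guard for "no pixel below the threshold" is an addition, since `min()` on an empty index array would raise an error.

```python
import numpy as np

def low_variability_bbox(variability, threshold=None):
    """Return (row_min, row_max, col_min, col_max) of pixels below threshold.

    Hypothetical helper; threshold defaults to the mean variability.
    Returns None when no pixel is below the threshold.
    """
    if threshold is None:
        threshold = variability.mean()
    stable = variability < threshold
    rows = np.flatnonzero(stable.any(axis=1))
    cols = np.flatnonzero(stable.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# synthetic variability map: high everywhere except a stable block
demo = np.ones((8, 10))
demo[2:5, 3:9] = 0.0
print(low_variability_bbox(demo))  # (2, 4, 3, 8)
```

Passing an explicit `threshold` instead of relying on the mean is useful when the stable region covers most of the image and pulls the mean down.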

That's it. I hope it is general enough this time.

Complete script for copy and paste:

import numpy as np
import PIL.Image
import matplotlib.pyplot as plt


def create_sample_set(mask, N=36, shape_color=(0,0,1.,1.)):
    rv = np.ones((N, mask.shape[0], mask.shape[1], 4),dtype=float)
    mask = mask.astype(bool)
    for i in range(N):
        for j in range(3):
            current_color_layer = rv[i,:,:,j]
            current_color_layer[:,:] *= np.random.random()
            current_color_layer[mask] = np.ones((mask.sum())) * shape_color[j]
    return rv

# create a set of sample images and plot them
image = PIL.Image.open("14767594_in.png")
image_data = np.asarray(image)
image_data_blue = image_data[:,:,2]
median_blue = np.median(image_data_blue)
sample_images = create_sample_set(image_data_blue>median_blue)
plt.figure(1)
for i in range(36):
    plt.subplot(6,6,i+1)
    plt.imshow(sample_images[i,...])
    plt.axis("off")
plt.subplots_adjust(0,0,1,1,0,0)

# determine per-pixel variability, std() over all images
variability = sample_images.std(axis=0).sum(axis=2)

# show image of these variabilities
plt.figure(2)
plt.imshow(variability, cmap=plt.cm.gray, interpolation="nearest", origin="lower")

# determine bounding box
mean_variability = variability.mean()
non_empty_columns = np.where(variability.min(axis=0)<mean_variability)[0]
non_empty_rows = np.where(variability.min(axis=1)<mean_variability)[0]
boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))

# plot and print boundingBox
bb = boundingBox
plt.plot([bb[2], bb[3], bb[3], bb[2], bb[2]],
         [bb[0], bb[0],bb[1], bb[1], bb[0]],
         "r-")
plt.xlim(0,variability.shape[1])
plt.ylim(variability.shape[0],0)

print(boundingBox)
plt.show()

Answer 2

Answer by Thorsten Kranz

I'm writing a second answer rather than extending my first one even further. I use the same approach, but on your new examples. The only difference is that I use a set of fixed thresholds instead of determining the threshold automatically. If you can experiment with the values, this should suffice.

import numpy as np
import PIL.Image
import matplotlib.pyplot as plt
import glob

filenames = glob.glob("14767594/*.jpg")
images = [np.asarray(PIL.Image.open(fn)) for fn in filenames]
sample_images = np.concatenate([image.reshape(1,image.shape[0], image.shape[1],image.shape[2]) 
                            for image in images], axis=0)

plt.figure(1)
for i in range(sample_images.shape[0]):
    plt.subplot(2,2,i+1)
    plt.imshow(sample_images[i,...])
    plt.axis("off")
plt.subplots_adjust(0,0,1,1,0,0)

# determine per-pixel variability, std() over all images
variability = sample_images.std(axis=0).sum(axis=2)

# show image of these variabilities
plt.figure(2)
plt.imshow(variability, cmap=plt.cm.gray, interpolation="nearest", origin="lower")

# determine bounding box
thresholds = [5,10,20]
colors = ["r","b","g"]
for threshold, color in zip(thresholds, colors): #variability.mean()
    non_empty_columns = np.where(variability.min(axis=0)<threshold)[0]
    non_empty_rows = np.where(variability.min(axis=1)<threshold)[0]
    boundingBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))

    # plot and print boundingBox
    bb = boundingBox
    plt.plot([bb[2], bb[3], bb[3], bb[2], bb[2]],
             [bb[0], bb[0],bb[1], bb[1], bb[0]],
             "%s-" % color,
             label="threshold %s" % threshold)
    print(boundingBox)

plt.xlim(0,variability.shape[1])
plt.ylim(variability.shape[0],0)
plt.legend()

plt.show()

Produced plots:

[images: input images; outputs]
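The stacking step in the script above only works if all images share one shape. A minimal sketch of that step with an explicit check follows; the helper name `stack_images` is an assumption, and synthetic arrays stand in for the loaded JPEG files.

```python
import numpy as np

def stack_images(arrays):
    """Stack equally shaped H x W x C arrays into one N x H x W x C float array.

    Hypothetical helper; raises ValueError if the shapes differ.
    """
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        raise ValueError("images differ in shape: %s" % sorted(shapes))
    # convert to float so later arithmetic is not done in uint8
    return np.stack(arrays, axis=0).astype(float)

# synthetic stand-ins for three loaded images
frames = [np.full((4, 6, 3), v, dtype=np.uint8) for v in (0, 100, 200)]
stacked = stack_images(frames)
print(stacked.shape)  # (3, 4, 6, 3)
```

With real files, the list would come from something like `[np.asarray(PIL.Image.open(fn)) for fn in filenames]`, as in the script above.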

Your requirements are closely related to ERP (event-related potential) analysis in cognitive neuroscience. The more input images you have, the better this approach will work, because the signal-to-noise ratio increases.
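The effect of the number of images can be illustrated on synthetic data. This sketch rests on assumptions: the function name, image size, object region, and uniformly random backgrounds are all hypothetical. The object's variability is always zero, so the smallest variability found in the background is the margin available for choosing a threshold.

```python
import numpy as np

def lowest_background_variability(n_images, seed=0):
    """Smallest per-pixel variability outside the object for n_images noisy images."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((40, 60), dtype=bool)
    mask[10:25, 15:45] = True            # hypothetical fixed object region
    stack = rng.random((n_images, 40, 60, 3))
    stack[:, mask] = (0.0, 0.0, 1.0)     # identical object in every image
    variability = stack.std(axis=0).sum(axis=2)
    return float(variability[~mask].min())

# the margin between object (0.0) and background grows with the image count
for n in (2, 4, 16, 64):
    print(n, round(lowest_background_variability(n), 3))
```

With only a few images, some background pixels happen to look almost constant and are easily mistaken for the object; with many images this becomes very unlikely.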
