Python: calculating the percentage of bounding box overlap, for image detector evaluation

Question

Asked by user961627

In testing an object detection algorithm in large images, we check our detected bounding boxes against the coordinates given for the ground truth rectangles.

According to the Pascal VOC challenges, there's this:

A predicted bounding box is considered correct if it overlaps more than 50% with a ground-truth bounding box, otherwise the bounding box is considered a false positive detection. Multiple detections are penalized. If a system predicts several bounding boxes that overlap with a single ground-truth bounding box, only one prediction is considered correct, the others are considered false positives.

This means that we need to calculate the percentage of overlap. Does this mean that the ground truth box is 50% covered by the detected boundary box? Or that 50% of the bounding box is absorbed by the ground truth box?

I've searched but I haven't found a standard algorithm for this - which is surprising because I would have thought that this is something pretty common in computer vision. (I'm new to it). Have I missed it? Does anyone know what the standard algorithm is for this type of problem?

Answer 1

Accepted answer by user961627

I found that the conceptual answer is here: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/htmldoc/devkit_doc.html#SECTION00054000000000000000

from this thread: Compare two bounding boxes with each other Matlab

I should be able to code this in Python!
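
For readers who cannot follow the link: as far as I can tell, the overlap measure described there is the ratio of intersection area to union area (intersection over union), so the 50% refers to neither box alone. A sketch of the criterion, where $B_p$ is the predicted box and $B_{gt}$ the ground-truth box (the symbol names are my own shorthand):

```latex
a_o = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})},
\qquad \text{detection counted as correct if } a_o > 0.5
```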

Answer 2

Answered by Stefan van der Walt

In the snippet below, I construct a polygon along the edges of the first box. I then use Matplotlib to clip the polygon to the second box. The resulting polygon contains four vertices, but we are only interested in the top left and bottom right corners, so I take the max and the min of the coordinates to get a bounding box, which is returned to the user.

import numpy as np
from matplotlib import path, transforms

def clip_boxes(box0, box1):
    path_coords = np.array([[box0[0, 0], box0[0, 1]],
                            [box0[1, 0], box0[0, 1]],
                            [box0[1, 0], box0[1, 1]],
                            [box0[0, 0], box0[1, 1]]])

    poly = path.Path(np.vstack((path_coords[:, 0],
                                path_coords[:, 1])).T, closed=True)
    clip_rect = transforms.Bbox(box1)

    poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]

    return np.array([np.min(poly_clipped, axis=0),
                     np.max(poly_clipped, axis=0)])

box0 = np.array([[0, 0], [1, 1]])
box1 = np.array([[0, 0], [0.5, 0.5]])

print(clip_boxes(box0, box1))
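
If Matplotlib is not available, the same intersection box can probably be obtained with plain min/max comparisons, since both boxes are axis-aligned. This is only a sketch under that assumption; `intersect_boxes` is a hypothetical name, and it takes boxes in the same [[x_min, y_min], [x_max, y_max]] layout as the snippet above.

```python
def intersect_boxes(box0, box1):
    """Return the intersection of two axis-aligned boxes, or None if empty.

    Each box is [[x_min, y_min], [x_max, y_max]].
    """
    (ax0, ay0), (ax1, ay1) = box0
    (bx0, by0), (bx1, by1) = box1

    # The intersection starts at the larger of the two minimum corners
    # and ends at the smaller of the two maximum corners.
    x0, y0 = max(ax0, bx0), max(ay0, by0)
    x1, y1 = min(ax1, bx1), min(ay1, by1)

    if x1 <= x0 or y1 <= y0:
        return None  # boxes do not overlap (or only touch)
    return [[x0, y0], [x1, y1]]


print(intersect_boxes([[0, 0], [1, 1]], [[0, 0], [0.5, 0.5]]))
# [[0, 0], [0.5, 0.5]]
```

Unlike the polygon clipping above, this returns None for boxes that do not overlap instead of relying on the clipped polygon being non-empty.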

Answer 3

Answered by Martin Thoma

For axis-aligned bounding boxes it is relatively simple. "Axis-aligned" means that the bounding box isn't rotated; in other words, the box's edges are parallel to the axes. Here's how to calculate the IoU of two axis-aligned bounding boxes.

def get_iou(bb1, bb2):
    """
    Calculate the Intersection over Union (IoU) of two bounding boxes.

    Parameters
    ----------
    bb1 : dict
        Keys: {'x1', 'x2', 'y1', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner
    bb2 : dict
        Keys: {'x1', 'x2', 'y1', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner

    Returns
    -------
    float
        in [0, 1]
    """
    assert bb1['x1'] < bb1['x2']
    assert bb1['y1'] < bb1['y2']
    assert bb2['x1'] < bb2['x2']
    assert bb2['y1'] < bb2['y2']

    # determine the coordinates of the intersection rectangle
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # compute the area of both AABBs
    bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
    bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert iou >= 0.0
    assert iou <= 1.0
    return iou

Explanation

Images are from this answer
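
To connect this back to the rule quoted in the question (more than 50% overlap, and only one correct detection per ground-truth box), here is a simplified sketch of how the IoU might be used to label detections. It is not the official evaluation code: `iou` and `match_detections` are hypothetical names, boxes are plain (x1, y1, x2, y2) tuples, and the detections are assumed to be sorted by descending confidence already.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in continuous coordinates."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_detections(detections, ground_truths, threshold=0.5):
    """Label each detection True (correct) or False (false positive).

    Assumes `detections` is sorted by descending confidence.
    """
    used = set()   # indices of ground-truth boxes already matched
    labels = []
    for det in detections:
        # Find the ground-truth box with the largest overlap.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou > threshold and best_idx not in used:
            used.add(best_idx)
            labels.append(True)
        else:
            # Too little overlap, or a duplicate of an already matched box.
            labels.append(False)
    return labels


gts = [(0, 0, 10, 10)]
dets = [(1, 1, 10, 10), (0, 0, 9, 9), (20, 20, 30, 30)]
print(match_detections(dets, gts))  # [True, False, False]
```

The second detection overlaps the ground truth just as well as the first, but it is still counted as a false positive because that ground-truth box was already claimed.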

Answer 4

Answered by Jindil

For the intersection distance, shouldn't we add a +1 so as to have

intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)

(and the same for the AABB areas), as in this pyimagesearch post?

I agree that (x_right - x_left) x (y_bottom - y_top) works in mathematics with point coordinates, but since we deal with pixels I think it is different.

Consider a 1D example:
- 2 points: x1 = 1 and x2 = 3, the distance is indeed x2 - x1 = 2
- 2 pixel indices: i1 = 1 and i2 = 3, the segment from pixel i1 to i2 contains 3 pixels, i.e. l = i2 - i1 + 1
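
The two cases in the 1D example can be checked in a couple of lines. This is only an illustration of the counting argument; the variable names mirror the ones used above.

```python
# Point coordinates: the length is the plain difference.
x1, x2 = 1, 3
print(x2 - x1)  # 2

# Pixel indices: both end pixels belong to the segment, so count them.
i1, i2 = 1, 3
pixels = list(range(i1, i2 + 1))
print(pixels, len(pixels))  # [1, 2, 3] 3
```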

Answer 5

Answered by Reno Fiedler

How about this approach? It could be extended to any number of unioned shapes.

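import numpy as np

surface = np.zeros([1024, 1024])
surface[1:1+10, 1:1+10] += 1
surface[100:100+500, 100:100+100] += 1
# Cells covered by both boxes hold 2; cells covered by at least one hold >= 1.
intersection_area = (surface == 2).sum()
union_area = (surface >= 1).sum()
print(intersection_area, union_area)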
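
The same counting idea can be sketched without allocating a full-size array, by representing each shape as a set of grid cells. This is a dependency-free variant, not the code above; `box_cells` and `cells_iou` are hypothetical helpers, and the two example boxes are chosen so that they actually overlap.

```python
def box_cells(row0, col0, height, width):
    """Set of (row, col) grid cells covered by an axis-aligned box."""
    return {(r, c)
            for r in range(row0, row0 + height)
            for c in range(col0, col0 + width)}


def cells_iou(cells_a, cells_b):
    """IoU of two shapes given as sets of grid cells."""
    union = cells_a | cells_b
    if not union:
        return 0.0
    return len(cells_a & cells_b) / len(union)


a = box_cells(0, 0, 10, 10)   # 100 cells
b = box_cells(5, 5, 10, 10)   # 100 cells, 25 of them shared with a
print(len(a & b), len(a | b))  # 25 175
print(cells_iou(a, b))
```

Because only set membership matters, the cells do not have to come from rectangles; any rasterized shape would work the same way, at the cost of memory proportional to the covered area.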

Answer 6

Answered by Uzzal Podder

A simple way

(Image is not drawn to scale)

from shapely.geometry import Polygon


def calculate_iou(box_1, box_2):
    poly_1 = Polygon(box_1)
    poly_2 = Polygon(box_2)
    iou = poly_1.intersection(poly_2).area / poly_1.union(poly_2).area
    return iou


box_1 = [[511, 41], [577, 41], [577, 76], [511, 76]]
box_2 = [[544, 59], [610, 59], [610, 94], [544, 94]]

print(calculate_iou(box_1, box_2))

The result will be 0.138211..., which means 13.82%.
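
As a cross-check that does not need shapely, the same number can be reproduced by hand for these two axis-aligned boxes. This sketch reduces each box to (x_min, y_min, x_max, y_max), which is my own shorthand and not part of the code above.

```python
# Boxes from the example above, reduced to (x_min, y_min, x_max, y_max).
box_1 = (511, 41, 577, 76)
box_2 = (544, 59, 610, 94)

inter_w = min(box_1[2], box_2[2]) - max(box_1[0], box_2[0])  # 577 - 544 = 33
inter_h = min(box_1[3], box_2[3]) - max(box_1[1], box_2[1])  # 76 - 59 = 17
intersection = inter_w * inter_h                             # 561

area_1 = (box_1[2] - box_1[0]) * (box_1[3] - box_1[1])       # 66 * 35 = 2310
area_2 = (box_2[2] - box_2[0]) * (box_2[3] - box_2[1])       # 66 * 35 = 2310
union = area_1 + area_2 - intersection                       # 4059

print(round(intersection / union, 6))  # 0.138211
```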

Answer 7

Answered by Mitch McMabers

The top-voted answer has a mathematical error if you are working with screen (pixel) coordinates! I submitted an edit a few weeks ago with a long explanation for all readers so that they would understand the math. But that edit wasn't understood by the reviewers and was removed, so I've submitted the same edit again, but more briefly summarized this time. (Update: Rejected 2 vs 1 because it was deemed a "substantial change", heh).

So I will completely explain the BIG problem with its math here in this separate answer.

So, yes, in general, the top-voted answer is correct and is a good way to calculate the IoU. But (as other people have pointed out too) its math is incorrect for screen (pixel) coordinates. You cannot just do (x2 - x1) * (y2 - y1), since that will not produce the correct area. Screen indexing starts at pixel 0,0 and ends at width-1,height-1. The range of screen coordinates is inclusive:inclusive (inclusive on both ends), so a range from 0 to 10 in pixel coordinates is actually 11 pixels wide, because it includes 0 1 2 3 4 5 6 7 8 9 10 (11 items). So, to calculate the area in screen coordinates, you MUST add +1 to each dimension, as follows: (x2 - x1 + 1) * (y2 - y1 + 1).

If you're working in some other coordinate system where the range is not inclusive (such as an inclusive:exclusive system where 0 to 10 means "elements 0-9 but not 10"), then this extra math would NOT be necessary. But most likely, you are processing pixel-based bounding boxes, and screen coordinates start at 0,0 and go up from there.

A 1920x1080 screen is indexed from 0 (first pixel) to 1919 (last pixel horizontally) and from 0 (first pixel) to 1079 (last pixel vertically).

So if we have a rectangle in "pixel coordinate space", to calculate its area we must add 1 in each direction. Otherwise, we get the wrong answer for the area calculation.

Imagine that our 1920x1080 screen has a pixel-coordinate based rectangle with left=0,top=0,right=1919,bottom=1079 (covering all pixels on the whole screen).

Well, we know that 1920x1080 pixels is 2073600 pixels, which is the correct area of a 1080p screen.

But with the wrong math area = (x_right - x_left) * (y_bottom - y_top), we would get: (1919 - 0) * (1079 - 0) = 1919 * 1079 = 2070601 pixels! That's wrong!

That is why we must add +1 to each calculation, which gives us the following corrected math: area = (x_right - x_left + 1) * (y_bottom - y_top + 1), giving us: (1919 - 0 + 1) * (1079 - 0 + 1) = 1920 * 1080 = 2073600 pixels! And that's indeed the correct answer!

The shortest possible summary is: Pixel coordinate ranges are inclusive:inclusive, so we must add +1 to each axis if we want the true area of a pixel coordinate range.

For a few more details about why +1 is needed, see Jindil's answer: https://stackoverflow.com/a/51730512/8874388

As well as this pyimagesearch article: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/

And this GitHub comment: https://github.com/AlexeyAB/darknet/issues/3995#issuecomment-535697357

Since the fixed math wasn't approved, anyone who copies the code from the top-voted answer will hopefully see this answer and be able to fix the bug themselves by copying the corrected assertions and area-calculation lines below, which have been fixed for inclusive:inclusive (pixel) coordinate ranges:

    assert bb1['x1'] <= bb1['x2']
    assert bb1['y1'] <= bb1['y2']
    assert bb2['x1'] <= bb2['x2']
    assert bb2['y1'] <= bb2['y2']

................................................

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box.
    # NOTE: We MUST ALWAYS add +1 to calculate area when working in
    # screen coordinates, since 0,0 is the top left pixel, and w-1,h-1
    # is the bottom right pixel. If we DON'T add +1, the result is wrong.
    intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)

    # compute the area of both AABBs
    bb1_area = (bb1['x2'] - bb1['x1'] + 1) * (bb1['y2'] - bb1['y1'] + 1)
    bb2_area = (bb2['x2'] - bb2['x1'] + 1) * (bb2['y2'] - bb2['y1'] + 1)
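
For convenience, here is how the pieces might look when assembled into one function. This is only a sketch that combines the top-voted answer's structure with the +1 corrections above; `get_iou_pixels` is a hypothetical name, and it assumes both boxes use inclusive pixel indices for all four coordinates.

```python
def get_iou_pixels(bb1, bb2):
    """IoU of two boxes whose corners are inclusive pixel indices.

    Each box is a dict with keys 'x1', 'y1' (top left pixel) and
    'x2', 'y2' (bottom right pixel).
    """
    assert bb1['x1'] <= bb1['x2'] and bb1['y1'] <= bb1['y2']
    assert bb2['x1'] <= bb2['x2'] and bb2['y1'] <= bb2['y2']

    # Corners of the intersection rectangle.
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])

    if x_right < x_left or y_bottom < y_top:
        return 0.0  # no shared pixels

    # +1 on each axis because both end pixels are part of the box.
    intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)
    bb1_area = (bb1['x2'] - bb1['x1'] + 1) * (bb1['y2'] - bb1['y1'] + 1)
    bb2_area = (bb2['x2'] - bb2['x1'] + 1) * (bb2['y2'] - bb2['y1'] + 1)

    return intersection_area / float(bb1_area + bb2_area - intersection_area)


screen = {'x1': 0, 'y1': 0, 'x2': 1919, 'y2': 1079}
left_half = {'x1': 0, 'y1': 0, 'x2': 959, 'y2': 1079}
print(get_iou_pixels(screen, screen))     # 1.0
print(get_iou_pixels(screen, left_half))  # 0.5
```

With inclusive indices a box such as x1 == x2 is a valid one-pixel-wide box, which is why the assertions use <= instead of <.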
