Python numpy 数组的边界框

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/31400769/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 09:56:54  来源:igfitidea点击:

bounding box of numpy array

pythonarraysnumpytransformation

提问by a.smiet

Suppose you have a 2D numpy array with some random values and surrounding zeros.

假设您有一个 2D numpy 数组,其中包含一些随机值和周围的零。

Example "tilted rectangle":

示例“倾斜的矩形”:

import numpy as np
from skimage import transform

img1 = np.zeros((100,100))
img1[25:75,25:75] = 1.
img2 = transform.rotate(img1, 45)

Now I want to find the smallest bounding rectangle for all the nonzero data. For example:

现在我想为所有非零数据找到最小的边界矩形。例如:

a = np.where(img2 != 0)
bbox = img2[np.min(a[0]):np.max(a[0])+1, np.min(a[1]):np.max(a[1])+1]

What would be the fastestway to achieve this result? I am sure there is a better way since the np.where function takes quite a time if I am e.g. using 1000x1000 data sets.

实现这一结果的最快方法是什么?我确信有更好的方法,因为如果我使用 1000x1000 数据集,np.where 函数需要相当长的时间。

Edit: Should also work in 3D...

编辑:也应该在 3D 中工作......

采纳答案by ali_m

You can roughly halve the execution time by using np.anyto reduce the rows and columns that contain non-zero values to 1D vectors, rather than finding the indices of all non-zero values using np.where:

通过使用np.any将包含非零值的行和列减少为一维向量,而不是使用以下方法查找所有非零值的索引,您可以大致将执行时间减半np.where

def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox

def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]

    return rmin, rmax, cmin, cmax

Some benchmarks:

一些基准:

%timeit bbox1(img2)
10000 loops, best of 3: 63.5 μs per loop

%timeit bbox2(img2)
10000 loops, best of 3: 37.1 μs per loop

Extending this approach to the 3D case just involves performing the reduction along each pair of axes:

将此方法扩展到 3D 案例只涉及沿每对轴执行缩减:

def bbox2_3D(img):

    r = np.any(img, axis=(1, 2))
    c = np.any(img, axis=(0, 2))
    z = np.any(img, axis=(0, 1))

    rmin, rmax = np.where(r)[0][[0, -1]]
    cmin, cmax = np.where(c)[0][[0, -1]]
    zmin, zmax = np.where(z)[0][[0, -1]]

    return rmin, rmax, cmin, cmax, zmin, zmax

It's easy to generalize this to Ndimensions by using itertools.combinationsto iterate over each unique combination of axes to perform the reduction over:

通过使用迭代每个唯一的轴组合来执行缩减,很容易将其推广到Nitertools.combinations

import itertools

def bbox2_ND(img):
    N = img.ndim
    out = []
    for ax in itertools.combinations(reversed(range(N)), N - 1):
        nonzero = np.any(img, axis=ax)
        out.extend(np.where(nonzero)[0][[0, -1]])
    return tuple(out)


If you know the coordinates of the corners of the original bounding box, the angle of rotation, and the centre of rotation, you could get the coordinates of the transformed bounding box corners directly by computing the corresponding affine transformation matrixand dotting it with the input coordinates:

如果知道原始边界框的角坐标、旋转角度和旋转中心,则可以直接通过计算相应的仿射变换矩阵并用输入点点来得到变换后的边界框角的坐标坐标:

def bbox_rotate(bbox_in, angle, centre):

    rmin, rmax, cmin, cmax = bbox_in

    # bounding box corners in homogeneous coordinates
    xyz_in = np.array(([[cmin, cmin, cmax, cmax],
                        [rmin, rmax, rmin, rmax],
                        [   1,    1,    1,    1]]))

    # translate centre to origin
    cr, cc = centre
    cent2ori = np.eye(3)
    cent2ori[:2, 2] = -cr, -cc

    # rotate about the origin
    theta = np.deg2rad(angle)
    rmat = np.eye(3)
    rmat[:2, :2] = np.array([[ np.cos(theta),-np.sin(theta)],
                             [ np.sin(theta), np.cos(theta)]])

    # translate from origin back to centre
    ori2cent = np.eye(3)
    ori2cent[:2, 2] = cr, cc

    # combine transformations (rightmost matrix is applied first)
    xyz_out = ori2cent.dot(rmat).dot(cent2ori).dot(xyz_in)

    r, c = xyz_out[:2]

    rmin = int(r.min())
    rmax = int(r.max())
    cmin = int(c.min())
    cmax = int(c.max())

    return rmin, rmax, cmin, cmax

This works out to be very slightly faster than using np.anyfor your small example array:

这比np.any用于您的小示例数组要快得多:

%timeit bbox_rotate([25, 75, 25, 75], 45, (50, 50))
10000 loops, best of 3: 33 μs per loop

However, since the speed of this method is independent of the size of the input array, it can be quite a lot faster for larger arrays.

但是,由于此方法的速度与输入数组的大小无关,因此对于较大的数组,速度可能会快很多。

Extending the transformation approach to 3D is slightly more complicated, in that the rotation now has three different components (one about the x-axis, one about the y-axis and one about the z-axis), but the basic method is the same:

将变换方法扩展到 3D 稍微复杂一点,因为旋转现在有三个不同的分量(一个绕 x 轴,一个绕 y 轴,一个绕 z 轴),但基本方法是相同的:

def bbox_rotate_3d(bbox_in, angle_x, angle_y, angle_z, centre):

    rmin, rmax, cmin, cmax, zmin, zmax = bbox_in

    # bounding box corners in homogeneous coordinates
    xyzu_in = np.array(([[cmin, cmin, cmin, cmin, cmax, cmax, cmax, cmax],
                         [rmin, rmin, rmax, rmax, rmin, rmin, rmax, rmax],
                         [zmin, zmax, zmin, zmax, zmin, zmax, zmin, zmax],
                         [   1,    1,    1,    1,    1,    1,    1,    1]]))

    # translate centre to origin
    cr, cc, cz = centre
    cent2ori = np.eye(4)
    cent2ori[:3, 3] = -cr, -cc -cz

    # rotation about the x-axis
    theta = np.deg2rad(angle_x)
    rmat_x = np.eye(4)
    rmat_x[1:3, 1:3] = np.array([[ np.cos(theta),-np.sin(theta)],
                                 [ np.sin(theta), np.cos(theta)]])

    # rotation about the y-axis
    theta = np.deg2rad(angle_y)
    rmat_y = np.eye(4)
    rmat_y[[0, 0, 2, 2], [0, 2, 0, 2]] = (
        np.cos(theta), np.sin(theta), -np.sin(theta), np.cos(theta))

    # rotation about the z-axis
    theta = np.deg2rad(angle_z)
    rmat_z = np.eye(4)
    rmat_z[:2, :2] = np.array([[ np.cos(theta),-np.sin(theta)],
                               [ np.sin(theta), np.cos(theta)]])

    # translate from origin back to centre
    ori2cent = np.eye(4)
    ori2cent[:3, 3] = cr, cc, cz

    # combine transformations (rightmost matrix is applied first)
    tform = ori2cent.dot(rmat_z).dot(rmat_y).dot(rmat_x).dot(cent2ori)
    xyzu_out = tform.dot(xyzu_in)

    r, c, z = xyzu_out[:3]

    rmin = int(r.min())
    rmax = int(r.max())
    cmin = int(c.min())
    cmax = int(c.max())
    zmin = int(z.min())
    zmax = int(z.max())

    return rmin, rmax, cmin, cmax, zmin, zmax

I've essentially just modified the function above using the rotation matrix expressions from here- I haven't had time to write a test-case yet, so use with caution.

我基本上只是使用这里的旋转矩阵表达式修改了上面的函数- 我还没有时间编写测试用例,所以请谨慎使用。

回答by rth

Here is an algorithm to calculate the bounding box for N dimensional arrays,

这是一个计算 N 维数组边界框的算法,

def get_bounding_box(x):
    """ Calculates the bounding box of a ndarray"""
    mask = x == 0
    bbox = []
    all_axis = np.arange(x.ndim)
    for kdim in all_axis:
        nk_dim = np.delete(all_axis, kdim)
        mask_i = mask.all(axis=tuple(nk_dim))
        dmask_i = np.diff(mask_i)
        idx_i = np.nonzero(dmask_i)[0]
        if len(idx_i) != 2:
            raise ValueError('Algorithm failed, {} does not have 2 elements!'.format(idx_i))
        bbox.append(slice(idx_i[0]+1, idx_i[1]+1))
    return bbox

which can be used with 2D, 3D, etc arrays as follows,

可用于 2D、3D 等数组,如下所示,

In [1]: print((img2!=0).astype(int))
   ...: bbox = get_bounding_box(img2)
   ...: print((img2[bbox]!=0).astype(int))
   ...: 
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 1 1 1 1 1 1 1 1 1 1 0 0]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 1 1 1 1 1 1 0]
 [0 0 1 1 1 1 1 1 1 1 1 1 0 0]
 [0 0 0 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 0 1 1 1 1 1 1 0 0 0 0]
 [0 0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 1 1 0 0 0 0 0 0]]

Although replacing the np.diffand np.nonzerocalls by one np.wheremight be better.

尽管将np.diffnp.nonzero电话替换为一个np.where可能会更好。

回答by Allan Zelener

I was able to squeeze out a little more performance by replacing np.wherewith np.argmaxand working on a boolean mask.

我能够通过更换挤出一点更多的性能np.wherenp.argmax和工作的一个布尔面具。

def bbox(img):
    img = (img > 0)
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.argmax(rows), img.shape[0] - 1 - np.argmax(np.flipud(rows))
    cmin, cmax = np.argmax(cols), img.shape[1] - 1 - np.argmax(np.flipud(cols))
    return rmin, rmax, cmin, cmax

This was about 10μs faster for me than the bbox2 solution above on the same benchmark. There should also be a way to just use the result of argmax to find the non-zero rows and columns, avoiding the extra search done by using np.any, but this may require some tricky indexing that I wasn't able to get working efficiently with simple vectorized code.

在相同的基准测试中,这比上面的 bbox2 解决方案快了大约 10 微秒。还应该有一种方法可以只使用 argmax 的结果来查找非零行和列,避免使用 完成额外的搜索np.any,但这可能需要一些棘手的索引,我无法使用简单的矢量化代码。