Python: Subsetting a 2D numpy array

Question

Asked by CrossEntropy

I have looked through the documentation and other questions here, but it seems I have not yet got the hang of subsetting numpy arrays.

I have a numpy array, and for the sake of argument, let it be defined as follows:

import numpy as np
a = np.arange(100)
a.shape = (10,10)
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
#        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
#        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
#        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
#        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
#        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
#        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Now I want to choose the rows and columns of a specified by the vectors n1 and n2. As an example:

n1 = range(5)
n2 = range(5)

But when I use:

b = a[n1,n2]
# array([ 0, 11, 22, 33, 44])

Then only the first five diagonal elements are chosen, not the whole 5x5 block. The solution I have found is to do it like this:

b = a[n1,:]
b = b[:,n2]
# array([[ 0,  1,  2,  3,  4],
#        [10, 11, 12, 13, 14],
#        [20, 21, 22, 23, 24],
#        [30, 31, 32, 33, 34],
#        [40, 41, 42, 43, 44]])

But I am sure there should be a way to do this simple task in just one command.

Answer 1

Accepted answer by Joe Kington

You've gotten a handful of nice examples of how to do what you want. However, it's also useful to understand what's happening and why things work the way they do. There are a few simple rules that will help you in the future.

There's a big difference between "fancy" indexing (i.e. using a list/sequence) and "normal" indexing (using a slice). The underlying reason has to do with whether or not the array can be "regularly strided", and therefore whether or not a copy needs to be made. Arbitrary sequences therefore have to be treated differently, if we want to be able to create "views" without making copies.

In your case:

import numpy as np

a = np.arange(100).reshape(10,10)
n1, n2 = np.arange(5), np.arange(5)

# Not what you want
b = a[n1, n2]  # array([ 0, 11, 22, 33, 44])

# What you want, but only for simple sequences
# Note that no copy of *a* is made!! This is a view.
b = a[:5, :5]

# What you want, but probably confusing at first. (Also, makes a copy.)
# np.meshgrid and np.ix_ are basically equivalent to this.
b = a[n1[:,None], n2[None,:]]
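
The view/copy distinction mentioned in the comments above can be checked directly. The following is a minimal sketch, assuming the same 10x10 array as in the question; the names sliced and fancy are only illustrative, and np.shares_memory is used here as one way to test whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
n1, n2 = np.arange(5), np.arange(5)

sliced = a[:5, :5]                    # basic slicing
fancy = a[n1[:, None], n2[None, :]]   # integer-array ("fancy") indexing

# Both select the same 5x5 block of values...
same_values = np.array_equal(sliced, fancy)

# ...but only the slice shares memory with the original array.
slice_is_view = np.shares_memory(a, sliced)
fancy_is_view = np.shares_memory(a, fancy)

print(same_values, slice_is_view, fancy_is_view)
```

A practical consequence is that writing into sliced modifies a, while writing into fancy leaves a untouched.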

Fancy indexing with 1D sequences is basically equivalent to zipping them together and indexing with the result.

print("Fancy Indexing:")
print(a[n1, n2])

print("Manual indexing:")
for i, j in zip(n1, n2):
    print(a[i, j])

However, if the sequences you're indexing with match the dimensionality of the array you're indexing (2D, in this case), the indexing is treated differently. Instead of "zipping the two together", numpy broadcasts the index arrays against each other, so the indices act like a mask over rows and columns.

In other words, a[[[1, 2, 3]], [[1],[2],[3]]] is treated completely differently than a[[1, 2, 3], [1, 2, 3]], because the sequences/arrays that you're passing in are two-dimensional.

In [4]: a[[[1, 2, 3]], [[1],[2],[3]]]
Out[4]:
array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

In [5]: a[[1, 2, 3], [1, 2, 3]]
Out[5]: array([11, 22, 33])

To be a bit more precise,

a[[[1, 2, 3]], [[1],[2],[3]]]

is treated exactly like:

i = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]
j = [[1, 1, 1],
     [2, 2, 2],
     [3, 3, 3]]
a[i, j]

In other words, whether an input is a row vector or a column vector is shorthand for how its indices should be repeated in the indexing.
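
The repetition described here is ordinary NumPy broadcasting, and it can be made explicit with np.broadcast_arrays. This is a small sketch under the assumption that a is the 10x10 array from the question; the names rows, cols, result_short and result_full are only illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

rows = np.array([[1, 2, 3]])        # shape (1, 3): a row vector of row indices
cols = np.array([[1], [2], [3]])    # shape (3, 1): a column vector of column indices

# Expand both index arrays to their common (3, 3) shape.
i, j = np.broadcast_arrays(rows, cols)

result_short = a[rows, cols]   # numpy broadcasts the indices internally
result_full = a[i, j]          # same lookup with the repetition written out

print(i)
print(j)
print(np.array_equal(result_short, result_full))
```

Each output element at position (r, c) is a[i[r, c], j[r, c]], which is why the shape of the index arrays determines the shape of the result.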

np.meshgrid and np.ix_ are just convenient ways to turn your 1D sequences into their 2D versions for indexing:

In [6]: np.ix_([1, 2, 3], [1, 2, 3])
Out[6]:
(array([[1],
       [2],
       [3]]), array([[1, 2, 3]]))

Similarly (passing sparse=True would make the result identical to that of np.ix_ above):

In [7]: np.meshgrid([1, 2, 3], [1, 2, 3], indexing='ij')
Out[7]:
[array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]]),
 array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])]
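
The claim that the sparse argument makes np.meshgrid match np.ix_ can be checked with a short comparison. This is a sketch only; the variable names are illustrative and not part of the original answer.

```python
import numpy as np

ix_rows, ix_cols = np.ix_([1, 2, 3], [1, 2, 3])
mg_rows, mg_cols = np.meshgrid([1, 2, 3], [1, 2, 3], sparse=True, indexing='ij')

# Both produce a (3, 1) array of row indices and a (1, 3) array of column indices.
print(ix_rows.shape, ix_cols.shape)
print(mg_rows.shape, mg_cols.shape)
print(np.array_equal(ix_rows, mg_rows) and np.array_equal(ix_cols, mg_cols))
```

Without sparse=True, np.meshgrid returns the fully expanded (3, 3) index arrays, which select the same elements but use more memory.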

Answer 2

Answered by unutbu

You could use np.meshgrid to give the n1, n2 arrays the proper shape to perform the desired indexing:

In [104]: a[tuple(np.meshgrid(n1,n2, sparse=True, indexing='ij'))]  # tuple() so the result is always read as one index per axis
Out[104]: 
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

Or, without meshgrid:

In [117]: a[np.array(n1)[:,np.newaxis], np.array(n2)[np.newaxis,:]]
Out[117]: 
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

There is a similar example, with an explanation of how this integer array indexing works, in the docs.

See also the Cookbook recipe Picking out rows and columns.

Answer 3

Answered by Alex Riley

Another quick way to build the desired index is to use the np.ix_ function:

>>> a[np.ix_(n1, n2)]
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])

np.ix_ provides a convenient way to construct an open mesh from sequences of indices.
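
Because np.ix_ accepts arbitrary index sequences, the same one-liner also covers selections that a plain slice cannot express. The sketch below uses made-up index lists (n1 = [0, 3, 7], n2 = [1, 2], chosen only for illustration) and compares the result with the two-step approach from the question.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Illustrative, non-contiguous row and column selections.
n1 = [0, 3, 7]
n2 = [1, 2]

two_step = a[n1, :][:, n2]     # the approach from the question
one_step = a[np.ix_(n1, n2)]   # the same block in a single indexing operation

print(one_step)
print(np.array_equal(two_step, one_step))
```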

Answer 4

Answered by mkultra

It seems that a use case for your particular question would deal with image manipulation. To the extent that you are using your example to edit numpy arrays arising from images, you can use the Python Imaging Library (PIL).

# Import Pillow:
from PIL import Image

# Load the original image:
img = Image.open("flowers.jpg")

# Crop the image
img2 = img.crop((0, 0, 5, 5))

The img2 object is a PIL Image holding the cropped region; pass it to np.asarray to get the corresponding numpy array.
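
To work with the cropped pixels as a numpy array, the result of crop can be passed through np.asarray. The following is a sketch that assumes Pillow is installed and, instead of loading flowers.jpg, builds a small synthetic grayscale image (the pixels array is an assumption made so the example is self-contained).

```python
import numpy as np
from PIL import Image

# Synthetic 10x10 grayscale image standing in for a file loaded with Image.open.
pixels = np.arange(100, dtype=np.uint8).reshape(10, 10)
img = Image.fromarray(pixels)

# The crop box is (left, upper, right, lower) in pixel coordinates.
img2 = img.crop((0, 0, 5, 5))

# Convert the cropped Image back into a numpy array.
cropped = np.asarray(img2)

print(cropped.shape)
print(np.array_equal(cropped, pixels[:5, :5]))
```

For a rectangular region like this one, the crop matches the slice pixels[upper:lower, left:right] of the underlying array.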

You can read more about image manipulation in the documentation of the Pillow package (a user-friendly fork of the PIL package).
