Slicing of a NumPy 2d array, or how do I extract an mxm submatrix from an nxn array (n>m)?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/4257394/

Time: 2020-08-18 14:57:29  Source: igfitidea

Tags: python, numpy, slice

Asked by levesque

I want to slice a NumPy nxn array. I want to extract an arbitrary selection of m rows and columns of that array (i.e. without any pattern in the row/column indices), making it a new mxm array. For this example, let us say the array is 4x4 and I want to extract a 2x2 array from it.

Here is our array:

import numpy as np
x = np.arange(16).reshape(4, 4)

print(x)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

The rows and columns to remove are the same. The easiest case is when I want to extract a 2x2 submatrix that is at the beginning or at the end, i.e.:

In [33]: x[0:2,0:2]
Out[33]: 
array([[0, 1],
       [4, 5]])

In [34]: x[2:,2:]
Out[34]: 
array([[10, 11],
       [14, 15]])

But what if I need to remove another mixture of rows/columns? What if I need to remove the first and third rows and columns, thus extracting the submatrix [[5,7],[13,15]]? There can be any combination of rows/columns. I read somewhere that I just need to index my array using arrays/lists of indices for both rows and columns, but that doesn't seem to work:

In [35]: x[[1,3],[1,3]]
Out[35]: array([ 5, 15])

I found one way, which is:

In [61]: x[[1,3]][:,[1,3]]
Out[61]: 
array([[ 5,  7],
       [13, 15]])

The first issue with this is that it is hard to read, although I can live with that. If someone has a better solution, I'd certainly like to hear it.

The other thing is that I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, so when dealing with large arrays this could become a problem. Why is that so / how does this mechanism work?

Accepted answer by Justin Peel

As Sven mentioned, x[[[0],[2]],[1,3]] will give back the 0 and 2 rows that match with the 1 and 3 columns, while x[[0,2],[1,3]] will return the values x[0,1] and x[2,3] in an array.

There is a helpful function for doing the first example I gave, numpy.ix_. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]. This can save you from having to enter in all of those extra brackets.
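
As a rough illustration of the three forms discussed in this answer, the sketch below compares paired list indexing, the explicitly bracketed form, and numpy.ix_ on the 4x4 array from the question. The variable names (paired, bracketed, via_ix) are made up for this example, and it assumes NumPy is imported as np.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# Two flat lists are paired element by element: this picks x[0, 1] and x[2, 3].
paired = x[[0, 2], [1, 3]]

# A (2, 1) row index broadcast against a (2,) column index selects every
# row/column combination, i.e. the 2x2 submatrix.
bracketed = x[[[0], [2]], [1, 3]]

# np.ix_ builds an equivalent pair of broadcastable index arrays.
rows, cols = np.ix_([0, 2], [1, 3])
via_ix = x[rows, cols]

print(paired)      # [ 1 11]
print(bracketed)   # [[ 1  3]
                   #  [ 9 11]]
print(rows.shape, cols.shape)   # (2, 1) (1, 2)
```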

Answer by jsbueno

With numpy, you can pass a slice for each component of the index, so your x[0:2,0:2] example above works.

If you just want to evenly skip columns or rows, you can pass slices with three components (i.e. start, stop, step).

Again, for your example above:

>>> x[1:4:2, 1:4:2]
array([[ 5,  7],
       [13, 15]])

Which is basically: slice in the first dimension, with start at index 1, stop when index is equal or greater than 4, and add 2 to the index in each pass. The same for the second dimension. Again: this only works for constant steps.
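
To make the start/stop/step description concrete, here is a small sketch. It assumes the same 4x4 array as in the question; the names picked, step and sub are only for illustration.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

# start=1, stop=4, step=2 visits the same indices as range(1, 4, 2).
picked = list(range(1, 4, 2))
print(picked)   # [1, 3]

# The colon syntax is shorthand for slice objects.
step = slice(1, 4, 2)
sub = x[step, step]
print(sub)      # [[ 5  7]
                #  [13 15]]
```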

The syntax you found does something quite different internally. What x[[1,3]][:,[1,3]] actually does is create a new array including only rows 1 and 3 from the original array (done with the x[[1,3]] part), and then re-slice that, creating a third array that includes only columns 1 and 3 of the previous array.
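
The two steps can be spelled out as separate statements. This is only a sketch of the behaviour described above; rows_only and result are hypothetical names, and np.shares_memory is used to check that the intermediate array does not reuse the buffer of x.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

rows_only = x[[1, 3]]          # new 2x4 array holding rows 1 and 3
result = rows_only[:, [1, 3]]  # another new array: columns 1 and 3 of it

print(rows_only.shape)   # (2, 4)
print(result.shape)      # (2, 2)
print(np.shares_memory(rows_only, x))   # False
```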

Answer by Dat Chu

I don't think that x[[1,3]][:,[1,3]] is hard to read. If you want to make your intent clearer, you can do:

x[[1,3],:][:,[1,3]]

I am not an expert in slicing, but typically, if you slice into an array and the selected indices are contiguous, you get back a view where only the shape and stride metadata change.

e.g. In your inputs 33 and 34, although you get a 2x2 array, the row stride is still 4 elements. Thus, when you index the next row, the pointer moves to the correct position in memory.

Clearly, this mechanism doesn't carry over well to the case of an array of indices. Hence, numpy has to make a copy. After all, many other matrix math functions rely on size, strides and contiguous memory allocation.

Answer by Sven Marnach

To answer this question, we have to look at how indexing a multidimensional array works in NumPy. Let's first say you have the array x from your question. The buffer assigned to x will contain 16 ascending integers from 0 to 15. If you access one element, say x[i,j], NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating, in effect, i*x.shape[1]+j (and multiplying by the size of an int to get an actual memory offset).
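
The offset formula can be checked by hand. The sketch below assumes x is C-contiguous (which holds for an array freshly created with arange and reshape), so that x.ravel() exposes the buffer in memory order; flat, element_offset and byte_offset are illustrative names.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
flat = x.ravel()   # 1-D view of the same buffer for a C-contiguous array

i, j = 2, 3
element_offset = i * x.shape[1] + j         # 2*4 + 3
byte_offset = element_offset * x.itemsize   # itemsize depends on the dtype

print(element_offset)                    # 11
print(flat[element_offset] == x[i, j])   # True
```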

If you extract a subarray by basic slicing like y = x[0:2,0:2], the resulting object will share the underlying buffer with x. But what happens if you access y[i,j]? NumPy can't use i*y.shape[1]+j to calculate the offset into the array, because the data belonging to y is not consecutive in memory.

NumPy solves this problem by introducing strides. When calculating the memory offset for accessing x[i,j], what is actually calculated is i*x.strides[0]+j*x.strides[1] (and this already includes the factor for the size of an int):

x.strides
(16, 4)

When y is extracted like above, NumPy does not create a new buffer, but it does create a new array object referencing the same buffer (otherwise y would just be equal to x). The new array object will have a different shape than x and maybe a different starting offset into the buffer, but will share the strides with x (in this case at least):

y.shape
(2, 2)
y.strides
(16, 4)

This way, computing the memory offset for y[i,j] will yield the correct result.
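
A minimal sketch of that computation follows, using y = x[2:, 2:] so that the starting offset is non-zero. The names start, byte_offset and element are made up for the example. Note that the strides (16, 4) shown above correspond to 4-byte integers; on many current platforms the default integer is 8 bytes, so the code reads x.strides and x.itemsize instead of hard-coding them.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
y = x[2:, 2:]          # basic slicing: a view on x's buffer

# Where y starts inside x's buffer, in bytes (y[0, 0] is x[2, 2]).
start = 2 * x.strides[0] + 2 * x.strides[1]

i, j = 1, 0
byte_offset = start + i * y.strides[0] + j * y.strides[1]
element = x.ravel()[byte_offset // x.itemsize]

print(y.strides == x.strides)   # True
print(element, y[i, j])         # 14 14
```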

But what should NumPy do for something like z=x[[1,3]]? The strides mechanism won't allow correct indexing if the original buffer is used for z. NumPy theoretically could add some more sophisticated mechanism than the strides, but this would make element access relatively expensive, somewhat defeating the whole idea of an array. In addition, a view wouldn't be a really lightweight object anymore.

This is covered in depth in the NumPy documentation on indexing.

Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:

x[[[1],[3]],[1,3]]

This is because the index arrays are broadcast to a common shape. Of course, for this particular example, you can also make do with basic slicing:

x[1::2, 1::2]
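
The broadcasting of the index arrays mentioned above can be made visible with np.broadcast_arrays. This is a sketch for illustration only; row_idx, col_idx, r and c are hypothetical names.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

row_idx = np.array([[1], [3]])   # shape (2, 1)
col_idx = np.array([1, 3])       # shape (2,)

# Broadcasting expands both to shape (2, 2) before pairing them up.
r, c = np.broadcast_arrays(row_idx, col_idx)
print(r)   # [[1 1]
           #  [3 3]]
print(c)   # [[1 3]
           #  [1 3]]
print(x[r, c])   # [[ 5  7]
                 #  [13 15]]
```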

Answer by unutbu

If you want to skip every other row and every other column, then you can do it with basic slicing:

In [49]: x=np.arange(16).reshape((4,4))
In [50]: x[1:4:2,1:4:2]
Out[50]: 
array([[ 5,  7],
       [13, 15]])

This returns a view, not a copy of your array.

In [51]: y=x[1:4:2,1:4:2]

In [52]: y[0,0]=100

In [53]: x   # <---- Notice x[1,1] has changed
Out[53]: 
array([[  0,   1,   2,   3],
       [  4, 100,   6,   7],
       [  8,   9,  10,  11],
       [ 12,  13,  14,  15]])

while z=x[(1,3),:][:,(1,3)] uses advanced indexing and thus returns a copy:

In [58]: x=np.arange(16).reshape((4,4))
In [59]: z=x[(1,3),:][:,(1,3)]

In [60]: z
Out[60]: 
array([[ 5,  7],
       [13, 15]])

In [61]: z[0,0]=0

Note that x is unchanged:

In [62]: x
Out[62]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

If you wish to select arbitrary rows and columns, then you can't use basic slicing. You'll have to use advanced indexing, using something like x[rows,:][:,columns], where rows and columns are sequences. This of course is going to give you a copy, not a view, of your original array. This is as one should expect, since a numpy array uses contiguous memory (with constant strides), and there would be no way to generate a view with arbitrary rows and columns (since that would require non-constant strides).
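
As a quick check of the view/copy distinction, the sketch below selects rows and columns with non-constant spacing and uses np.shares_memory to compare the result with a basic slice. The particular rows and columns values are arbitrary choices for this example.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
rows, columns = [0, 1, 3], [0, 2, 3]   # arbitrary, non-evenly spaced selection

picked = x[rows, :][:, columns]
print(picked)   # [[ 0  2  3]
                #  [ 4  6  7]
                #  [12 14 15]]

print(np.shares_memory(x[1:4:2, 1:4:2], x))   # True: basic slicing gives a view
print(np.shares_memory(picked, x))            # False: advanced indexing copies
```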

Answer by Rafael Valero

I have a similar question here: "Writting in sub-ndarray of a ndarray in the most pythonian way. Python 2".

Following the solution from that post, for your case it looks like this:

columns_to_keep = [1,3] 
rows_to_keep = [1,3]

And using ix_:

x[np.ix_(rows_to_keep, columns_to_keep)] 

Which is:

array([[ 5,  7],
       [13, 15]])
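
Since the linked question is about writing into a sub-array, here is a hedged sketch of the difference between modifying the extracted copy and assigning through the np.ix_ index expression. The name sub and the values written are only for illustration.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
rows_to_keep = [1, 3]
columns_to_keep = [1, 3]

sub = x[np.ix_(rows_to_keep, columns_to_keep)]   # a copy
sub[0, 0] = -1                                   # does not touch x
print(x[1, 1])   # 5

# Assigning through the index expression writes into x itself.
x[np.ix_(rows_to_keep, columns_to_keep)] = 0
print(x)   # [[ 0  1  2  3]
           #  [ 4  0  6  0]
           #  [ 8  9 10 11]
           #  [12  0 14  0]]
```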

Answer by Valery Marcel

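I'm not sure how efficient this is, but you can use range() to index along both axes. Note that range(1,3) selects indices 1 and 2, so this extracts the central 2x2 block rather than rows and columns 1 and 3.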

x=np.arange(16).reshape((4,4))
x[range(1,3), :][:,range(1,3)]
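
On the efficiency question, a rough comparison with the equivalent basic slice may help. This sketch assumes a recent NumPy in which a range object is accepted as an index sequence; via_range and via_slice are illustrative names.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

via_range = x[range(1, 3), :][:, range(1, 3)]   # advanced indexing: a copy
via_slice = x[1:3, 1:3]                         # basic slicing: a view

print(via_range)   # [[ 5  6]
                   #  [ 9 10]]
print(np.array_equal(via_range, via_slice))   # True
print(np.shares_memory(via_range, x), np.shares_memory(via_slice, x))   # False True
```

For contiguous or evenly spaced indices the slice form avoids the two intermediate copies; range() is only needed when the result must be independent of x.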