Python: How to access the ith column of a NumPy multidimensional array?

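Notice: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4455076/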

Date: 2020-08-18 15:49:13  Source: igfitidea

Tags: python, arrays, numpy

Asked by lpl

Suppose I have:

test = numpy.array([[1, 2], [3, 4], [5, 6]])

test[i] gets me the ith row of the array (e.g. [1, 2]). How can I access the ith column (e.g. [1, 3, 5])? Also, would this be an expensive operation?

Accepted answer by mtrw

>>> test[:,0]
array([1, 3, 5])

Similarly,

>>> test[1,:]
array([3, 4])

lets you access rows. This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
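
One way to see why this is cheap: basic slicing such as test[:, 0] returns a view that shares test's memory, so no elements are copied. The short check below is an illustration added here, not part of the original answer; it compares the slice with an explicit Python loop and uses np.shares_memory to confirm nothing was copied.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

# Basic slicing: column 0 as a 1-D array
col = test[:, 0]

# The element-by-element loop the answer compares against
col_loop = np.array([test[r, 0] for r in range(test.shape[0])])

print(np.array_equal(col, col_loop))  # True
# The slice is a view onto test's buffer, not a copy
print(np.shares_memory(col, test))    # True
print(col.base is test)               # True
```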

Answer by Akavall

And if you want to access more than one column at a time you could do:

>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
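
A related detail, not mentioned in the answer: indexing with a list of columns is advanced indexing, which always returns a copy. When the wanted columns are evenly spaced, a slice with a step selects the same columns as a view. A minimal sketch:

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

# Advanced (list) indexing: any columns, in any order, but always a copy
cols_fancy = test[:, [0, 2]]

# Basic slicing with a step: columns 0 and 2 as a view, no copy
cols_slice = test[:, ::2]

print(np.array_equal(cols_fancy, cols_slice))  # True
print(np.shares_memory(cols_fancy, test))      # False
print(np.shares_memory(cols_slice, test))      # True
```

The list form is the general one (arbitrary columns, any order, repeats allowed); the step slice only covers regular patterns.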

Answer by Cloud

>>> test[:,0]
array([1, 3, 5])

This gives you a 1-D array of shape (3,), which behaves like a row vector. If you just want to loop over it, that's fine, but if you want to hstack it with another array of shape 3xN, you will get

ValueError: all the input arrays must have same number of dimensions

whereas

>>> test[:,[0]]
array([[1],
       [3],
       [5]])

gives you a column vector (a 2-D array of shape (3, 1)), so you can use it in a concatenate or hstack operation.

e.g.

>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
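
Indexing with [0] is one of several ways to keep the column two-dimensional. The sketch below, an addition rather than part of the original answer, compares it with a length-1 slice and with adding an axis via np.newaxis, and shows np.column_stack, which accepts the 1-D result directly.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])
col_1d = test[:, 0]  # shape (3,)

# hstack refuses to combine a 2-D array with a 1-D one
try:
    np.hstack((test, col_1d))
except ValueError as err:
    print("hstack failed:", err)

# Three ways to get a (3, 1) column instead
a = test[:, [0]]            # list index: a copy
b = test[:, 0:1]            # length-1 slice: a view
c = col_1d[:, np.newaxis]   # add an axis to the 1-D result: a view

print(a.shape, b.shape, c.shape)  # (3, 1) (3, 1) (3, 1)

# column_stack treats 1-D inputs as columns, so no reshaping is needed
stacked = np.column_stack((test, col_1d))
print(stacked)
```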

Answer by mac

>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> ncol = test.shape[1]
>>> ncol
5

Then you can select the 2nd to 4th columns this way:

>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
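
Because Python slices accept negative indices, the same selection can be written without computing ncol first: -1 stands for the last column, and the stop index is excluded. A quick sketch checking that both spellings agree:

```python
import numpy as np

test = np.arange(10).reshape((2, 5))

# Negative stop index: "up to, but not including, the last column"
middle = test[:, 1:-1]
print(middle.tolist())  # [[1, 2, 3], [6, 7, 8]]

# Same result as the version that computes ncol explicitly
ncol = test.shape[1]
print(np.array_equal(middle, test[0:, 1:(ncol - 1)]))  # True
```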

Answer by Hotschke

You could also transpose and return a row:

In [4]: test.T[0]
Out[4]: array([1, 3, 5])
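
The transpose is also handy for looping over columns: iterating over an array walks its first axis (the rows), so iterating over test.T yields the original columns. A small sketch, added here as an illustration:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

# Iterating over test.T walks the columns of test
for i, col in enumerate(test.T):
    # each col holds the same data as test[:, i]
    print(i, col.tolist(), np.array_equal(col, test[:, i]))
# 0 [1, 3, 5] True
# 1 [2, 4, 6] True
```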

Answer by Alberto Perez

To get several arbitrary (not necessarily adjacent) columns, just use:

>>> test[:,[0,2]]

You will get columns 0 and 2.

Answer by X ? A-12

Although the question has been answered, let me mention some nuances.

Let's say you are interested in the second column (index 1) of the array:

arr = numpy.array([[1, 2],
                   [3, 4],
                   [5, 6]], dtype='int32')  # 4-byte elements

As you already know from other answers, to get it in the form of a "row vector" (an array of shape (3,)), you use slicing:

arr_c1_ref = arr[:, 1]  # creates a view (reference) of column 1 of arr
arr_c1_copy = arr[:, 1].copy()  # creates a copy of column 1 of arr

To check whether an array is a view of another array or a copy of it, you can do the following:

arr_c1_ref.base is arr  # True
arr_c1_copy.base is arr  # False
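
The practical consequence of a view is that writes go through to the original array. The sketch below demonstrates that, and also uses np.shares_memory, which checks for overlapping memory directly instead of relying on the .base attribute (an alternative check added here, not from the original answer).

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

view = arr[:, 1]
col_copy = arr[:, 1].copy()

# Writing through the view changes arr; writing to the copy does not
view[0] = 20
col_copy[1] = 40
print(arr[:, 1].tolist())  # [20, 4, 6]

# shares_memory tests for overlapping buffers directly
print(np.shares_memory(view, arr))      # True
print(np.shares_memory(col_copy, arr))  # False
```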

See ndarray.base.

Besides the obvious difference between the two (modifying arr_c1_ref will affect arr), the number of bytes to step over to get from one element to the next (the stride) is different:

arr_c1_ref.strides[0]  # 8 bytes
arr_c1_copy.strides[0]  # 4 bytes
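
Where the 8 and 4 come from: in the default C (row-major) order, stepping down one row skips over a whole row, so the row stride is the number of columns times the element size. The sketch below assumes a 4-byte int32 dtype, which is what those numbers imply; NumPy's default integer is 8 bytes on most 64-bit platforms, which would double every value.

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4],
                [5, 6]], dtype=np.int32)  # 4 bytes per element

# Row stride = columns per row x bytes per element; column stride = 1 element
print(arr.strides)  # (8, 4)
print(arr.strides[0] == arr.shape[1] * arr.itemsize)  # True

col_view = arr[:, 1]
col_copy = arr[:, 1].copy()

# The view keeps arr's row stride; the copy is packed contiguously
print(col_view.strides, col_copy.strides)  # (8,) (4,)
```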

See ndarray.strides. Why is this important? Imagine that you have a very big array A instead of arr:

A = np.random.randint(2, size=(10000,10000), dtype='int32')
A_c1_ref = A[:, 1] 
A_c1_copy = A[:, 1].copy()

and you want to compute the sum of all the elements of that column, i.e. A_c1_ref.sum() or A_c1_copy.sum(). Using the copied version is much faster:

%timeit A_c1_ref.sum()  # ~248 μs
%timeit A_c1_copy.sum()  # ~12.8 μs

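This is due to the different strides mentioned before: consecutive elements of the view are 40000 bytes (one full row of A) apart, so each access lands on a different cache line, while the copy's elements are packed 4 bytes apart: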

A_c1_ref.strides[0]  # 40000 bytes
A_c1_copy.strides[0]  # 4 bytes
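
%timeit only exists in IPython and Jupyter. In a plain script, the standard-library timeit module can run the same comparison; the sketch below also times making the copy, since that cost matters when deciding whether a copy pays off. The array is smaller than A (an assumption, to keep memory use modest) but keeps the same int32 dtype and 10000 columns, so the strides match; absolute timings will depend on your machine.

```python
import timeit

import numpy as np

# Fewer rows than A, same dtype and row length, so the same 40000-byte stride
A = np.random.randint(2, size=(1000, 10000), dtype='int32')
A_c1_ref = A[:, 1]            # view: elements 40000 bytes apart
A_c1_copy = A[:, 1].copy()    # copy: elements 4 bytes apart

# Same answer either way; only the memory access pattern differs
assert A_c1_ref.sum() == A_c1_copy.sum()

n = 1000
t_ref = timeit.timeit(A_c1_ref.sum, number=n) / n
t_copy = timeit.timeit(A_c1_copy.sum, number=n) / n
t_make = timeit.timeit(lambda: A[:, 1].copy(), number=n) / n

print(f"sum over view:   {t_ref * 1e6:.1f} us")
print(f"sum over copy:   {t_copy * 1e6:.1f} us")
print(f"making the copy: {t_make * 1e6:.1f} us")
```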

Although it might seem that using column copies is always better, that is not true, because making a copy takes time and uses more memory (in this case it took me approx. 200 μs to create A_c1_copy, most of the ~235 μs saved on a single sum). However, if we need the copy in the first place, or we need to run many different operations on one column and are OK with trading memory for speed, then making a copy is the way to go.

If we are mostly interested in working with columns, it can be a good idea to create the array in column-major ('F') order instead of the default row-major ('C') order, and then slice as before to get a column without copying it:

A = np.asfortranarray(A)  # or np.array(A, order='F')
A_c1_ref = A[:, 1]
A_c1_ref.strides[0]  # 4 bytes
%timeit A_c1_ref.sum()  # ~12.6 μs vs ~248 μs

Now, summing a column view (and similar operations that walk down a column) is much faster.
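
Column-major order has its own cost: np.asfortranarray has to copy a C-ordered input, and in the new layout it is the rows that become strided. A small sketch, with a smaller zero-filled array standing in for A (an illustration, not the answer's data):

```python
import numpy as np

A_c = np.zeros((100, 10000), dtype=np.int32)  # default C (row-major) order
A_f = np.asfortranarray(A_c)                   # column-major copy

# asfortranarray had to copy, because the layouts differ
print(np.shares_memory(A_c, A_f))                            # False
print(A_c.flags['F_CONTIGUOUS'], A_f.flags['F_CONTIGUOUS'])  # False True

# In F order a column slice is contiguous and a row slice is strided
print(A_f[:, 1].strides, A_f[:, 1].flags['C_CONTIGUOUS'])  # (4,) True
print(A_f[1, :].strides)                                    # (400,)
```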

Finally, note that transposing an array and then slicing a row is the same as slicing a column of the original array, because transposing only swaps the shape and the strides of the original array; no data is copied.

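A_c = np.ascontiguousarray(A)  # A is F-ordered at this point; back to the default C order
A_c.T[1,:].strides[0]  # 40000, same as A_c[:, 1].strides[0]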
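
The stride swap can be checked directly. In this sketch a smaller zero-filled C-ordered array stands in for the original A (an assumption for illustration); it shows that .T reverses shape and strides, shares memory with A, and that a row of the transpose has the same stride as the corresponding column of A:

```python
import numpy as np

A = np.zeros((100, 10000), dtype=np.int32)  # C order, 10000 columns like A

# .T reverses shape and strides without copying any data
print(A.shape, A.strides)        # (100, 10000) (40000, 4)
print(A.T.shape, A.T.strides)    # (10000, 100) (4, 40000)
print(np.shares_memory(A, A.T))  # True

# So a row of the transpose is the same strided view as a column of A
print(A.T[1, :].strides == A[:, 1].strides)  # True
```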