Python NumPy: selecting a specific column index per row by using a list of indexes

Notice: this page is a side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/23435782/

Time: 2020-08-19 02:58:39  Source: igfitidea

NumPy selecting specific column index per row by using a list of indexes

Tags: python, python-2.7, numpy

Asked by Zee

I'm struggling to select specific columns per row of a NumPy matrix.

Suppose I have the following matrix, which I will call X:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]

I also have a list of column indexes, one per row, which I will call Y:

[1, 0, 2]

I need to get the values:

[2]
[4]
[9]

Instead of a list of indexes Y, I can also produce a matrix with the same shape as X in which every entry is a bool (or an int that is 0 or 1) indicating whether this is the required column.

[0, 1, 0]
[1, 0, 0]
[0, 0, 1]

I know this can be done by iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data, so it has to run as fast as it can.

I was thus wondering if there is a better solution?

Thank you.

Accepted answer by Slater Victoroff

If you've got a boolean array you can do direct selection based on that like so:

>>> import numpy as np
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])

To go along with your initial example you could do the following:

>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
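
If you start from the index list Y rather than a ready-made mask, the mask can be derived by broadcasting. The following is a minimal sketch under that assumption; the names mask, int_mask and selected are illustrative and do not come from the question. It also shows that a 0/1 integer matrix has to be converted to bool first, because NumPy treats an integer array as a set of indexes, not as a mask.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Y = np.array([1, 0, 2])  # one wanted column index per row

# Compare every column number (0, 1, 2) against each row's wanted
# index; broadcasting (3,) against (3, 1) gives a (3, 3) boolean mask
# with exactly one True per row.
mask = np.arange(X.shape[1]) == Y[:, None]

# Boolean selection scans the mask in row-major order, so with one
# True per row the result holds one value per row, in row order.
selected = X[mask]
print(selected)  # [2 4 9]

# A 0/1 integer matrix is NOT a mask: convert it to bool before indexing.
int_mask = mask.astype(int)
selected_from_int = X[int_mask.astype(bool)]
```

Note that X[int_mask] without the conversion would run without error but perform integer (fancy) indexing on the rows, returning an array of a different shape.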

You can also add in an arange and do direct selection on that, though YMMV depending on how you're generating your boolean array and what your code looks like.

>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
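
To make the arange form less magical: when two integer arrays are passed as indexes, NumPy pairs them element by element, so entry i of the result is a[rows[i], cols[i]]. The sketch below checks that against an explicit loop; the variable names rows, cols, picked, looped and reordered are chosen here for illustration only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cols = np.array([1, 0, 2])
rows = np.arange(len(a))  # array([0, 1, 2])

# Paired integer indexing: picked[i] == a[rows[i], cols[i]]
picked = a[rows, cols]

# The same selection written as an explicit (slow) Python loop.
looped = np.array([a[r, c] for r, c in zip(rows, cols)])
assert np.array_equal(picked, looped)

# Common pitfall: a slice for the rows does not pair up with cols;
# it reorders whole columns and returns a 2-D array instead.
reordered = a[:, cols]

print(picked)           # [2 4 9]
print(reordered.shape)  # (3, 3)
```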

Hope that helps, let me know if you've got any more questions.

Answer by Ashwini Chaudhary

You can do something like this:

In [7]: a = np.array([[1, 2, 3],
   ...:               [4, 5, 6],
   ...:               [7, 8, 9]])

In [8]: lst = [1, 0, 2]

In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])

More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays

Answer by Kei Minagawa

You can do it using an iterator, like this:

np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
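
For completeness, here is a self-contained version of that one-liner using the X and Y from the question. Two details are my own additions rather than part of the answer: dtype=X.dtype (so a float matrix is not silently truncated by dtype=int) and the optional count argument, which lets np.fromiter allocate the output once.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Y = [1, 0, 2]

# Walk the rows and their wanted column indexes in lockstep and
# collect one element per row into a new 1-D array.
result = np.fromiter(
    (row[index] for row, index in zip(X, Y)),
    dtype=X.dtype,   # assumption: keep the matrix's own dtype
    count=len(Y),    # optional: number of items, known in advance
)
print(result)  # [2 4 9]
```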

Timing:

N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)

# @Ashwini Chaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop

#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop

#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
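
The %timeit lines above need IPython. A rough way to repeat the comparison in a plain script is sketched below with the standard timeit module; the candidates dictionary and its labels are hypothetical names introduced here, and the absolute numbers will differ by machine and NumPy version, so no expected timings are shown.

```python
import timeit

import numpy as np

N = 1000
X = np.zeros((N, N))
Y = np.arange(N)

# The three approaches timed above, wrapped as zero-argument callables.
candidates = {
    "fancy indexing": lambda: X[np.arange(len(X)), Y],
    "fromiter": lambda: np.fromiter(
        (row[i] for row, i in zip(X, Y)), dtype=X.dtype, count=N
    ),
    "diag of transpose": lambda: np.diag(X.T[Y]),
}

reference = candidates["fancy indexing"]()
timings = {}
for name, func in candidates.items():
    # Check that every approach returns the same values before timing it.
    assert np.array_equal(func(), reference)
    # Best of 3 repeats, 10 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name}: {timings[name] * 1e6:.1f} us per call")
```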

Answer by Dhaval Mayatra

A simple way might look like:

In [1]: a = np.array([[1, 2, 3],
   ...:               [4, 5, 6],
   ...:               [7, 8, 9]])

In [2]: y = [1, 0, 2]  #list of indices we want to select from matrix 'a'

range(a.shape[0]) produces the row indexes 0, 1, 2 (a list in Python 2, a range object in Python 3):

In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])

Answer by Thomas Devoogdt

Another clever way is to first transpose the array and then index it. Finally, take the diagonal; it always holds the right answer.

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])

np.diag(X.T[Y])

Step by step:

Original arrays:

>>> X
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

>>> Y
array([1, 0, 2, 2])

Transpose, so that the columns of X become rows that Y can index:

>>> X.T
array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])

Select the rows of X.T in the order given by Y:

>>> X.T[Y]
array([[ 2,  5,  8, 11],
       [ 1,  4,  7, 10],
       [ 3,  6,  9, 12],
       [ 3,  6,  9, 12]])

The diagonal now holds the wanted values:

>>> np.diag(X.T[Y])
array([ 2,  4,  9, 12])
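
One thing worth knowing about this trick: X.T[Y] materialises one full row per entry of Y, so for n rows the intermediate is n by n even though only n values are kept. The sketch below (variable names are illustrative) shows the intermediate shape and confirms that the result matches the paired-index form used in the other answers.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])

# Indexing the transpose copies a whole column of X for every entry
# of Y, so the intermediate array has len(Y) * len(X) elements.
intermediate = X.T[Y]
print(intermediate.shape)  # (4, 4)

via_diag = np.diag(intermediate)
via_index = X[np.arange(len(X)), Y]  # touches only len(X) elements
assert np.array_equal(via_diag, via_index)
print(via_index.tolist())  # [2, 4, 9, 12]
```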

Answer by hpaulj

Recent numpy versions have added take_along_axis (and put_along_axis), which do this indexing cleanly.

In [101]: a = np.arange(1,10).reshape(3,3)                                                             
In [102]: b = np.array([1,0,2])                                                                        
In [103]: np.take_along_axis(a, b[:,None], axis=1)                                                     
Out[103]: 
array([[2],
       [4],
       [9]])

It operates in the same way as:

In [104]: a[np.arange(3), b]                                                                           
Out[104]: array([2, 4, 9])

but with different axis handling. It is especially aimed at applying the results of argsort and argmax.
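
As a small illustration of the argsort/argmax use case, the sketch below feeds the index arrays those functions return back into np.take_along_axis. The sample matrix and variable names are made up for this example; as far as I know, take_along_axis requires NumPy 1.15 or later.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8],
              [4, 6, 5]])

# argmax returns one column index per row; add a trailing axis so the
# index array has the same number of dimensions as `a`.
best = np.argmax(a, axis=1)
row_max = np.take_along_axis(a, best[:, None], axis=1)[:, 0]
assert np.array_equal(row_max, a.max(axis=1))

# argsort returns a full index array with the same shape as `a`;
# applying it along axis 1 sorts each row independently.
order = np.argsort(a, axis=1)
sorted_rows = np.take_along_axis(a, order, axis=1)
assert np.array_equal(sorted_rows, np.sort(a, axis=1))

print(row_max.tolist())  # [3, 9, 6]
```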