Original question: http://stackoverflow.com/questions/23435782/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
NumPy selecting specific column index per row by using a list of indexes
Asked by Zee
I'm struggling to select a specific column per row of a NumPy matrix.
Suppose I have the following matrix, which I will call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes, one per row, which I will call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of a list of indexes Y, I can also produce a matrix with the same shape as X where every entry is a bool / int value in the range 0-1, indicating whether this is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done with iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data and that's why it has to run as fast as it can.
I was thus wondering if there is a better solution?
Thank you.
Accepted answer by Slater Victoroff
If you've got a boolean array you can do direct selection based on that like so:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
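If the mask is stored as 0/1 integers, as the question allows, it probably needs converting to bool first; otherwise NumPy treats the entries as integer indices rather than as a mask. A minimal sketch, where the name int_mask is illustrative and not from the original posts:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Hypothetical 0/1 integer mask with the same shape as a (assumed name)
int_mask = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])

# Convert to bool so it is used as a mask, not as integer indices
selected = a[int_mask.astype(bool)]

# Assuming exactly one 1 per row, argmax recovers the column index list
cols = int_mask.argmax(axis=1)

print(selected)
print(cols)
```

The argmax step only makes sense under the one-flag-per-row assumption; rows with no flag would silently map to column 0.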
You can also add in an arange and do direct selection on that, though depending on how you're generating your boolean array and what your code looks like, YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
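To spell out what the arange form does: the two index arrays are paired element by element, so entry i of the result is a[i, cols[i]]. The sketch below checks this against a plain Python loop; the names rows, cols, fast, slow and column are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cols = [1, 0, 2]
rows = np.arange(len(a))  # array([0, 1, 2])

# Vectorized: picks a[0, 1], a[1, 0], a[2, 2]
fast = a[rows, cols]

# Equivalent explicit loop, shown for comparison only
slow = np.array([a[i, c] for i, c in zip(rows, cols)])

# Adding a trailing axis gives the column layout shown in the question
column = fast[:, None]

print(fast)
print(column.shape)
```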
Hope that helps, let me know if you've got any more questions.
Answer by Ashwini Chaudhary
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
   ...:               [4, 5, 6],
   ...:               [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
Answer by Kei Minagawa
You can do it by using an iterator, like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Timing:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
# @Ashwini Chaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
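For readers without IPython's %timeit, a rough way to repeat the comparison with the standard timeit module is sketched below. The function names are made up for this sketch, dtype=X.dtype is used instead of dtype=int so that no cast is involved, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)

def fancy():
    # Integer-array indexing: one (row, column) pair per row
    return X[np.arange(len(X)), Y]

def iterator():
    # Python-level generator; count lets fromiter preallocate the output
    return np.fromiter((row[i] for row, i in zip(X, Y)), dtype=X.dtype, count=len(X))

def diagonal():
    # Builds an N x N intermediate before taking the diagonal
    return np.diag(X.T[Y])

for fn in (fancy, iterator, diagonal):
    seconds = timeit.timeit(fn, number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e6:.1f} us per call")
```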
Answer by Dhaval Mayatra
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
   ...:               [4, 5, 6],
   ...:               [7, 8, 9]])
In [2]: y = [1, 0, 2] #list of indices we want to select from matrix 'a'
range(a.shape[0]) yields the row indices 0, 1, 2, which NumPy treats like array([0, 1, 2]) when it is used as an index.
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
Answer by Thomas Devoogdt
Another clever way is to first transpose the array and then index it. Finally, take the diagonal; it is always the right answer.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose so that the columns can be indexed as rows.
>>> X.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Get the rows in Y order.
>>> X.T[Y]
array([[ 2, 5, 8, 11],
[ 1, 4, 7, 10],
[ 3, 6, 9, 12],
[ 3, 6, 9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2,  4,  9, 12])
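One caveat, consistent with the timings in the earlier answer: X.T[Y] materializes one full row of X.T for every entry of Y, so the intermediate has shape (len(Y), len(Y)) before the diagonal is taken. A small sketch comparing it with direct integer indexing (the variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])

# One row of X.T per entry of Y, so this is a 4 x 4 array
intermediate = X.T[Y]
via_diag = np.diag(intermediate)

# Direct indexing reads only the len(Y) requested elements
direct = X[np.arange(len(X)), Y]

print(intermediate.shape)
print(via_diag, direct)
```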
Answer by hpaulj
Recent numpy versions have added take_along_axis (and put_along_axis), which do this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
[4],
[9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It is especially aimed at applying the results of argsort and argmax.
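As an illustration of that use case, here is a hedged sketch that feeds argmax and argsort results back into take_along_axis, plus one put_along_axis call. The sample array and variable names are invented for this example, and it assumes a NumPy version that provides both functions.

```python
import numpy as np

a = np.array([[3, 1, 2], [9, 7, 8], [4, 6, 5]])

# argmax gives one column index per row; add an axis so it lines up with a
idx = np.argmax(a, axis=1)[:, None]
row_max = np.take_along_axis(a, idx, axis=1)

# argsort indices reorder each row independently
order = np.argsort(a, axis=1)
sorted_rows = np.take_along_axis(a, order, axis=1)

# put_along_axis writes at the same positions, modifying the copy in place
zeroed = a.copy()
np.put_along_axis(zeroed, idx, 0, axis=1)

print(row_max.ravel())
print(sorted_rows)
print(zeroed)
```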