NumPy selecting specific column index per row by using a list of indexes
Original question: http://stackoverflow.com/questions/23435782/
This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverflow.
Asked by Zee
I'm struggling to select specific columns per row of a NumPy matrix.
Suppose I have the following matrix which I would call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes, one per row, which I would call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of a list of indexes Y, I can also produce a matrix with the same shape as X where every entry is a bool/int value in the range 0-1, indicating whether this is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done by iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data, which is why it has to run as fast as it can.
I was thus wondering if there is a better solution?
Thank you.
Accepted answer by Slater Victoroff
If you've got a boolean array you can do direct selection based on that like so:
>>> import numpy as np
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
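The question also mentions a 0/1 integer matrix instead of a boolean one. As a minimal sketch (the names X, M and Y here are illustrative and follow the question's notation), such a matrix needs to be cast to bool before it can act as a mask, because an integer array would be interpreted as indices. Assuming exactly one 1 per row, the index list can also be recovered with argmax:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# 0/1 integer matrix with exactly one 1 per row, as described in the question
M = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])

# Cast to bool so NumPy treats M as a mask rather than as integer indices.
by_mask = X[M.astype(bool)]

# Alternatively, recover the per-row column index and use integer indexing.
Y = M.argmax(axis=1)
by_index = X[np.arange(len(X)), Y]

print(by_mask.tolist())   # [2, 4, 9]
print(by_index.tolist())  # [2, 4, 9]
```

The mask route returns the selected elements in row-major order, so it only lines up one value per row when each row contains exactly one True.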
You can also add in an arange and index with that directly, though depending on how you're generating your boolean array and what your code looks like, YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
Hope that helps, let me know if you've got any more questions.
Answer by Ashwini Chaudhary
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
   ...:               [4, 5, 6],
   ...:               [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
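To make the pairing explicit, here is a small sketch (variable names rows, cols, picked, looped and b are illustrative). When two integer arrays are used as indices, NumPy pairs them element by element, so the result matches an explicit loop over (row, column) pairs. The same index pair can also be used on the left-hand side of an assignment:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = np.arange(len(a))    # row indices 0, 1, 2
cols = np.array([1, 0, 2])  # one column index per row

# The index arrays are paired element-wise: (0, 1), (1, 0), (2, 2).
picked = a[rows, cols]

# Equivalent explicit loop, shown only for comparison.
looped = np.array([a[r, c] for r, c in zip(rows, cols)])

# The same pairing works for assignment (on a copy, to keep 'a' intact).
b = a.copy()
b[rows, cols] = 0

print(picked.tolist())  # [2, 4, 9]
print(b.tolist())       # [[1, 0, 3], [0, 5, 6], [7, 8, 0]]
```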
Answer by Kei Minagawa
You can do it with an iterator, like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Timing:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
# @Ashwini Chaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
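The %timeit lines above need IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The helper names fancy, from_iter and via_diag are made up for this example, and dtype=X.dtype is used instead of int so that the result keeps the array's own type. Absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)


def fancy():
    # Integer array indexing with paired row and column indices.
    return X[np.arange(len(X)), Y]


def from_iter():
    # Python-level loop over rows, collected into an array.
    return np.fromiter((row[i] for row, i in zip(X, Y)), dtype=X.dtype)


def via_diag():
    # Builds an (N, N) intermediate and then takes its diagonal.
    return np.diag(X.T[Y])


for fn in (fancy, from_iter, via_diag):
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```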
Answer by Dhaval Mayatra
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
   ...:               [4, 5, 6],
   ...:               [7, 8, 9]])
In [2]: y = [1, 0, 2]  # list of column indices we want to select from matrix 'a', one per row
range(a.shape[0]) yields the row indices 0, 1, 2 (a range object in Python 3, not a NumPy array), which NumPy accepts as an index sequence.
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
Answer by Thomas Devoogdt
Another clever way is to first transpose the array and then index it with Y. Finally, take the diagonal: element i of the diagonal is X.T[Y[i], i], which equals X[i, Y[i]], so it's always the right answer.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose so that Y can index along the first axis.
>>> X.T
array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])
Select the rows of X.T in the order given by Y.
>>> X.T[Y]
array([[ 2,  5,  8, 11],
       [ 1,  4,  7, 10],
       [ 3,  6,  9, 12],
       [ 3,  6,  9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2,  4,  9, 12])
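One thing to keep in mind with this approach is the size of the intermediate array. The sketch below (the names intermediate, via_diag and direct are illustrative) checks that the diagonal matches direct integer indexing and shows that X.T[Y] has one full row per entry of Y, so for N rows it holds N x N elements:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])

# One full row of X.T is copied for every entry of Y.
intermediate = X.T[Y]
via_diag = np.diag(intermediate)

# Direct integer indexing picks the same elements without the intermediate.
direct = X[np.arange(len(X)), Y]

print(intermediate.shape)  # (4, 4)
print(via_diag.tolist())   # [2, 4, 9, 12]
print(direct.tolist())     # [2, 4, 9, 12]
```

This is consistent with the timings reported in the earlier answer, where the diagonal version was the slowest of the three.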
Answer by hpaulj
Recent numpy versions (1.15 and later) have added take_along_axis (and put_along_axis), which does this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
       [4],
       [9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It's especially aimed at applying the results of argsort and argmax.
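As a small illustration of that use case (the array values and the names idx, row_max, order, sorted_rows and b are made up for this sketch), the index arrays returned by argmax and argsort can be passed to take_along_axis, and put_along_axis writes through the same kind of index:

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8],
              [4, 6, 5]])

# argmax gives one column index per row; add an axis so it has shape (3, 1).
idx = np.argmax(a, axis=1)[:, None]
row_max = np.take_along_axis(a, idx, axis=1)

# argsort gives a full permutation per row; applying it sorts each row.
order = np.argsort(a, axis=1)
sorted_rows = np.take_along_axis(a, order, axis=1)

# put_along_axis modifies its first argument in place; here it zeroes each row's maximum.
b = a.copy()
np.put_along_axis(b, idx, 0, axis=1)

print(row_max.tolist())      # [[3], [9], [6]]
print(sorted_rows.tolist())  # [[1, 2, 3], [7, 8, 9], [4, 5, 6]]
print(b.tolist())            # [[0, 1, 2], [0, 7, 8], [4, 0, 5]]
```

Note that the index array must have the same number of dimensions as the data, which is why the argmax result gets a trailing axis.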

