Output of numpy.where(condition) is not an array, but a tuple of arrays: why?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/33747908/
Asked by Fabio
I am experimenting with the numpy.where(condition[, x, y]) function.
From the numpy documentation, I learn that if you give just one array as input, it should return the indices where the array is non-zero (i.e. "True"):
If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.
But if I try it, it returns a tuple of two elements, where the first is the wanted list of indices, and the second is a null element:
>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> np.where(array>4)
(array([4, 5, 6, 7, 8]),) # notice the comma before the last parenthesis
So the question is: why? What is the purpose of this behaviour? In what situation is this useful?
Indeed, to get the wanted list of indices I have to add the indexing, as in np.where(array>4)[0], which seems... "ugly".
ADDENDUM
I understand (from some answers) that it is actually a tuple of just one element. Still, I don't understand why the output is given in this way. To illustrate how this is not ideal, consider the following error (which motivated my question in the first place):
>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> pippo = np.where(array>4)
>>> pippo + 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "int") to tuple
So you need to do some indexing to access the actual array of indices:
>>> pippo[0] + 1
array([5, 6, 7, 8, 9])
Accepted answer by hpaulj
In Python, (1) means just 1. Parentheses can be freely added to group numbers and expressions for human readability (e.g. (1+3)*3 vs. (1+3,)*3). Thus, to denote a one-element tuple, Python uses (1,) (and requires you to use it as well).
Thus
(array([4, 5, 6, 7, 8]),)
is a one-element tuple, that element being an array.
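A minimal sketch of the distinction between grouping parentheses and the trailing comma; the variable names are only illustrative:

```python
# Parentheses alone only group; the trailing comma is what makes a tuple.
grouped = (1)       # plain int
singleton = (1,)    # tuple with one element

print(type(grouped).__name__)    # int
print(type(singleton).__name__)  # tuple

arithmetic = (1 + 3) * 3    # multiplies the number 4 by 3
repetition = (1 + 3,) * 3   # repeats the one-element tuple (4,) three times
print(arithmetic)   # 12
print(repetition)   # (4, 4, 4)
```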
If you applied where to a 2d array, the result would be a two-element tuple.
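As a small illustration of the 2d case (the array `grid` is made up for this sketch), the tuple holds one index array per dimension:

```python
import numpy as np

# Hypothetical 2-D input chosen for illustration.
grid = np.array([[0, 5, 0],
                 [7, 0, 9]])

result = np.where(grid > 0)
rows, cols = result   # one index array per axis

print(len(result))  # 2
print(rows)         # [0 1 1]
print(cols)         # [1 0 2]
```

Reading `rows` and `cols` pairwise gives the coordinates (0, 1), (1, 0) and (1, 2) of the positive entries.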
The result of where is such that it can be plugged directly into an indexing slot, e.g.
a[where(a>0)]
a[a>0]
should return the same thing, as would
I,J = where(a>0) # a is 2d
a[I,J]
a[(I,J)]
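A runnable check of these equivalences, assuming a small made-up 2d array (the `via_*` names are just for this sketch):

```python
import numpy as np

# Illustrative 3x3 array with values -4..4.
a = np.arange(-4, 5).reshape(3, 3)

I, J = np.where(a > 0)   # a is 2d, so the tuple unpacks into two arrays

via_where = a[np.where(a > 0)]  # tuple used directly as an index
via_mask = a[a > 0]             # boolean mask indexing
via_pair = a[I, J]              # explicit row and column index arrays
via_tuple = a[(I, J)]           # same as a[I, J]

print(via_where)  # [1 2 3 4]
```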
Or with your example:
In [278]: a=np.array([1,2,3,4,5,6,7,8,9])
In [279]: np.where(a>4)
Out[279]: (array([4, 5, 6, 7, 8], dtype=int32),) # tuple
In [280]: a[np.where(a>4)]
Out[280]: array([5, 6, 7, 8, 9])
In [281]: I=np.where(a>4)
In [282]: I
Out[282]: (array([4, 5, 6, 7, 8], dtype=int32),)
In [283]: a[I]
Out[283]: array([5, 6, 7, 8, 9])
In [286]: i, = np.where(a>4) # note the , on LHS
In [287]: i
Out[287]: array([4, 5, 6, 7, 8], dtype=int32) # not tuple
In [288]: a[i]
Out[288]: array([5, 6, 7, 8, 9])
In [289]: a[(i,)]
Out[289]: array([5, 6, 7, 8, 9])
======================
np.flatnonzero shows the correct way of returning just one array, regardless of the dimensions of the input array.
In [299]: np.flatnonzero(a>4)
Out[299]: array([4, 5, 6, 7, 8], dtype=int32)
In [300]: np.flatnonzero(a>4)+10
Out[300]: array([14, 15, 16, 17, 18], dtype=int32)
Its documentation says:
This is equivalent to a.ravel().nonzero()[0]
In fact, that is literally what the function does.
Flattening a removes the question of what to do with multiple dimensions. The function then takes the result out of the tuple, giving you a plain array. With flattening, it doesn't have to make a special case for 1d arrays.
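A short sketch of what flattening means for a 2d input. The array `m` is invented for illustration, and the final `np.unravel_index` step is an optional extra not mentioned in the answer, shown only in case the original coordinates are needed again:

```python
import numpy as np

# Illustrative 2-D data.
m = np.array([[0, 5, 0],
              [7, 0, 9]])

flat = np.flatnonzero(m > 0)            # indices into the flattened array
manual = (m > 0).ravel().nonzero()[0]   # the documented equivalent

print(flat)  # [1 3 5]

# Optional: map the flat indices back to (row, col) coordinates.
rows, cols = np.unravel_index(flat, m.shape)
print(rows, cols)  # [0 1 1] [1 0 2]
```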
===========================
@Divakar suggests np.argwhere:
In [303]: np.argwhere(a>4)
Out[303]:
array([[4],
[5],
[6],
[7],
[8]], dtype=int32)
which does np.transpose(np.where(a>4)).
Or if you don't like the column vector, you could transpose it again
In [307]: np.argwhere(a>4).T
Out[307]: array([[4, 5, 6, 7, 8]], dtype=int32)
except now it is a 1xn array.
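For a 2d input the np.argwhere layout may be easier to appreciate: each row is one coordinate pair. A minimal sketch with a made-up array `m`:

```python
import numpy as np

# Illustrative 2-D data.
m = np.array([[0, 5, 0],
              [7, 0, 9]])

coords = np.argwhere(m > 0)   # one row per matching element
print(coords.shape)           # (3, 2)

# Same content as transposing the tuple returned by where.
same = np.array_equal(coords, np.transpose(np.where(m > 0)))
print(same)                   # True

# Each row can be unpacked as a (row, col) pair.
values = [int(m[r, c]) for r, c in coords]
print(values)                 # [5, 7, 9]
```

Note that `coords` itself is not a drop-in index the way the where tuple is; it is meant for iterating over positions.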
We could just as well have wrapped where in array:
In [311]: np.array(np.where(a>4))
Out[311]: array([[4, 5, 6, 7, 8]], dtype=int32)
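The extraction approaches seen so far can be compared side by side for the 1d example. This is only a sketch; the `by_*` names are illustrative, and the `ravel()` and `[0]` steps on the last two lines assume a 1d input:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
cond = a > 4

by_index = np.where(cond)[0]             # index into the tuple
by_unpack, = np.where(cond)              # unpack the one-element tuple
by_flat = np.flatnonzero(cond)           # always a plain 1-D array
by_argwhere = np.argwhere(cond).ravel()  # flatten the (n, 1) column
by_array = np.array(np.where(cond))[0]   # first row of the (1, n) array

for result in (by_index, by_unpack, by_flat, by_argwhere, by_array):
    print(result)  # [4 5 6 7 8] each time
```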
There are lots of ways of taking an array out of the where tuple ([0], i,=, transpose, array, etc.).
Answer by jakevdp
Short answer: np.where is designed to have consistent output regardless of the dimension of the array.
A two-dimensional array has two indices, so the result of np.where is a length-2 tuple containing the relevant indices. This generalizes to a length-3 tuple for 3 dimensions, a length-4 tuple for 4 dimensions, or a length-N tuple for N dimensions. By this rule, it is clear that in 1 dimension, the result should be a length-1 tuple.
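The rule can be checked with a small loop. This is a sketch over made-up arrays of 1 to 4 dimensions, not an exhaustive test:

```python
import numpy as np

tuple_lengths = []
for ndim in range(1, 5):
    # Illustrative N-D array with 2 entries along every axis.
    arr = np.arange(2 ** ndim).reshape((2,) * ndim)
    idx = np.where(arr > 0)
    tuple_lengths.append(len(idx))
    # The tuple indexes the array directly in every dimensionality.
    assert np.array_equal(arr[idx], arr[arr > 0])

print(tuple_lengths)  # [1, 2, 3, 4]
```

Because the output is always a tuple with one entry per axis, code that indexes with it does not need a special branch for the 1d case.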
Answer by Panagiotis Simakis
Just use the np.asarray function. In your case:
>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> pippo = np.asarray(np.where(array>4))
>>> pippo + 1
array([[5, 6, 7, 8, 9]])
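One thing to keep in mind with this approach, sketched below using the same example data: the result is a 2d array of shape (1, n) rather than a flat array, so an extra [0] is still needed to get a plain 1d index array.

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
pippo = np.asarray(np.where(array > 4))

print(pippo.shape)  # (1, 5): one row per dimension of the input

shifted = pippo + 1  # arithmetic now works, the result keeps shape (1, 5)
print(shifted)       # [[5 6 7 8 9]]

selected = array[pippo[0]]  # the first row is a flat index array
print(selected)             # [5 6 7 8 9]
```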