Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/36921951/
Asked by obabs
I'm having an issue filtering my result dataframe with an or condition. I want my result df to extract all column var values that are either above 0.25 or below -0.25.
The logic below gives me an ambiguous truth value; however, it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested a.empty, a.bool(), a.item(), a.any() or a.all().
result = result[(result['var']>0.25) or (result['var']<-0.25)]
Answered by MSeifert
Python's or and and statements require truth values. For pandas these are considered ambiguous, so you should use the "bitwise" | (or) or & (and) operations:
result = result[(result['var']>0.25) | (result['var']<-0.25)]
These are overloaded for these kinds of data structures to yield the element-wise or (or and).
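To make the contrast concrete, here is a minimal sketch. It assumes a small made-up DataFrame standing in for result (the values are invented for illustration); the names raised, mask and filtered exist only for this example.

```python
import pandas as pd

# Hypothetical data standing in for the asker's `result` frame.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, 0.0, -0.2]})

# `or` forces its operands through bool(), which pandas refuses for a Series.
try:
    result[(result["var"] > 0.25) or (result["var"] < -0.25)]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# `|` is overloaded to combine the two boolean Series element by element.
mask = (result["var"] > 0.25) | (result["var"] < -0.25)
filtered = result[mask]
print(filtered)
```

Only the rows whose var lies outside the band from -0.25 to 0.25 should survive the filter.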
Just to add some more explanation to this statement:
The exception is thrown when you want to get the bool of a pandas.Series:
>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What you hit was a place where the operator implicitly converted the operands to bool (you used or, but it also happens for and, if and while):
>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Besides these 4 statements there are several Python functions that hide some bool calls (like any, all, filter, ...). These are normally not problematic with pandas.Series, but for completeness I wanted to mention them.
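As a rough illustration of why these builtins usually behave, the sketch below (with made-up values) relies on the fact that they iterate over the Series and call bool() on each scalar element, never on the Series as a whole.

```python
import pandas as pd

x = pd.Series([0, 1, 2])

# Each builtin walks the elements one by one, so only scalars are
# converted to truth values and no ValueError is raised.
builtin_any = any(x)                   # at least one element is truthy
builtin_all = all(x)                   # the 0 makes this falsy
truthy_items = list(filter(None, x))   # keeps only the truthy elements

print(builtin_any, builtin_all, truthy_items)
```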
In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For and and or you can use (if you want element-wise comparisons):
>>> import numpy as np
>>> np.logical_or(x, y)
or simply the | operator:
>>> x | y
>>> np.logical_and(x, y)
or simply the & operator:
>>> x & y
If you're using the operators, make sure you set your parentheses correctly because of operator precedence (| and & bind more tightly than comparisons such as > and <).
There are several logical numpy functions which should work on pandas.Series.
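A small sketch of a few of those functions applied to two made-up boolean Series (x and y here are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd

x = pd.Series([True, True, False, False])
y = pd.Series([True, False, True, False])

either = np.logical_or(x, y)        # same result as x | y
both = np.logical_and(x, y)         # same result as x & y
exactly_one = np.logical_xor(x, y)  # same result as x ^ y
not_x = np.logical_not(x)           # same result as ~x for a boolean Series

print(either.tolist(), both.tolist(), exactly_one.tolist(), not_x.tolist())
```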
The alternatives mentioned in the exception are more suited if you encountered it when doing if or while. I'll briefly explain each of these:
If you want to check if your Series is empty:
>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
Python normally interprets the length of containers (like list, tuple, ...) as truth-value if it has no explicit boolean interpretation. So if you want the python-like check, you could do: if x.size or if not x.empty instead of if x.
If your Series contains one and only one boolean value (note that .bool() is deprecated since pandas 2.1):
>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False
If you want to check the first and only item of your Series (like .bool() but works even for non-boolean contents):
>>> x = pd.Series([100])
>>> x.item()
100
If you want to check if all or any item is not-zero, not-empty or not-False:
>>> x = pd.Series([0, 1, 2])
>>> x.all()   # because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True
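Put together, these checks could be used inside if statements roughly as follows. This is only a sketch: describe is a hypothetical helper invented for this example, not part of pandas.

```python
import pandas as pd


def describe(series):
    """Classify a Series without ever calling bool() on the Series itself."""
    if series.empty:          # no elements at all
        return "empty"
    if series.size == 1:      # exactly one element: extract it as a scalar
        return f"single value {series.item()}"
    if series.all():          # every element is truthy
        return "all truthy"
    if series.any():          # at least one element is truthy
        return "some truthy"
    return "none truthy"


print(describe(pd.Series([], dtype="float64")))
print(describe(pd.Series([100])))
print(describe(pd.Series([0, 1, 2])))
```

Each branch reduces the Series to a single boolean first, which is exactly what if needs.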
Answered by Alexander
For boolean logic, use & and |.
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
To see what is happening, you get a column of booleans for each comparison, e.g.
>>> df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool
When you have multiple criteria, you will get multiple columns returned. This is why the join logic is ambiguous. Using and or or treats each column separately, so you first need to reduce that column to a single boolean value. For example, to see if any value or all values in each of the columns is True:
>>> # Is any value in either column True?
>>> (df.C > 0.25).any() or (df.C < -0.25).any()
True
>>> # Are all values in either column True?
>>> (df.C > 0.25).all() or (df.C < -0.25).all()
False
One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.
>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
A B C
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
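As a quick sanity check, the sketch below rebuilds the same seeded frame and compares the zipped mask with the vectorized one. The variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Row-by-row version: any() receives plain scalars, so it is unambiguous.
zipped_mask = [any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]

# Vectorized version: | combines the two boolean Series element-wise.
vectorized_mask = (df.C > 0.25) | (df.C < -0.25)

same_rows = df[zipped_mask].equals(df[vectorized_mask])
print(same_rows)
```

Both masks should select the same rows; the vectorized form is simply the more direct way to write it.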
For more details, refer to Boolean Indexing in the docs.
Answered by Nipun
Well, pandas uses the bitwise operators '&' and '|', and each condition should be wrapped in '()'.
For example, the following works:
data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]
But the same query without proper parentheses does not, because & binds more tightly than >= and <=:
data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
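The sketch below shows both forms side by side on a made-up data frame (the year values are invented for illustration). The comments spell out how Python parses the unparenthesized version.

```python
import pandas as pd

# Hypothetical data standing in for the answer's `data` frame.
data = pd.DataFrame({"year": [2003, 2005, 2008, 2010, 2012]})

# Parenthesized: each comparison yields a boolean Series, then & combines them.
in_range = data[(data["year"] >= 2005) & (data["year"] <= 2010)]

# Unparenthesized: Python parses this as the chained comparison
#   data["year"] >= (2005 & data["year"]) <= 2010
# and a chained comparison uses an implicit `and`, which needs bool(Series).
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
    failed = False
except ValueError:
    failed = True

print(in_range)
print("unparenthesized version failed:", failed)
```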
Answered by C?nh Toàn Nguy?n
Or, alternatively, you could use the operator module. More detailed information is in the Python docs.
import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]
A B C
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
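One place where the function form can be handy is combining a whole list of conditions. The following is a sketch using functools.reduce, which is a choice made for this example rather than something the answer proposes.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Any number of boolean Series can be collected in a list...
conditions = [df.C > 0.25, df.C < -0.25]

# ...and folded together with the element-wise "or".
mask = reduce(operator.or_, conditions)
result = df.loc[mask]
print(result)
```

Swapping operator.or_ for operator.and_ would require every condition to hold instead.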
Answered by bli
This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:
result = result.query("(var > 0.25) or (var < -0.25)")
See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.
(Some tests with a dataframe I'm currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 μs)
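For a runnable comparison, here is a minimal sketch on a made-up frame (the values are invented, and no timing claim is attached to it). It checks that query and the bitwise mask select the same rows.

```python
import pandas as pd

# Hypothetical data standing in for the asker's `result` frame.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, 0.0, -0.2]})

# Inside a query string, `or` is interpreted element-wise by pandas.
by_query = result.query("(var > 0.25) or (var < -0.25)")

# The equivalent boolean-indexing form with the | operator.
by_mask = result[(result["var"] > 0.25) | (result["var"] < -0.25)]

print(by_query.equals(by_mask))
```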
A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2) and wanted to perform the following query: "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"
I obtained the following exception cascade:
KeyError: 'log2'
UndefinedVariableError: name 'log2' is not defined
ValueError: "log2" is not a supported function
I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.
A possible workaround is proposed here.
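Independently of that workaround, one way to avoid the query parser altogether is to fall back to plain boolean indexing, where the column name is just a string key. This is a sketch on a made-up frame that reuses the column names above; the values and the ratio_col variable are invented for illustration.

```python
import pandas as pd

# The awkward column name is kept as an ordinary string.
ratio_col = "log2(WT_38hph_IP_2/WT_38hph_input_2)"

# Hypothetical values; the third column is log2 of the ratio of the first two.
df = pd.DataFrame({
    "WT_38hph_IP_2": [10, 40, 80],
    "WT_38hph_input_2": [10, 10, 10],
    ratio_col: [0.0, 2.0, 3.0],
})

# No expression parsing happens here, so the name is never evaluated.
selected = df[(df[ratio_col] > 1) & (df["WT_38hph_IP_2"] > 20)]
print(selected)
```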
Answered by iretex
I encountered the same error and was stuck on a pyspark dataframe for a few days. I was able to resolve it by filling na values with 0, since I was comparing integer values from 2 fields.