Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/36921951/

Date: 2020-08-19 18:32:46  Source: igfitidea

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Tags: python, pandas, dataframe, boolean, filtering

Asked by obabs

I'm having an issue filtering my result dataframe with an or condition. I want my result df to extract all values of the column var that are above 0.25 or below -0.25.

The logic below gives me an ambiguous truth value error; however, it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested a.empty, a.bool(), a.item(), a.any() or a.all().

 result = result[(result['var']>0.25) or (result['var']<-0.25)]

Answered by MSeifert

The or and and Python statements require truth values. For pandas these are considered ambiguous, so you should use the "bitwise" | (or) or & (and) operations:

result = result[(result['var']>0.25) | (result['var']<-0.25)]

These operators are overloaded for this kind of data structure to yield the element-wise or (or and).
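
As a minimal sketch of the fix above, the following uses a small made-up DataFrame (the values are only illustrative, and result merely stands in for the asker's frame) to show that | combines the two comparisons row by row:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `result` frame.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, -0.1, 0.26]})

# Each comparison yields a boolean Series; `|` combines them element-wise.
mask = (result["var"] > 0.25) | (result["var"] < -0.25)
filtered = result[mask]

print(mask.tolist())             # [True, False, True, False, True]
print(filtered["var"].tolist())  # [0.5, -0.3, 0.26]
```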



Just to add some more explanation to this statement:

The exception is thrown when you want to get the bool of a pandas.Series:

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What you hit was a place where the operator implicitly converted the operands to bool (you used or, but it also happens for and, if and while):

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Besides these 4 statements there are several Python functions that hide some bool calls (like any, all, filter, ...). These are normally not problematic with pandas.Series, but for completeness I wanted to mention them.
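
To illustrate why these built-ins are usually harmless, here is a small sketch (the Series values are made up). It assumes nothing beyond the fact that iterating over a Series yields its scalar elements, so bool is applied to each scalar rather than to the Series as a whole:

```python
import pandas as pd

x = pd.Series([0, 1, 2])

# The built-ins iterate over the Series and call bool() on each scalar,
# not on the Series itself, so no ValueError is raised.
has_any = any(x)
has_all = all(x)
truthy = [int(v) for v in filter(None, x)]

print(has_any)  # True
print(has_all)  # False
print(truthy)   # [1, 2]
```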



In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For the and / or statements you can use (if you want element-wise comparisons):

  • numpy.logical_or:

    >>> import numpy as np
    >>> np.logical_or(x, y)
    

    or simply the | operator:

    >>> x | y
    
  • numpy.logical_and:

    >>> np.logical_and(x, y)
    

    or simply the & operator:

    >>> x & y
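
The snippets in the list above leave x and y undefined. A minimal runnable sketch, assuming two hypothetical boolean Series of equal length, shows that the numpy functions and the operators agree:

```python
import numpy as np
import pandas as pd

# Hypothetical boolean Series; x and y are not defined in the answer itself.
x = pd.Series([True, True, False, False])
y = pd.Series([True, False, True, False])

either = np.logical_or(x, y)
both = np.logical_and(x, y)

print(either.tolist())  # [True, True, True, False]
print(both.tolist())    # [True, False, False, False]

# The operators give the same element-wise results.
print((x | y).equals(either))  # True
print((x & y).equals(both))    # True
```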
    

If you're using the operators, make sure you set your parentheses correctly because of operator precedence.

There are several logical numpy functions which should work on pandas.Series.



The alternatives mentioned in the exception are more suited if you encountered it when doing if or while. I'll briefly explain each of these:

  • If you want to check if your Series is empty:

    >>> x = pd.Series([])
    >>> x.empty
    True
    >>> x = pd.Series([1])
    >>> x.empty
    False
    

    Python normally interprets the length of containers (like list, tuple, ...) as their truth value if they have no explicit boolean interpretation. So if you want the Python-like check, you could do: if x.size or if not x.empty instead of if x.

  • If your Series contains one and only one boolean value:

    >>> x = pd.Series([100])
    >>> (x > 50).bool()
    True
    >>> (x < 50).bool()
    False
    
  • If you want to check the first and only item of your Series (like .bool() but works even for non-boolean contents):

    >>> x = pd.Series([100])
    >>> x.item()
    100
    
  • If you want to check if all or any item is non-zero, non-empty or not False:

    >>> x = pd.Series([0, 1, 2])
    >>> x.all()   # because one element is zero
    False
    >>> x.any()   # because one (or more) elements are non-zero
    True
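
Putting the alternatives from the list above into the if context they are meant for, a rough sketch could look like this (the Series values are made up). It uses .item() rather than .bool() for the single-element case, on the assumption that you are on a recent pandas release, where .bool() is deprecated:

```python
import pandas as pd

x = pd.Series([0, 1, 2])

# `if x:` would raise ValueError, so state explicitly what you mean.
if not x.empty:
    print("x has at least one element")

if x.any():
    print("at least one element is truthy")

if not x.all():
    print("at least one element is falsy")

# For a one-element Series, .item() returns the scalar, which has an
# unambiguous truth value.
single = pd.Series([100])
if single.item() > 50:
    print("the only element is greater than 50")
```

All four messages are printed for these inputs.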
    

Answered by Alexander

For boolean logic, use & and |.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, you get a column of booleans for each comparison, e.g.

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

When you have multiple criteria, you get multiple boolean columns back. This is why the join logic is ambiguous. Using and or or treats each column as a whole, so you first need to reduce each column to a single boolean value, for example by checking whether any value or all values in each column are True.

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.

Answered by Nipun

Well, pandas uses the bitwise '&' and '|' operators, and each condition should be wrapped in '()'.

For example, the following works:

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without proper brackets does not:

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
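
The reason the second form fails is operator precedence: & binds more tightly than the comparison operators. A small sketch with made-up data (the data frame from the answer is not shown, so the values here are an assumption) makes both behaviours visible:

```python
import pandas as pd

# Hypothetical data; the `data` frame in the answer is not shown.
data = pd.DataFrame({"year": [2000, 2005, 2008, 2010, 2015]})

# With parentheses, each comparison is evaluated first, then combined.
ok = data[(data["year"] >= 2005) & (data["year"] <= 2010)]
print(ok["year"].tolist())  # [2005, 2008, 2010]

# Without them, `&` binds tighter than `>=` and `<=`, so Python reads
#   data["year"] >= (2005 & data["year"]) <= 2010
# which is a chained comparison joined by an implicit `and`.
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
except ValueError as exc:
    print("ValueError:", exc)
```

The implicit and in the chained comparison is what triggers the same "truth value of a Series is ambiguous" error as in the question.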

Answered by Cảnh Toàn Nguyễn

Alternatively, you could use the operator module. More detailed information is in the Python docs.

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
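
One situation where the function form can be handy is combining a variable number of conditions. The sketch below is an extension of the answer, not part of it: it folds a list of masks with functools.reduce, and the third condition (df.A > 2) is added purely for illustration:

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Any number of boolean masks; these three are only illustrative.
conditions = [df.C > 0.25, df.C < -0.25, df.A > 2]

# Fold the list with the element-wise OR: (c0 | c1) | c2
mask = reduce(operator.or_, conditions)
print(df.loc[mask].index.tolist())  # [0, 1, 3, 4]
```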

Answered by bli

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:

result = result.query("(var > 0.25) or (var < -0.25)")

See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.
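
A minimal sketch of the query approach, using made-up data in place of the asker's result frame, compares it with the bitwise-operator version. The names lower and upper are hypothetical local variables, referenced from the query string with the @ prefix:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `result` frame.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, -0.1, 0.26]})

by_query = result.query("(var > 0.25) or (var < -0.25)")
by_mask = result[(result["var"] > 0.25) | (result["var"] < -0.25)]
print(by_query.equals(by_mask))  # True

# Local Python variables can be referenced with an @ prefix.
lower, upper = -0.25, 0.25
by_bounds = result.query("var > @upper or var < @lower")
print(by_bounds["var"].tolist())  # [0.5, -0.3, 0.26]
```

Inside the query string, or and and are parsed by pandas itself, which is why they work there even though they fail in plain Python.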

(Some tests with a dataframe I'm currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 μs)

A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2), and wanted to perform the following query: "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"

I obtained the following exception cascade:

  • KeyError: 'log2'
  • UndefinedVariableError: name 'log2' is not defined
  • ValueError: "log2" is not a supported function

I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here.
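
Separately from the linked workaround, one way to sidestep the query parser altogether is plain boolean indexing with bracket access. This is only a sketch: the column names come from the answer, but the values are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column names come from the answer.
ip = "WT_38hph_IP_2"
inp = "WT_38hph_input_2"
ratio = "log2(WT_38hph_IP_2/WT_38hph_input_2)"

df = pd.DataFrame({ip: [10.0, 40.0, 80.0], inp: [10.0, 10.0, 80.0]})
df[ratio] = np.log2(df[ip] / df[inp])

# Bracket access treats the whole string as a column label, so nothing
# tries to parse "log2(...)" as a function call.
selected = df[(df[ratio] > 1) & (df[ip] > 20)]
print(selected.index.tolist())  # [1]
```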

Answered by iretex

I encountered the same error and was stuck on it with a pyspark dataframe for a few days. I was able to resolve it by filling NA values with 0, since I was comparing integer values from 2 fields.
