Python: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Question

Asked by obabs

I'm having an issue filtering my result dataframe with an or condition. I want my result df to keep all rows whose var values are above 0.25 or below -0.25.

The logic below gives me an ambiguous truth value; however, it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested a.empty, a.bool(), a.item(), a.any() or a.all().

 result = result[(result['var']>0.25) or (result['var']<-0.25)]

Answer 1

Answered by MSeifert

Python's or and and statements require truth values. For pandas objects these are considered ambiguous, so you should use the "bitwise" | (or) or & (and) operations:

result = result[(result['var']>0.25) | (result['var']<-0.25)]

These operators are overloaded for this kind of data structure to yield the element-wise or (or and).
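
As a minimal sketch of the element-wise behaviour, the snippet below applies the same filter to a small, made-up stand-in for the asker's result dataframe (the values are assumptions chosen only for illustration). It also shows an equivalent shortcut with abs() that works because the two thresholds are symmetric.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `result` dataframe.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, 0.0, -0.25]})

# Each comparison yields a boolean Series; `|` combines them row by row.
mask = (result["var"] > 0.25) | (result["var"] < -0.25)
filtered = result[mask]

# Equivalent shortcut for this symmetric threshold.
filtered_abs = result[result["var"].abs() > 0.25]

print(filtered)
```

Note that the boundary value -0.25 is excluded, because both comparisons are strict.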

Just to add some more explanation to this statement:

The exception is thrown when you want to get the bool of a pandas.Series:

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What you hit was a place where the operator implicitly converted the operands to bool (you used or, but it also happens for and, if and while):

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Besides these four statements, there are several Python functions that hide some bool calls (like any, all, filter, ...). These are normally not problematic with pandas.Series, but I wanted to mention them for completeness.
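
To illustrate why these built-ins are usually harmless, here is a small sketch (the Series values are made up): any, all and filter iterate over the Series and call bool() on each scalar element, not on the Series as a whole, so no exception is raised.

```python
import pandas as pd

x = pd.Series([0, 1, 2])

# The built-ins call bool() on each element while iterating,
# never on the Series itself, so no ValueError is raised.
has_truthy = any(x)               # at least one non-zero element
all_truthy = all(x)               # fails because of the 0
non_zero = list(filter(None, x))  # keeps only truthy elements

print(has_truthy, all_truthy, non_zero)
```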

In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For and and or you can use the following (if you want element-wise comparisons):

numpy.logical_or:

>>> import numpy as np
>>> np.logical_or(x, y)

or simply the | operator:

>>> x | y

numpy.logical_and:
```
>>> np.logical_and(x, y)
```
or simply the & operator:
```
>>> x & y
```

If you're using the operators, make sure you set your parentheses correctly because of operator precedence.

There are several logical numpy functions which should work on pandas.Series.
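
As a rough sketch of what that looks like in practice, the example below applies four of numpy's logical functions to two made-up boolean Series. The inputs are assumptions for illustration only; each call works element-wise and returns a Series.

```python
import numpy as np
import pandas as pd

x = pd.Series([True, True, False, False])
y = pd.Series([True, False, True, False])

either = np.logical_or(x, y)        # same result as x | y
both = np.logical_and(x, y)         # same result as x & y
exactly_one = np.logical_xor(x, y)  # same result as x ^ y
not_x = np.logical_not(x)           # same result as ~x

print(either.tolist(), both.tolist(), exactly_one.tolist(), not_x.tolist())
```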

The alternatives mentioned in the exception are more suited if you encountered it when doing if or while. I'll briefly explain each of these:

If you want to check if your Series is empty:
```
>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
```
Python normally interprets the length of containers (like list, tuple, ...) as their truth value if they have no explicit boolean interpretation. So if you want the Python-like check, you could do if x.size or if not x.empty instead of if x.

If your Series contains one and only one boolean value:

>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False

If you want to check the first and only item of your Series (like .bool(), but it works even for non-boolean contents):
```
>>> x = pd.Series([100])
>>> x.item()
100
```

If you want to check whether all or any items are non-zero, non-empty or not False:

>>> x = pd.Series([0, 1, 2])
>>> x.all()   # because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True
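
Putting these together, the following sketch shows how an if statement can use explicit reductions instead of if x. The describe helper and its labels are hypothetical, introduced only for illustration. It sticks to .empty, .any() and .all(), partly because, as far as I know, .bool() has been deprecated in recent pandas releases.

```python
import pandas as pd


def describe(series):
    """Return a short label using explicit reductions instead of `if series:`."""
    if series.empty:
        return "empty"
    if (series > 0).all():
        return "all positive"
    if (series > 0).any():
        return "some positive"
    return "none positive"


labels = [
    describe(pd.Series([0, 1, 2])),
    describe(pd.Series([1, 2])),
    describe(pd.Series([], dtype="int64")),
    describe(pd.Series([-1])),
]
print(labels)
```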

Answer 2

Answered by Alexander

For boolean logic, use & and |.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, note that each comparison gives you a column of booleans, e.g.

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

When you have multiple criteria, you get multiple columns of booleans back. This is why the join logic is ambiguous. Using and or or treats each column as a whole, so you first need to reduce that column to a single boolean value, for example by checking whether any value or all values in each column are True.

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.

Answer 3

Answered by Nipun

pandas uses the bitwise & and | operators, and each condition should be wrapped in parentheses ().

For example, the following works:

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without proper parentheses does not:

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
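
The reason is operator precedence: & binds more tightly than >= and <=. The sketch below, using a made-up data frame, shows both variants. Without the inner parentheses Python reads the expression as a chained comparison, which implicitly uses and and so raises the same ambiguous truth value error.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"year": [2000, 2005, 2008, 2012]})

# With parentheses, each comparison is evaluated first and `&` combines the results.
ok = data[(data["year"] >= 2005) & (data["year"] <= 2010)]

# Without them, Python parses the condition as
#   data["year"] >= (2005 & data["year"]) <= 2010
# which is a chained comparison and therefore calls bool() on a Series.
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
    error = None
except ValueError as exc:
    error = str(exc)

print(ok)
print(error)
```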

Answer 4

Answered by Cảnh Toàn Nguyễn

Alternatively, you could use the operator module. More detailed information is in the Python docs.

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
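
One case where the function form may be handier than the | operator is when the conditions are collected in a list. The sketch below is an extension of the answer, not part of it: it folds such a list with functools.reduce and operator.or_. The conditions list is a hypothetical name.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# A list of boolean masks, e.g. built up programmatically.
conditions = [df.C > 0.25, df.C < -0.25]

# reduce applies operator.or_ pairwise: conditions[0] | conditions[1] | ...
combined = reduce(operator.or_, conditions)
selected = df.loc[combined]

print(selected)
```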

Answer 5

Answered by bli

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:

result = result.query("(var > 0.25) or (var < -0.25)")

See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.
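
As a small sketch of how query compares with the boolean-mask approach, the example below runs both on the random frame used in an earlier answer and checks that they select the same rows. The low and high variables are assumptions for illustration; inside the query string, local variables are referenced with an @ prefix.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Hypothetical thresholds, referenced inside the query string with @.
low, high = -0.25, 0.25

# Inside query(), `or` / `and` are interpreted element-wise.
by_query = df.query("(C > @high) or (C < @low)")
by_mask = df[(df.C > high) | (df.C < low)]

print(by_query.equals(by_mask))
```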

(Some tests with a dataframe I'm currently working with suggest that this method is a bit slower than using the bitwise operators on Series of booleans: 2 ms vs. 870 μs.)

A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2) and wanted to perform the following query: "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"

I obtained the following exception cascade:

KeyError: 'log2'
UndefinedVariableError: name 'log2' is not defined
ValueError: "log2" is not a supported function

I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here.
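
Independently of that linked workaround (which is not reproduced here), one way to sidestep the query parser entirely is plain boolean indexing with bracket access. A minimal sketch, assuming made-up values; only the column names come from the text above:

```python
import pandas as pd

# Hypothetical values; only the column names come from the answer above.
ratio_col = "log2(WT_38hph_IP_2/WT_38hph_input_2)"
df = pd.DataFrame({
    "WT_38hph_IP_2": [10, 40, 80],
    "WT_38hph_input_2": [10, 10, 10],
    ratio_col: [0.0, 2.0, 3.0],
})

# Bracket access treats the whole string as a column label,
# so nothing in it is parsed as an expression.
selected = df[(df[ratio_col] > 1) & (df["WT_38hph_IP_2"] > 20)]

print(selected)
```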

Answer 6

Answered by iretex

I encountered the same error and was stalled on a pyspark dataframe for a few days. I was able to resolve it by filling na values with 0, since I was comparing integer values from 2 fields.
