Python: slicing/selecting in Pandas on multiple conditions with an or statement

Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/42082385/

Date: 2020-08-19 21:13:05  Source: igfitidea

Tags: python, python-3.x, pandas

Asked by jtorca

When I select by chaining different conditions with "AND" the selection works fine. When I select by chaining conditions with "OR" the selection throws an error.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], 
...     columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
>>> df.loc[(df.a != 1) & (df.b < 5)]
   a  b  c
1  2  3  5
3  3  2  5
>>> df.loc[(df.a != 1) or (df.b < 5)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would expect it to return the whole dataframe as all rows meet this condition.

Answered by Steve Barnes

The important thing to note is that & is not identical to and; they are different things, so the "or" equivalent of & is |, not or.

Normally both & and | are bitwise operators rather than the Python "logical" operators and and or.
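
To make the distinction concrete, here is a minimal sketch using plain integers (the numbers 6 and 3 are chosen only for illustration and do not come from the question). The bitwise operators combine the bits of both operands, while the keywords and / or test truthiness and hand back one of the operands unchanged.

```python
# Bitwise operators work on the individual bits of both operands.
bitwise_and = 6 & 3    # 0b110 & 0b011 == 0b010
bitwise_or = 6 | 3     # 0b110 | 0b011 == 0b111

# The keywords `and` / `or` only look at truthiness and return an operand.
logical_and = 6 and 3  # 6 is truthy, so the second operand is returned
logical_or = 6 or 3    # 6 is truthy, so the first operand is returned

print(bitwise_and, bitwise_or)   # 2 7
print(logical_and, logical_or)   # 3 6
```

Because and / or need a single truth value for their left operand, they cannot be redefined by a class to work element by element; & and | can.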

In pandas these operators are overloaded for Series operations, so they combine two boolean Series element by element.
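
As a rough illustration of that overloading, the sketch below rebuilds the DataFrame from the question and splits the two conditions into named masks (the variable names not_one, below_five, either, both and or_failed are made up for this example). It also shows, under a try/except, that or fails because Python asks the Series for a single truth value.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

not_one = df.a != 1      # boolean Series: False, True, True, True
below_five = df.b < 5    # boolean Series: True, True, False, True

# `|` and `&` are applied row by row and return a new boolean Series.
either = not_one | below_five
both = not_one & below_five

# `or` needs one True/False value for its left operand, which pandas
# refuses to give for a Series with several elements.
try:
    not_one or below_five
    or_failed = False
except ValueError:
    or_failed = True

print(either.tolist())  # [True, True, True, True]
print(both.tolist())    # [False, True, False, True]
print(or_failed)        # True
```

Note that the parentheses in expressions such as (df.a != 1) | (df.b < 5) are needed: | binds more tightly than comparison operators like != and <, so each comparison has to be wrapped before the masks are combined.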

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], columns=['a', 'b',
   ...:  'c'])

In [4]: df
Out[4]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5

In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
   a  b  c
1  2  3  5
3  3  2  5

In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
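
For comparison, two other spellings of the same selection are sketched below. Neither appears in the original answer; they are offered only as possible alternatives: numpy.logical_or takes the two masks as arguments, and DataFrame.query accepts the word or inside its expression string. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

# The form used in the answer: element-wise OR of two boolean masks.
selected = df.loc[(df.a != 1) | (df.b < 5)]

# Alternative 1 (not from the answer): NumPy's element-wise logical OR.
via_numpy = df.loc[np.logical_or(df.a != 1, df.b < 5)]

# Alternative 2 (not from the answer): query() parses the string itself,
# so the keyword `or` is allowed inside it.
via_query = df.query("a != 1 or b < 5")

print(selected.equals(via_numpy))  # True
print(selected.equals(via_query))  # True
```

All three return the whole DataFrame here, since every row satisfies at least one of the two conditions.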