Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/33512372/

Time: 2020-09-14 00:09:59  Source: igfitidea

Proper way to use "opposite boolean" in Pandas data frame boolean indexing

python pandas indexing boolean

Asked by Mike Williamson

I wanted to use boolean indexing to select the rows of my data frame where a particular column does not have NaN values. So, I did the following:

import pandas as pd
my_df.loc[pd.isnull(my_df['col_of_interest']) == False].head()

to see a snippet of that data frame, including only the values that are not NaN (most values are NaN).

It worked, but it seems less than elegant. I'd like to type:

my_df.loc[!pd.isnull(my_df['col_of_interest'])].head()

However, that generated an error. I also spend a lot of time in R, so maybe I'm confusing things. In Python, I usually use "not" where I can, for instance, if x is not None:, but I couldn't really do that here. Is there a more elegant way? I don't like having to put in a senseless comparison.

Answered by DSM

In general with pandas (and numpy), we use the bitwise NOT ~ instead of ! or not (whose behaviour can't be overridden by types).
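As a rough illustration of why this is so: Python always reduces `not x` to a single True/False by asking for the truth value of `x`, whereas `~x` dispatches to the `__invert__` method, which a type such as a pandas Series can define element-wise. (`!x` is simply not valid Python syntax.) The sketch below uses a small made-up Series `s` that is not part of the question.

```python
import pandas as pd

# Hypothetical example data, not taken from the question.
s = pd.Series([True, False, True])

# `~` calls Series.__invert__, which negates element by element.
inverted = ~s

# `not` asks for a single truth value via bool(s); pandas refuses
# because the truth value of a multi-element Series is ambiguous.
try:
    not s
    not_error = None
except ValueError as exc:
    not_error = exc

print(inverted.tolist())         # [False, True, False]
print(type(not_error).__name__)  # ValueError
```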

While in this case we have notnull, ~ can come in handy in situations where there's no special opposite method.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, np.nan, 3]})
>>> df.a.isnull()
0    False
1    False
2     True
3    False
Name: a, dtype: bool
>>> ~df.a.isnull()
0     True
1     True
2    False
3     True
Name: a, dtype: bool
>>> df.a.notnull()
0     True
1     True
2    False
3     True
Name: a, dtype: bool
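
Applied to the filter from the question, this might look like the sketch below. The data frame here is a hypothetical stand-in for `my_df`, and the `label` column is invented purely to show a case (`isin`) where no dedicated opposite method exists.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `my_df`.
my_df = pd.DataFrame({
    "col_of_interest": [1.0, np.nan, 3.0, np.nan],
    "label": ["a", "b", "c", "d"],
})

# Rows where col_of_interest is not NaN, written with `~`.
not_nan_rows = my_df.loc[~my_df["col_of_interest"].isnull()]

# `isin` has no dedicated opposite method, so `~` is the usual way to negate it.
not_a_or_b = my_df.loc[~my_df["label"].isin(["a", "b"])]

print(not_nan_rows["label"].tolist())  # ['a', 'c']
print(not_a_or_b["label"].tolist())    # ['c', 'd']
```

When negating a comparison, wrap it in parentheses, e.g. `~(my_df["label"] == "a")`, because `~` binds more tightly than `==`.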

(For completeness I'll note that -, the unary negative operator, will also work on a boolean Series, but ~ is the canonical choice, and - has been deprecated for numpy boolean arrays.)
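
A small sketch of the numpy side of this remark, assuming a reasonably recent NumPy release: there, unary minus on a boolean array is no longer merely deprecated but raises a TypeError, while `~` and `np.logical_not` both negate element-wise. The array `mask` is a made-up example.

```python
import numpy as np

# Hypothetical example array.
mask = np.array([True, False, True])

# Unary minus on a boolean array is rejected by recent NumPy releases.
try:
    -mask
    minus_failed = False
except TypeError:
    minus_failed = True

# Two supported spellings of element-wise negation.
via_tilde = ~mask
via_logical_not = np.logical_not(mask)

print(minus_failed)                                # True on recent NumPy
print(via_tilde.tolist())                          # [False, True, False]
print(np.array_equal(via_tilde, via_logical_not))  # True
```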

Answered by Anand S Kumar

Instead of using pandas.isnull(), you should use pandas.notnull() to find the rows where the column does not have null values. Example:

import pandas as pd
my_df.loc[pd.notnull(my_df['col_of_interest'])].head()

pandas.notnull() is the boolean inverse of pandas.isnull(), as given in the documentation:

See also
pandas.notnull
boolean inverse of pandas.isnull
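
To check that the two spellings select the same rows, a minimal sketch with a hypothetical three-row `my_df` is shown below. It also includes `DataFrame.dropna(subset=...)`, which is not mentioned in the answers above but is another common way to drop rows with missing values in one column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `my_df`.
my_df = pd.DataFrame({"col_of_interest": [1.0, np.nan, 3.0]})
col = my_df["col_of_interest"]

via_notnull = my_df.loc[pd.notnull(col)]   # dedicated inverse function
via_invert = my_df.loc[~pd.isnull(col)]    # explicit negation with `~`
via_dropna = my_df.dropna(subset=["col_of_interest"])  # alternative, not from the answers

print(via_notnull.equals(via_invert))  # True
print(via_notnull.equals(via_dropna))  # True
print(via_notnull.index.tolist())      # [0, 2]
```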
