Why does testing `NaN == NaN` not work for dropping from a pandas DataFrame?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17969878/
Asked by idoda
Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.
My dataframe, which I load from a CSV file using `read_csv`, has a column `comments`, which is empty most of the time.
The column `marked_results.comments` looks like this. All the rest of the column is NaN, so pandas loads empty entries as NaNs. So far so good:
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
....
Now I try to drop those entries. Only this works:
marked_results.comments.isnull()
None of these work:
- `marked_results.comments.dropna()` only gives the same column, nothing gets dropped. Confusing.
- `marked_results.comments == NaN` only gives a series of all `False`s. Nothing was NaN... confusing.
- Likewise `marked_results.comments == nan`.
I also tried:
comments_values = marked_results.comments.unique()
array(['VP', 'TEST', nan], dtype=object)
# Ah, gotya! so now ive tried:
marked_results.comments == comments_values[2]
# but still all the results are Falses!!!
Accepted answer by Andy Hayden
You should use `isnull` and `notnull` to test for NaN (these are more robust with pandas dtypes than the numpy equivalents); see "values considered missing" in the docs.
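As a minimal sketch of the difference, assuming a small hand-built frame `df` as a stand-in for the asker's CSV data: comparing with `==` never matches NaN, while `isnull`/`notnull` give a boolean mask that can be used for filtering.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data (not the real CSV).
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# Equality never matches NaN, because NaN compares unequal to everything,
# including itself.
eq_mask = df["comments"] == np.nan

# isnull()/notnull() test for missing values explicitly.
null_mask = df["comments"].isnull()
kept = df[df["comments"].notnull()]

print(eq_mask.any())              # False
print(null_mask.sum())            # 2
print(kept["comments"].tolist())  # ['VP', 'VP', 'VP', 'TEST']
```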
Using the Series method `dropna` on a column won't affect the original dataframe, but it does what you want:
In [11]: df
Out[11]:
comments
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
In [12]: df.comments.dropna()
Out[12]:
0 VP
1 VP
2 VP
3 TEST
Name: comments, dtype: object
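A short sketch of what "won't affect the original dataframe" means in practice, again assuming a hand-built `df` rather than the real data. It also shows a likely pitfall: assigning the shortened Series back to the column realigns it on the index, so the NaNs reappear.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# Series.dropna returns a new, shorter Series; df is left as it was.
dropped = df["comments"].dropna()
print(len(dropped))  # 4
print(len(df))       # 6

# Assigning it back aligns on the index labels, so the rows that were
# dropped (labels 4 and 5) are filled with NaN again.
df["comments"] = dropped
print(df["comments"].isnull().sum())  # 2
```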
The DataFrame method `dropna` has a `subset` argument (to drop rows which have NaNs in specific columns):
In [13]: df.dropna(subset=['comments'])
Out[13]:
comments
0 VP
1 VP
2 VP
3 TEST
In [14]: df = df.dropna(subset=['comments'])
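The reassignment in the last step matters because `dropna` returns a new frame by default. A minimal sketch, assuming a hypothetical second column `score` that is not in the original question, to show that `subset` only looks at the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the `score` column is invented for illustration.
df = pd.DataFrame({
    "comments": ["VP", "TEST", np.nan, np.nan],
    "score": [1.0, np.nan, 3.0, 4.0],
})

# dropna returns a new DataFrame; df itself is left untouched.
cleaned = df.dropna(subset=["comments"])

print(len(df))                          # 4
print(len(cleaned))                     # 2
print(cleaned.index.tolist())           # [0, 1]
# The NaN in `score` survives because `score` is not in the subset.
print(cleaned["score"].isnull().sum())  # 1
```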
Answer by Sukrit Kalra
You need to test for `NaN` with the `math.isnan()` function (or `numpy.isnan`). NaNs cannot be checked with the equality operator.
>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False
Help for the function:
isnan(...)
isnan(x) -> bool
Check if float x is not a number (NaN).
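One caveat when applying this to a column of mixed strings and NaNs like the one in the question: `math.isnan` raises `TypeError` on a string. The sketch below uses only the standard library; `is_nan` is a hypothetical helper written for illustration, not part of any library.

```python
import math

nan = float("nan")

# NaN is the only float value that is not equal to itself.
print(nan == nan)       # False
print(nan != nan)       # True
print(math.isnan(nan))  # True


def is_nan(value):
    """Hypothetical helper: True only for a float NaN, safe for other types."""
    return isinstance(value, float) and math.isnan(value)


# math.isnan("VP") would raise TypeError, so the type is checked first.
values = ["VP", "TEST", nan]
flags = [is_nan(v) for v in values]
print(flags)  # [False, False, True]
```

This guard is roughly what pandas' `isnull` handles for you on object columns, which is one reason to prefer it there.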