Python: why doesn't testing `NaN == NaN` work for dropping rows from a pandas DataFrame?

Question

Question by idoda

Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My DataFrame, which I load from a CSV file using `read_csv`, has a column `comments` that is empty most of the time.

The column `marked_results.comments` looks like this; the rest of the column is all NaN, so pandas loads empty entries as NaN. So far so good:

```
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
...
```

Now I try to drop those entries. Only this works:

```python
marked_results.comments.isnull()
```

None of these work:

- `marked_results.comments.dropna()` only gives the same column; nothing gets dropped. Confusing.
- `marked_results.comments == NaN` only gives a Series of all `False`. Nothing was NaN... confusing.
- Likewise `marked_results.comments == nan`.

I also tried:

```python
comments_values = marked_results.comments.unique()
# array(['VP', 'TEST', nan], dtype=object)

# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!!!
```
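A minimal sketch that reproduces the situation, assuming a small made-up CSV (the `csv_text` contents and its `id` column are hypothetical stand-ins for the real file). It shows pandas parsing the empty fields as NaN, `== np.nan` matching nothing, and `isnull()` finding the two missing rows:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV standing in for the real file: the "comments"
# field is left empty on the last two rows.
csv_text = """id,comments
0,VP
1,VP
2,VP
3,TEST
4,
5,
"""

marked_results = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN (a float) inside an object column.
print(marked_results.comments.tolist())

# Equality against NaN is False for every row, NaN rows included.
eq_nan = marked_results.comments == np.nan
print(eq_nan.any())  # False

# isnull() is the test that actually finds the missing entries.
missing = marked_results.comments.isnull()
print(missing.tolist())  # [False, False, False, False, True, True]
```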

Answer 1

Accepted answer by Andy Hayden

You should use `isnull` and `notnull` to test for NaN (they handle pandas dtypes more robustly than numpy's functions do); see "Values considered missing" in the docs.
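A small sketch of these checks on an object column like `comments`, assuming a reasonably recent pandas (the `isna`/`notna` aliases need pandas 0.21+, `Series.to_numpy` needs 0.24+). It also illustrates one concrete sense in which numpy is less robust here: `np.isnan` raises `TypeError` on an object-dtype array of strings.

```python
import numpy as np
import pandas as pd

# An object column mixing strings with both kinds of missing marker.
comments = pd.Series(["VP", "TEST", np.nan, None])

# isnull()/notnull() treat NaN and None alike as missing.
print(comments.isnull().tolist())   # [False, False, True, True]
print(comments.notnull().tolist())  # [True, True, False, False]

# isna()/notna() are aliases of the same methods in newer pandas.
print(comments.isna().equals(comments.isnull()))  # True

# A notnull() mask keeps only the rows that have a value.
kept = comments[comments.notnull()]
print(kept.tolist())  # ['VP', 'TEST']

# numpy's isnan has no object-dtype implementation, so it cannot be
# applied to this column directly.
try:
    np.isnan(comments.to_numpy())
except TypeError as exc:
    print("np.isnan failed:", exc)
```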

Using the Series method `dropna` on a column won't modify the original DataFrame (it returns a new Series), but it does do what you want:

```
In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object
```
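Because `dropna` returns a new Series rather than changing `df`, looking at the column again after calling it shows nothing dropped, which matches the confusion in the question. A minimal sketch of the difference, rebuilding the six-row `comments` column shown above (the names `cleaned` and `df_nonnull` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# dropna() returns a new Series; df itself is not modified.
cleaned = df.comments.dropna()
print(len(cleaned))  # 4
print(len(df))       # 6

# To drop whole rows, filter the DataFrame with a boolean mask
# and keep the result.
df_nonnull = df[df.comments.notnull()]
print(df_nonnull.index.tolist())  # [0, 1, 2, 3]
```

Filtering with a `notnull()` mask keeps every other column of the surviving rows, which is usually what "dropping" means for a DataFrame.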

The DataFrame method `dropna` has a `subset` argument (to drop rows that have NaNs in specific columns):

```
In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
```
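To see what `subset` changes, here is a sketch with a hypothetical second column, `score`, whose missing value sits in a different row than the missing comments:

```python
import numpy as np
import pandas as pd

# "score" is a made-up column with a gap in row 1 only.
df = pd.DataFrame({
    "comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan],
    "score": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
})

# Default: drop any row with a NaN in any column.
print(df.dropna().index.tolist())                     # [0, 2, 3]

# subset: only NaNs in the listed columns count.
print(df.dropna(subset=["comments"]).index.tolist())  # [0, 1, 2, 3]
```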

Answer 2

Answer by Sukrit Kalra

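You need to test for `NaN` with the `math.isnan()` function (or `numpy.isnan`). NaNs cannot be checked with the equality operator, because IEEE 754 floating point defines NaN as unequal to every value, including itself.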

```python
>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False
```
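A short sketch of the comparison rule and the predicates this answer recommends, using only the standard library and numpy:

```python
import math

import numpy as np

a = float("nan")

# IEEE 754: NaN compares unequal to everything, itself included.
print(a == a)               # False
print(a != a)               # True
print(a == float("nan"))    # False

# The dedicated predicate is the reliable scalar test.
print(math.isnan(a))        # True

# numpy.isnan works element-wise on float arrays.
values = np.array([1.0, np.nan, 3.0])
print(np.isnan(values).tolist())  # [False, True, False]
```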

Its help text (from `help(isnan)`):

```
isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
```
