Removing NaN values in Python pandas

Question

Asked by lollercoaster

The data is adult income from census data; rows look like:

31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K

I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.

>>> import pandas as pd
>>> income = pd.read_csv('income.data')
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object)
>>> income.dropna(how='any') # should drop all rows with NaNs
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object) # what??
>>> income = income.dropna(how='any') # ok, maybe reassignment will work?
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object) # what??

I tried with a smaller example.csv:

label,age,sex
1,43,M
-1,NaN,F
1,65,NaN

And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to pandas and just learning the ropes.

Answer 1

Answered by dorvak

As I wrote in the comment: the "NaN" has a leading whitespace (at least in the data you provided), so pandas reads it as the ordinary string " NaN" rather than as a missing value. Therefore, you need to specify the na_values parameter in the read_csv function.

Try this one:

df = pd.read_csv("income.csv",header=None,na_values=" NaN")

This is also why your second example works: there is no leading whitespace in that file.
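To see the effect without the original file, here is a minimal sketch that uses two shortened rows modeled on the data in the question (the in-memory `raw` string and the variable names are assumptions for illustration). It compares default parsing, the na_values fix above, and `skipinitialspace=True`, which is another read_csv option that strips the space after each delimiter.

```python
import io
import pandas as pd

# Two shortened rows modeled on the question's data (hypothetical sample).
# Fields are separated by ", ", so every value after the first has a leading space.
raw = (
    "31, Private, 84154, NaN, >50K\n"
    "48, Self-emp-not-inc, 265477, United-States, <=50K\n"
)

# Default parsing: " NaN" stays an ordinary string, so dropna() removes nothing.
plain = pd.read_csv(io.StringIO(raw), header=None)
rows_default = len(plain.dropna(how="any"))

# Fix from the answer: declare " NaN" (with the space) as a missing-value marker.
marked = pd.read_csv(io.StringIO(raw), header=None, na_values=" NaN")
rows_na_values = len(marked.dropna(how="any"))

# Alternative: strip the space after each delimiter so the default "NaN" marker matches.
stripped = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)
rows_stripped = len(stripped.dropna(how="any"))

print(rows_default, rows_na_values, rows_stripped)
```

Note that dropna() returns a new DataFrame rather than modifying the original, so the result has to be assigned (as in `income = income.dropna(how='any')`) once the NaNs are actually recognized.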
