Original URL: http://stackoverflow.com/questions/20053529/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Removing NaN values in Python pandas
Asked by lollercoaster
The data is adult income data from the census, and the rows look like this:
31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K
I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.
>>> import pandas as pd
>>> income = pd.read_csv('income.data')
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)
>>> income.dropna(how='any') # should drop all rows with NaNs
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) # what??
>>> income = income.dropna(how='any') # ok, maybe reassignment will work?
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) # what??
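A side note on the first attempt above: dropna() returns a new DataFrame by default, so calling it without keeping the result leaves the original unchanged. The reassignment (or inplace=True) is therefore necessary, although it is not sufficient here, since the reassigned frame still shows NaN. The sketch below uses a small hypothetical frame in place of the census file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the census data; np.nan is a real missing value.
income = pd.DataFrame({
    "age": [31, 48, 25],
    "type": ["Private", np.nan, "State-gov"],
})

income.dropna(how="any")              # result is discarded; income is unchanged
print(len(income))                    # 3

cleaned = income.dropna(how="any")    # keep the result by assigning it
print(len(cleaned))                   # 2

in_place = income.copy()
in_place.dropna(how="any", inplace=True)  # modifies in_place itself, returns None
print(len(in_place))                  # 2
```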
I tried with a smaller example.csv:
label,age,sex
1,43,M
-1,NaN,F
1,65,NaN
And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to pandas and just learning the ropes.
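The small example.csv can be reproduced in memory with io.StringIO (an assumption made only so the sketch runs without a file on disk). Because "NaN" appears with no surrounding whitespace, it matches one of pandas' default missing-value markers in both the numeric and the text column.

```python
import io
import pandas as pd

# The question's example.csv, inlined as a string.
csv_text = """label,age,sex
1,43,M
-1,NaN,F
1,65,NaN
"""

df = pd.read_csv(io.StringIO(csv_text))

# Both "NaN" cells are parsed as real missing values.
print(df.isna().sum().sum())   # 2
print(len(df.dropna()))        # 1, only the first row has no missing value
```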
Answer by dorvak
As I wrote in the comment: the "NaN" has a leading space (at least in the data you provided), so pandas reads it as the string " NaN" rather than as a missing value. Therefore, you need to specify the na_values parameter in the read_csv function.
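The diagnosis can be checked on two hypothetical rows in the same comma-plus-space layout as the census data. With default settings the marker is kept as the string " NaN", so isna() finds nothing and dropna() has nothing to drop.

```python
import io
import pandas as pd

# Hypothetical rows: every field after the first starts with a space,
# including the missing-value marker.
raw = "31, Private, NaN\n48, Self-emp-not-inc, United-States\n"

df = pd.read_csv(io.StringIO(raw), header=None)

# repr() makes the leading space visible.
print(repr(df.iloc[0, 2]))     # ' NaN'
print(df.isna().sum().sum())   # 0
print(len(df.dropna()))        # 2, no rows removed
```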
Try this one:
import pandas as pd
df = pd.read_csv("income.csv", header=None, na_values=" NaN")
This is also why your second example works: there is no leading whitespace there.
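Both fixes can be sketched on the same hypothetical rows. The first follows the answer and passes na_values=" NaN". The second is an alternative that is not part of the answer: skipinitialspace=True, a standard read_csv parameter, drops the space after each comma, so the token becomes plain "NaN", which is already a default missing-value marker.

```python
import io
import pandas as pd

# Hypothetical rows in the census layout (space after every comma).
raw = "31, Private, NaN\n48, Self-emp-not-inc, United-States\n"

# Fix from the answer: treat the space-prefixed marker as missing.
fixed = pd.read_csv(io.StringIO(raw), header=None, na_values=" NaN")

# Alternative: strip the space after each delimiter while parsing.
stripped = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)

print(len(fixed.dropna()))     # 1
print(len(stripped.dropna()))  # 1
print(repr(fixed.iloc[0, 1]))     # ' Private', leading space kept
print(repr(stripped.iloc[0, 1]))  # 'Private', leading space removed
```

The design difference: na_values only changes how the missing marker is read, while skipinitialspace also removes the leading space from every other string value, which makes comparisons such as df[1] == "Private" work as expected.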

