Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41985063/

Date: 2020-09-14 02:53:50  Source: igfitidea

cannot convert nan to int (but there are no nans)

pandas

Asked by ale19

I have a dataframe with a column of floats that I want to convert to int:

> df['VEHICLE_ID'].head()
0    8659366.0
1    8659368.0
2    8652175.0
3    8652174.0
4    8651488.0

In theory I should just be able to use:

> df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)

But I get:

Output: ValueError: Cannot convert NA to integer

But I am pretty sure that there are no NaNs in this series:

> df['VEHICLE_ID'].fillna(999,inplace=True)
> df[df['VEHICLE_ID'] == 999]
> Output: Empty DataFrame
Columns: [VEHICLE_ID]
Index: []

What's going on?

Answered by EdChum

Basically, the error is telling you that you have NaN values, and I will show why your attempts didn't reveal this:

In [7]:
# set up some data
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
     a
0  1.0
1  NaN
2  3.0
3  4.0

Now try to cast:

df['a'].astype(int)

This raises:

ValueError: Cannot convert NA to integer
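
To see this failure without stopping a script, the cast can be wrapped in a try/except. This is only a minimal sketch: it assumes the error is a ValueError (or a subclass of it), and the exact message text varies between pandas versions. The names df_demo and cast_error are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Illustrative frame (not the asker's data): one missing value in a float column.
df_demo = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

cast_error = None
try:
    df_demo['a'].astype(int)
except ValueError as exc:
    # pandas refuses the cast because NaN has no integer representation.
    cast_error = exc

print(type(cast_error).__name__, cast_error)
```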

But then you tried something like this:

In [5]:
for index, row in df['a'].items():
    if row == np.nan:  # never True: NaN does not compare equal to anything
        print('index:', index, 'isnull')

This printed nothing, because NaN cannot be detected using equality. In fact, it has the special property that comparing it against itself returns False, so row != row is True only for NaN:

In [6]:
for index, row in df['a'].items():
    if row != row:
        print('index:', index, 'isnull')

index: 1 isnull
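
The self-comparison behaviour can also be checked directly on a scalar. A small sketch using only NumPy and the standard library (the variable names are illustrative):

```python
import math
import numpy as np

nan = np.nan

equal_to_itself = (nan == nan)      # False: NaN never compares equal, even to itself
differs_from_itself = (nan != nan)  # True: the trick used in the loop above
detected = math.isnan(nan)          # True: an explicit NaN test

print(equal_to_itself, differs_from_itself, detected)
```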

Now it prints the row. You should use isnull for readability:

In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')

index: 1 isnull
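
For a whole column, the same check is usually done without a Python loop. The sketch below assumes the vectorised Series.isna method (an alias of isnull) suits your case. The names df_demo, nan_mask, nan_count and nan_rows are illustrative.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

nan_mask = df_demo['a'].isna()    # boolean Series, True where the value is NaN
nan_count = int(nan_mask.sum())   # number of missing values in the column
nan_rows = df_demo[nan_mask]      # the rows that would break astype(int)

print(nan_count)
print(nan_rows.index.tolist())
```

A non-zero nan_count is a quicker confirmation that the column really contains NaNs than filling with a sentinel and searching for it.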

So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the NaN values using fillna:

In [8]:
df['a'].fillna(0).astype(int)

Out[8]:
0    1
1    0
2    3
3    4
Name: a, dtype: int32
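
Both options can be put side by side. The sketch below also shows pandas' nullable 'Int64' dtype, which is not part of the answer above. It is included only as an assumption about what may help if the missing values should be kept, and it requires a pandas version that supports nullable integers. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Option 1: drop the rows with NaN in column 'a', then cast.
dropped = df_demo.dropna(subset=['a'])['a'].astype(int)

# Option 2: replace NaN with a sentinel (0 here), then cast.
filled = df_demo['a'].fillna(0).astype(int)

# Option 3 (assumption, not from the answer): the nullable integer dtype
# keeps the gap as <NA> instead of raising.
nullable = df_demo['a'].astype('Int64')

print(dropped.tolist())
print(filled.tolist())
print(nullable.isna().tolist())
```

Which one fits depends on whether a missing VEHICLE_ID should disappear, be replaced by a placeholder, or stay visibly missing.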