pandas: cannot convert nan to int (but there are no nans)
Original URL: http://stackoverflow.com/questions/41985063/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by ale19
I have a dataframe with a column of floats that I want to convert to int:
> df['VEHICLE_ID'].head()
0 8659366.0
1 8659368.0
2 8652175.0
3 8652174.0
4 8651488.0
In theory I should just be able to use:
> df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)
But I get:
Output: ValueError: Cannot convert NA to integer
But I am pretty sure that there are no NaNs in this series:
> df['VEHICLE_ID'].fillna(999,inplace=True)
> df[df['VEHICLE_ID'] == 999]
Output: Empty DataFrame
Columns: [VEHICLE_ID]
Index: []
What's going on?
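A more direct way to check whether a column really has no missing values is to count them with isna() rather than filling and then searching for the fill value. A minimal sketch, using a small hypothetical VEHICLE_ID column with one missing value (made-up data, not the asker's):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a float column with one missing value
df = pd.DataFrame({'VEHICLE_ID': [8659366.0, 8659368.0, np.nan, 8652174.0]})

# Count the missing values directly
n_missing = df['VEHICLE_ID'].isna().sum()
print(n_missing)  # 1

# Show the rows that would make astype(int) fail
missing_rows = df[df['VEHICLE_ID'].isna()]
print(missing_rows)
```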
Answer by EdChum
Basically, the error is telling you that you have NaN values, and I will show why your attempts didn't reveal this:
In [7]:
import numpy as np
import pandas as pd

# set up some data with one missing value
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
a
0 1.0
1 NaN
2 3.0
3 4.0
Now try to cast:
df['a'].astype(int)
This raises:
ValueError: Cannot convert NA to integer
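To confirm which kind of exception is involved, the cast can be wrapped in a try/except. A minimal sketch on the same example column; it only checks for ValueError, because the exact exception class and message wording vary between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

raised = False
try:
    df['a'].astype(int)
except ValueError as exc:
    # Newer pandas versions may raise a ValueError subclass with different wording
    raised = True
    print(type(exc).__name__, '-', exc)

print(raised)  # True
```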
But then you tried something like this:
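In [5]:
# .items() replaces .iteritems(), which was removed in pandas 2.0
for index, row in df['a'].items():
    if row == np.nan:  # never True: NaN is not equal to anything
        print('index:', index, 'isnull')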
This printed nothing, but that is because NaN cannot be detected using equality. In fact, NaN has the special property that comparing it with itself using == returns False, so row != row is True only when row is NaN:
In [6]:
for index, row in df['a'].items():
    if row != row:  # only NaN is unequal to itself
        print('index:', index, 'isnull')
index: 1 isnull
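The self-comparison rule is a property of floating-point NaN itself, not of pandas, so it can be checked on plain scalars. A minimal sketch; math.isnan and np.isnan are shown as the explicit alternatives to the row != row trick:

```python
import math

import numpy as np

nan = float('nan')

# NaN compares unequal to every value, itself included
print(nan == nan)        # False
print(nan != nan)        # True, which is what row != row relies on
print(np.nan == np.nan)  # False: np.nan is an ordinary float NaN

# Dedicated tests say what they mean
print(math.isnan(nan))   # True
print(np.isnan(np.nan))  # True
```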
Now it prints the row. For readability, you should use isnull instead:
In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')
index: 1 isnull
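The loop is only there for illustration: isnull also works on a whole Series at once and returns a boolean mask. A minimal sketch that lists the index labels of the missing values without iterating, on the same example column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Boolean mask of missing values for the whole column
null_mask = df['a'].isnull()

# Index labels of the rows where 'a' is missing
null_index = df.loc[null_mask].index.tolist()
print(null_index)  # [1]
```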
So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the missing values using fillna:
In [8]:
df['a'].fillna(0).astype(int)
Out[8]:
0 1
1 0
2 3
3 4
Name: a, dtype: int32
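The dropna route can be sketched the same way. Note that dropna returns a new DataFrame, so the result must be assigned before casting. A minimal sketch on the example column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Drop the rows where 'a' is missing, then cast the remaining floats to int
cleaned = df.dropna(subset=['a'])
cleaned = cleaned.assign(a=cleaned['a'].astype(int))

print(cleaned['a'].tolist())   # [1, 3, 4]
print(cleaned.index.tolist())  # [0, 2, 3]: the original labels are kept
```

The integer width that astype(int) produces (int32 or int64) depends on the platform and NumPy version, so the dtype shown in the output above may read int64 on another machine.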