Error: cannot convert float NaN to integer in pandas

Disclaimer: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/44896377/

Date: 2020-09-14 03:55:41  Source: igfitidea

python pandas

Asked by

I have the dataframe:

   a            b     c      d
0 nan           Y     nan   nan
1  1.27838e+06  N      3     96
2 nan           N      2    nan
3  284633       Y     nan    44

I am trying to convert the non-null values to integer type to avoid exponential notation (1.27838e+06):

f=lambda x : int(x)
df['a']=np.where(df['a']==None,np.nan,df['a'].apply(f))

But I get the error even though I only want to change the dtype of the non-null values. Can anyone point out my mistake? Thanks.

Answered by Ken Wei

Pandas cannot store NaN values in an integer column. Strictly speaking, you could have a column with mixed data types, but this can be computationally inefficient. So if you insist, you can do:

df['a'] = df['a'].astype('O')
df.loc[df['a'].notnull(), 'a'] = df.loc[df['a'].notnull(), 'a'].astype(int)
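
For anyone who wants to try this end to end, here is a minimal sketch, not the original poster's exact data: column a is retyped by hand (1278380.0 is an assumed stand-in for the rounded 1.27838e+06) and the names mask, converted and error_message are only illustrative. It also shows why the code in the question fails: df['a'].apply(f) is evaluated on every row, NaN included, before np.where is even called.

```python
import numpy as np
import pandas as pd

# Column "a" from the question, retyped by hand for this sketch.
df = pd.DataFrame({"a": [np.nan, 1278380.0, np.nan, 284633.0]})

# apply(int) runs on every row, NaN included, so it raises
# before np.where gets a chance to pick values.
try:
    df["a"].apply(int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Same idea as the answer: switch to object dtype first,
# then cast only the non-null rows to int.
mask = df["a"].notnull()
converted = df["a"].astype("O")
converted.loc[mask] = df.loc[mask, "a"].astype(int)

print(error_message)
print(converted.tolist())
```

The column ends up with dtype object, so the usual vectorised numeric operations will be slower on it, which is the inefficiency mentioned above.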

Answered by lmiguelvargasf

As far as I have read in the pandas documentation, it is not possible to represent an integer NaN:

"In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays."

As explained later in the documentation, this is for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
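
The quoted limitation is about NumPy-backed integer arrays. As an alternative to dtype=object, and not something either answer above proposes, pandas 0.24 and later ship a nullable extension dtype named "Int64" (capital I). The sketch below assumes such a version is installed and that the float values are whole numbers; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Column "a" from the question, retyped by hand for this sketch.
a = pd.Series([np.nan, 1278380.0, np.nan, 284633.0])

# "Int64" is the nullable extension dtype, distinct from NumPy's "int64".
# Missing entries are kept as missing instead of raising an error.
a_int = a.astype("Int64")

print(a_int)
```

The cast is expected to fail if any non-missing value has a fractional part, so it fits data like the column above, where the floats are integers that were only upcast because of the NaNs.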
