Python Pandas: ValueError: cannot convert float NaN to integer

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/47333227/

Date: 2020-08-19 18:10:12  Source: igfitidea

Pandas: ValueError: cannot convert float NaN to integer

Tags: python, pandas, csv

Asked by JaakL

I get ValueError: cannot convert float NaN to integer for the following:

import pandas

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
  • "x" is a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the message means.
  • When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
  • When I read the column as float, the file loads. The values then show as -1.0, 0.0, etc., and there are still no NaNs.
  • I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
  • The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
  • Logically the CSV should not have missing values, but even if there is some garbage I would be OK with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.
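
One way to locate the offending rows without inspecting the file by hand is to read the column as text, in chunks, and report every value that does not parse as a number. The sketch below is only an illustration: the helper name find_bad_rows, the chunk size, and the small in-memory sample standing in for zoom11.csv are assumptions, not part of the original question.

```python
import io
import pandas as pd

def find_bad_rows(source, column, chunksize=100_000):
    """Return the rows whose `column` value is missing or not numeric.

    Hypothetical helper for illustration only.
    """
    bad_chunks = []
    # dtype=str keeps the raw text, so nothing is converted (or fails) on load
    for chunk in pd.read_csv(source, dtype=str, chunksize=chunksize):
        parsed = pd.to_numeric(chunk[column], errors="coerce")
        rows = chunk[parsed.isna()]
        if not rows.empty:
            bad_chunks.append(rows)
    return pd.concat(bad_chunks) if bad_chunks else pd.DataFrame()

# Made-up sample standing in for the real file: one empty and one garbage value
sample = io.StringIO("x,y\n-1,5\n0,7\n,9\nabc,3\n2000,1\n")
bad = find_bad_rows(sample, "x", chunksize=2)
print(bad)
```

Reading in chunks keeps memory use bounded on a 10+ M row file, and the returned frame holds only the rows that need attention.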

Update: Using the hints in the comments and answers, I cleaned my data with this:

# x contained NaN
df = df[~df['x'].isnull()]

# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
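
One caveat with the str.isnumeric() filter: it only accepts strings made up entirely of numeric characters, so values with a sign or a decimal point (such as the -1 seen in column x) would be dropped as well. A minimal sketch with made-up values, comparing it with pd.to_numeric as an alternative check:

```python
import pandas as pd

# Made-up values for illustration; not taken from the original file
y = pd.Series(["-1", "0", "12", "abc", "3.5"])

# True only for strings consisting solely of numeric characters
by_isnumeric = y.str.isnumeric()

# True for anything pandas can parse as a number, including signs and decimals
by_to_numeric = pd.to_numeric(y, errors="coerce").notna()

print(by_isnumeric.tolist())
print(by_to_numeric.tolist())
```

Which check is appropriate depends on the data: if negative values are legitimate, the to_numeric variant is likely the safer filter.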

Answered by jezrael

To identify NaN values, use boolean indexing:

print(df[df['x'].isnull()])

Then, to remove all non-numeric values, use to_numeric with the parameter errors='coerce'. It replaces non-numeric values with NaNs:

df['x'] = pd.to_numeric(df['x'], errors='coerce')

And to remove all rows with NaNs in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to ints:

df['x'] = df['x'].astype(int)
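
Put together, the steps above can be run end to end on a small example. This is a sketch only; the CSV text is made up and stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical data: one empty value and one garbage value in column x
csv_text = "x,y\n-1,a\n0,b\n,c\noops,d\n2000,e\n"
df = pd.read_csv(io.StringIO(csv_text))

# 1. Show rows where x is already missing after loading
print(df[df['x'].isnull()])

# 2. Turn anything non-numeric into NaN
df['x'] = pd.to_numeric(df['x'], errors='coerce')

# 3. Drop the rows where x is NaN
df = df.dropna(subset=['x'])

# 4. The remaining values are whole numbers, so the cast succeeds
df['x'] = df['x'].astype(int)
print(df)
```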

Answered by cs95

ValueError: cannot convert float NaN to integer

From v0.24, you actually can. Pandas introduces Nullable Integer Data Types, which allow integers to coexist with NaNs.

Given a series of whole-number floats with missing data,

import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0])
s

0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64

s.dtype
# dtype('float64')

You can convert it to a nullable int type (choose one of Int16, Int32, or Int64) with,

s2 = s.astype('Int32') # note the 'I' is uppercase
s2

0      1
1      2
2    NaN
3      4
dtype: Int32

s2.dtype
# Int32Dtype()

Your column needs to contain whole numbers for the cast to happen. Anything else will raise a TypeError:

s = pd.Series([1.1, 2.0, np.nan, 4.0])

s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
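
If the column arrives as text, as in the question, one possible route to a nullable integer column is to coerce it to numbers first and then cast. A sketch assuming pandas 0.24 or later and made-up sample values:

```python
import pandas as pd

# Made-up text values, one of which is not a number
raw = pd.Series(["1", "2", "oops", "4"])

# Non-numeric entries become NaN, leaving a float64 series of whole numbers
as_float = pd.to_numeric(raw, errors="coerce")

# The capital 'I' selects the nullable integer dtype, which keeps the gap
as_int = as_float.astype("Int64")
print(as_int)
```

Unlike the dropna approach, this keeps every row and marks the unparseable entry as missing.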

Answered by Matt W.

I know this has been answered, but I wanted to provide an alternative solution for anyone in the future:

You can use .loc to subset the dataframe to only the rows where the value is notnull(), and then select only the 'x' column. Take that same vector and apply(int) to it.

If column x is float:

df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
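
It may be worth checking the resulting dtype after this assignment. In the small sketch below (the frame is made up), the NaN is still in the column, so the column is expected to remain float64 rather than become an integer column; dropping the NaN rows or using a nullable integer dtype may still be needed.

```python
import numpy as np
import pandas as pd

# Made-up float column with one missing value
df = pd.DataFrame({'x': [1.0, np.nan, 3.0]})

mask = df['x'].notnull()
df.loc[mask, 'x'] = df.loc[mask, 'x'].apply(int)

# Inspect what the column looks like after the assignment
print(df['x'].dtype)
print(df)
```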

Answered by Luiz Fernando Lobo

Also, even in the latest versions of pandas, if the column is of object type you have to convert it to float first, something like:

df['column_name'].astype("Float32").astype("Int32")

Whether to use the 32-bit or 64-bit float and int depends on your data. Be aware that you may lose some precision if your numbers are too big for the format.
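
As a small illustration of that precision loss, using plain NumPy scalars rather than a pandas column (the value is chosen for the example), an integer just past 2**24 does not survive a round trip through float32 but does through float64:

```python
import numpy as np

# 2**24 + 1 is one past the range in which float32 represents every integer exactly
big = 2**24 + 1

via_float32 = int(np.float32(big))  # rounded to the nearest representable float32
via_float64 = int(np.float64(big))  # float64 still holds this value exactly

print(big, via_float32, via_float64)
```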

Answered by SATYAJIT MAITRA

If you have null values, doing a mathematical operation will give you this error. To resolve it, use df[~df['x'].isnull()][['x']].astype(int) if you want to leave your original dataset unchanged.