Python Pandas: ValueError: cannot convert float NaN to integer
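Disclaimer: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/47333227/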
Pandas: ValueError: cannot convert float NaN to integer
Asked by JaakL
I get ValueError: cannot convert float NaN to integer for the following:
df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
- The "x" is obviously a column in the csv file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
- When I read the column as String, it has values like -1,0,1,...2000, which all look like perfectly good int numbers to me.
- When I read the column as float, it loads. It then shows values as -1.0,0.0 etc., and there are still no NaNs.
- I tried error_bad_lines = False and the dtype parameter in read_csv, to no avail. It just cancels loading with the same exception.
- The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
- Logically the csv should not have missing values, but even if there is some garbage I would be OK with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.
Update: Using the hints in comments/answers I got my data clean with this:
# x contained NaN
df = df[~df['x'].isnull()]
# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]
# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
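One caveat about the str.isnumeric() filter in the update: it checks characters only, so a value with a minus sign does not pass. The following is a minimal sketch with made-up sample values (the real contents of 'y' are unknown) that compares it with a coercion-based filter using pd.to_numeric:

```python
import pandas as pd

# Hypothetical sample values standing in for the 'y' column read as strings.
y = pd.Series(["-1", "0", "12", "abc"], dtype="object")

# str.isnumeric() is True only when every character is numeric,
# so the minus sign makes "-1" fail the check.
mask_isnumeric = y.str.isnumeric()
print(mask_isnumeric.tolist())

# Alternative: coerce to numbers and keep whatever parsed successfully.
parsed = pd.to_numeric(y, errors="coerce")
mask_parsed = parsed.notna()
print(mask_parsed.tolist())

# Only the rows that parsed are converted to int.
clean = parsed[mask_parsed].astype(int)
print(clean.tolist())
```

If the column can legitimately contain negative numbers, the coercion-based mask is probably the safer of the two.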
Answered by jezrael
To identify NaN values, use boolean indexing:
print(df[df['x'].isnull()])
Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaNs:
df['x'] = pd.to_numeric(df['x'], errors='coerce')
To remove all rows with NaNs in column x, use dropna:
df = df.dropna(subset=['x'])
Finally, convert the values to ints:
df['x'] = df['x'].astype(int)
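The steps above can be put together into one sketch that also reports which raw values failed to convert, which is what the question asks for. The CSV text below is an invented stand-in for 'zoom11.csv', and reading with dtype=str is an assumption made here so that the original text of bad cells stays visible:

```python
import io
import pandas as pd

# Hypothetical stand-in for 'zoom11.csv'; the real file's contents are unknown.
csv_text = "x,y\n-1,5\n0,6\n,7\nbad,8\n2000,9\n"

# Read everything as text so no value is converted on load.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Coerce 'x' to numbers; anything unparseable or empty becomes missing.
x_num = pd.to_numeric(df["x"], errors="coerce")

# Report the offending rows together with their original raw text.
bad_rows = df[x_num.isnull()]
print(bad_rows)

# Keep only rows where 'x' parsed, then do the final integer conversion.
df = df[x_num.notnull()].copy()
df["x"] = x_num[x_num.notnull()].astype(int)
print(df["x"].tolist())
```

For a file with 10+ M rows, the same idea should also work chunk by chunk via the chunksize parameter of read_csv, though that variant is not shown here.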
Answered by cs95
ValueError: cannot convert float NaN to integer
From v0.24, you actually can. Pandas introduced Nullable Integer Data Types, which allow integers to coexist with NaNs.
Given a series of whole float numbers with missing data,
s = pd.Series([1.0, 2.0, np.nan, 4.0])
s
0 1.0
1 2.0
2 NaN
3 4.0
dtype: float64
s.dtype
# dtype('float64')
You can convert it to a nullable int type (choose one of Int16, Int32, or Int64) with:
s2 = s.astype('Int32') # note the 'I' is uppercase
s2
0 1
1 2
2 NaN
3 4
dtype: Int32
s2.dtype
# Int32Dtype()
Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:
s = pd.Series([1.1, 2.0, np.nan, 4.0])
s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
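To avoid hitting that TypeError, it can help to check up front whether every non-missing value is whole. The sketch below does that check and then, as one possible policy that is an assumption here and not part of the answer, rounds before casting. Note that recent pandas releases print the missing entry of a nullable integer column as <NA> rather than NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 2.0, np.nan, 4.0])

# Check whether every non-missing value is a whole number before casting.
non_missing = s.dropna()
all_whole = bool((non_missing % 1 == 0).all())
print(all_whole)

# Assumed policy for illustration: round first, then cast to the nullable
# integer type so the missing value is preserved.
s_int = s.round().astype("Int64")
print(s_int)
```

Whether rounding is acceptable depends on the data; dropping or flagging the non-whole rows would be an equally reasonable choice.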
Answered by Matt W.
I know this has been answered, but I wanted to provide an alternative solution for anyone in the future:
You can use .loc to subset the dataframe to only the rows whose values are notnull(), and then select only the 'x' column. Take that same vector and apply(int) to it.
If column x is float:
df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
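Because the rows holding NaN are left in place, it is worth checking what the column looks like afterwards. A small sketch on invented data that runs the line above and then inspects the result:

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one missing value.
df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0]})

# Convert only the non-null entries, as in the answer.
mask = df["x"].notnull()
df.loc[mask, "x"] = df.loc[mask, "x"].apply(int)

# Inspect the result: the missing entry is still there, and printing the
# dtype shows how pandas stored the column after the assignment.
print(df["x"].dtype)
print(df["x"].isnull().sum())
```

If the printed dtype is not an integer type, dropping the NaN rows first or using a nullable integer type may be a better fit.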
Answered by Luiz Fernando Lobo
Also, even in the latest versions of pandas, if the column is of object type you have to convert it to float first, something like:
df['column_name'].astype("Float32").astype("Int32")
Whether the float and int should be 32 or 64 bits depends on your data; be aware that you may lose some precision if your numbers are too big for the format.
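As an illustration of both points, here is a sketch that uses pd.to_numeric as the float conversion step (a substitution for the astype("Float32") call above, chosen here because the sample column holds text) and then shows the kind of precision loss a 32-bit float can cause. The sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical object column holding numbers as text plus a missing entry.
col = pd.Series(["1", "2", None, "16777217"], dtype="object")

# Parse to numbers first, then cast to a nullable integer type.
as_int = pd.to_numeric(col, errors="coerce").astype("Int64")
print(as_int)

# Precision check: float32 has a 24-bit significand, so 16777217 (2**24 + 1)
# cannot be stored exactly and rounds to 16777216.
print(int(np.float32(16777217)))
```

Going through a 64-bit float keeps integers exact up to 2**53, which is why the sketch ends at Int64 and avoids the 32-bit types.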
Answered by SATYAJIT MAITRA
If you have null values, you will get this error when doing mathematical operations. To resolve it, use df[~df['x'].isnull()] and then df[['x']].astype(int) if you want your original dataset to stay unchanged.
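One way to read this suggestion is to filter into a new frame and convert there, leaving the original untouched. A minimal sketch under that assumption, with invented data and a hypothetical variable name clean:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one null value in 'x'.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# Filter into a new frame and convert there; df itself is not modified.
clean = df[~df["x"].isnull()].copy()
clean["x"] = clean["x"].astype(int)

print(clean["x"].tolist())
print(len(df), df["x"].isnull().sum())
```

The .copy() call makes the filtered frame independent of df, so later edits to clean do not touch the original.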