Original question: http://stackoverflow.com/questions/52002271/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pandas read_csv using dtypes, but there are NA values in the columns
Asked by
I used the following code to read a CSV file, specifying the type of each column:
clean_pdf_type = pd.read_csv('table_updated.csv', usecols=col_names, dtype=col_types)
But it raises this error:
ValueError: Integer column has NA values in column 298
How can I skip the NA values?
Accepted answer by jpp
Pandas v0.24+
See the question "NumPy or Pandas: Keeping array type as integer while having a NaN value".
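One option available from pandas 0.24 onward is the nullable integer extension dtype "Int64" (capital I), which can be passed straight to read_csv so that an integer column keeps an integer dtype even when some cells are empty. A minimal sketch, assuming pandas 0.24 or later; the inline CSV text is a hypothetical stand-in for table_updated.csv:

```python
import io

import pandas as pd

# Hypothetical CSV content: column "a" has an empty cell in the second row
csv_text = "a,b\n1,x\n,y\n3,z\n"

# "Int64" is pandas' nullable integer dtype; the empty cell becomes a missing value
df = pd.read_csv(io.StringIO(csv_text), dtype={"a": "Int64", "b": str})

print(df["a"].dtype)            # Int64
print(df["a"].isna().tolist())  # [False, True, False]
```

Missing entries in an "Int64" column display as <NA> in pandas 1.0 and later, and aggregations such as sum() skip them by default.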
Pandas pre-v0.24
You cannot have NaN values in an int dtype series. This is unavoidable, because NaN values are considered float:
import numpy as np
type(np.nan) # float
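To see the consequence concretely, here is a small sketch using a hypothetical inline CSV in place of table_updated.csv: without a dtype, an integer-like column with a missing cell comes back as float64, and forcing int on that column raises the same kind of ValueError as in the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content: column "a" has an empty cell in the second row
csv_text = "a,b\n1,x\n,y\n3,z\n"

# NaN is a plain Python float, so it cannot be stored in an integer array
print(type(np.nan))  # <class 'float'>

# Without a dtype, pandas upcasts the integer-like column to float64
df = pd.read_csv(io.StringIO(csv_text))
print(df["a"].dtype)  # float64

# Forcing an integer dtype fails because of the missing cell
raised = False
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"a": int})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```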
Your best bet is to read these columns as float instead. If you can then replace the NaN values with a filler value such as 0 or -1, you can process the columns accordingly and convert them to int:
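int_cols = ['col1', 'col2', 'col3']
# Replace missing values with a sentinel (-1) so the columns can hold integers
df[int_cols] = df[int_cols].fillna(-1)
# Downcast the now NaN-free float columns to the smallest integer dtype that fits
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast='integer')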
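Putting that recipe together end to end, as a sketch: the column names, the inline CSV, and the choice of -1 as a sentinel are hypothetical, and -1 is assumed never to occur as a genuine value in these columns.

```python
import io

import pandas as pd

# Hypothetical CSV content with gaps in both integer-like columns
csv_text = "col1,col2,name\n1,10,x\n,20,y\n3,,z\n"
int_cols = ["col1", "col2"]

# Read the integer-like columns as float, since float can hold NaN
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"col1": float, "col2": float, "name": str},
)

# Replace NaN with the sentinel, then downcast to an integer dtype
df[int_cols] = df[int_cols].fillna(-1)
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast="integer")

print(df["col1"].tolist())  # [1, -1, 3]
print(df["col2"].tolist())  # [10, 20, -1]
```

With downcast="integer", pd.to_numeric picks the smallest signed integer dtype that holds the values (int8 for this tiny sample); use astype("int64") instead if a fixed width is preferred.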
The alternative, a series holding a mix of int and float values, will have dtype object. This is not recommended.
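As a small illustration of that alternative, assuming you explicitly request dtype=object: the Series can then hold Python integers next to NaN, but each element becomes a separate Python object, which gives up pandas' compact, vectorized numeric storage.

```python
import numpy as np
import pandas as pd

# Forcing ints and NaN into one Series requires the generic object dtype
s = pd.Series([1, np.nan, 3], dtype=object)
print(s.dtype)  # object

# Each element keeps its own Python type
print([type(v).__name__ for v in s])  # ['int', 'float', 'int']
```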
Answer by Frayal
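# Read without forcing dtypes, so integer columns containing NA are read as float
clean_pdf_type = pd.read_csv('table_updated.csv', usecols=col_names)
# Fill the NA values with 0, then cast each column to its intended dtype
clean_pdf_type = clean_pdf_type.fillna(0).astype(col_types)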
As said in the comments: don't specify the types when reading, fill in the NA values, and then cast to the desired types.
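A self-contained sketch of this read-fill-cast approach; the inline CSV, col_names, and col_types below are hypothetical stand-ins for the question's file and variables.

```python
import io

import pandas as pd

# Hypothetical stand-ins for 'table_updated.csv', col_names and col_types
csv_text = "id,count,label\n1,5,a\n2,,b\n3,7,c\n"
col_names = ["id", "count", "label"]
col_types = {"id": "int64", "count": "int64", "label": str}

# Step 1: read without dtype, so "count" (which has a gap) comes in as float64
df = pd.read_csv(io.StringIO(csv_text), usecols=col_names)

# Step 2: fill the gaps with 0, then cast each column to its intended dtype
df = df.fillna(0).astype(col_types)

print(df["count"].tolist())  # [5, 0, 7]
```

Note that fillna(0) touches every column, so a missing text cell would also become 0; passing a dict such as fillna({"count": 0}) limits the fill to chosen columns. Also, the filler becomes indistinguishable from a genuine 0, so pick a value that cannot occur in the real data.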