pandas read_csv: using dtypes when columns contain NA values

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/52002271/

Time: 2020-09-14 05:58:38  Source: igfitidea

read_csv using dtypes when columns contain NA values

python pandas csv dataframe

Question

I used the following code to read a CSV file, specifying the type of each column:

clean_pdf_type = pd.read_csv('table_updated.csv', usecols=col_names, dtype=col_types)

But it raises this error:

ValueError: Integer column has NA values in column 298 

How can I skip the NA values?
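
A minimal sketch that reproduces this kind of failure, assuming a made-up in-memory CSV rather than the original table_updated.csv (the column names count and label are hypothetical):

```python
import io

import pandas as pd

# Hypothetical two-column CSV; the second data row has an empty "count" field.
csv_text = "count,label\n1,x\n,y\n3,z\n"

try:
    pd.read_csv(io.StringIO(csv_text), dtype={"count": "int64", "label": str})
    error_message = None
except ValueError as exc:
    # The parser refuses to place a missing value in a NumPy integer column.
    error_message = str(exc)

print(error_message)
```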

Accepted answer by jpp

Pandas v0.24+


See the related question "NumPy or Pandas: Keeping array type as integer while having a NaN value".
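
For readers on pandas 0.24 or later, one option is the nullable integer extension dtype, spelled "Int64" with a capital I. The sketch below assumes a small made-up CSV (the column names are hypothetical) and is only an illustration of that dtype, not necessarily the approach taken in the linked question:

```python
import io

import pandas as pd

# Hypothetical CSV with one missing value in the "count" column.
csv_text = "count,label\n1,x\n,y\n3,z\n"

# "Int64" (capital I) is the nullable integer dtype added in pandas 0.24;
# lowercase "int64" is the NumPy dtype that cannot hold missing values.
df = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})

print(df.dtypes)
print(df["count"].isna().sum())  # number of missing entries kept in the column
```

The missing entry stays missing and the remaining values stay integers, so no filler value is needed.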

Pandas pre-v0.24


You cannot have NaN values in an int dtype series. This is unavoidable, because NaN is itself a float:

import numpy as np
type(np.nan)  # float
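
The same effect can be seen at the Series level. A small sketch with made-up values, showing that a single NaN is enough to push an otherwise integer Series to a float dtype:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
with_nan = pd.Series([1, 2, np.nan])

print(ints.dtype)      # an integer dtype
print(with_nan.dtype)  # float64
```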

Your best bet is to read these columns in as float instead. If you can then replace the NaN values with a filler value such as 0 or -1, you can process them accordingly and convert to int:

int_cols = ['col1', 'col2', 'col3']
df[int_cols] = df[int_cols].fillna(-1)
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast='integer')
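
Putting the read step and the conversion together, an end-to-end sketch might look like the following. The CSV contents and column names are made up for illustration, and astype("int64") is used here in place of pd.to_numeric with downcast so that the resulting dtype is predictable:

```python
import io

import pandas as pd

# Hypothetical data: col1 and col2 are integer-like but each has a gap.
csv_text = "col1,col2,name\n1,10,a\n,20,b\n3,,c\n"
int_cols = ["col1", "col2"]

# Read the integer-like columns as float so the missing values survive parsing.
df = pd.read_csv(io.StringIO(csv_text), dtype={col: "float64" for col in int_cols})

# Replace NaN with a sentinel value, then cast to a true integer dtype.
df[int_cols] = df[int_cols].fillna(-1).astype("int64")

print(df)
```

This only makes sense if -1 (or whichever filler is chosen) cannot be confused with a real value in the data.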

The alternative, a series holding mixed int and float values, results in a series of dtype object. It is not recommended.
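
To illustrate what an object series looks like, here is a small sketch with made-up values. Each element keeps its own Python type, so pandas cannot use fast numeric storage for the column:

```python
import numpy as np
import pandas as pd

# Forcing dtype=object keeps each element as its original Python object.
mixed = pd.Series([1, 2, np.nan], dtype=object)

print(mixed.dtype)
print([type(v).__name__ for v in mixed])
```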

Answer by Frayal

clean_pdf_type = pd.read_csv('table_updated.csv', usecols=col_names)
clean_pdf_type = clean_pdf_type.fillna(0).astype(col_types)

As said in the comments, don't specify the types when reading; fill the NA values first and then cast to the desired types.
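
A runnable variation on this idea, assuming a made-up CSV and a hypothetical col_types mapping. Note one deliberate difference from the snippet above: a bare fillna(0) fills every column, including text ones, so this sketch passes a per-column dictionary instead:

```python
import io

import pandas as pd

# Hypothetical data and target types, standing in for the original table.
csv_text = "col1,col2,name\n1,10,a\n,20,b\n3,,c\n"
col_types = {"col1": "int64", "col2": "int64", "name": str}

# Let pandas infer the types first; numeric columns with gaps come back as float64.
df = pd.read_csv(io.StringIO(csv_text))

# Fill only the integer-like columns, then cast everything to the target types.
df = df.fillna({"col1": 0, "col2": 0}).astype(col_types)

print(df.dtypes)
```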