pandas read_csv with dtypes, but the columns contain NA values

Question

I used the following code to read a CSV file, specifying the type of each column:

clean_pdf_type = pd.read_csv('table_updated.csv', usecols=col_names, dtype=col_types)

But it raises this error:

ValueError: Integer column has NA values in column 298

How can I skip or handle the NA values?
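A minimal reproduction of the error, assuming the default C parser and a small hypothetical CSV (the data and column names are illustrative, not from the question): asking for a plain NumPy integer dtype on a column that has an empty cell makes read_csv fail.

```python
import io

import pandas as pd

# Hypothetical sample data: column "b" is empty in the second row
csv_text = "a,b\n1,10\n2,\n3,30\n"

try:
    # Requesting a plain integer dtype for a column that contains a gap
    pd.read_csv(io.StringIO(csv_text), dtype={"a": int, "b": int})
    error_message = None
except ValueError as exc:
    # The C parser refuses to store NA in an integer column
    error_message = str(exc)

print(error_message)
```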

Answer 1

Accepted answer by jpp

Pandas v0.24+

See the question "NumPy or Pandas: Keeping array type as integer while having a NaN value".
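A short sketch of the v0.24+ route, assuming pandas >= 1.0 and the same hypothetical CSV as above: pandas' nullable integer extension dtype "Int64" (capital I) can be passed straight to read_csv, so the column stays integer and the gap is kept as a missing value instead of raising.

```python
import io

import pandas as pd

# Hypothetical sample data with a gap in column "b"
csv_text = "a,b\n1,10\n2,\n3,30\n"

# "Int64" is the nullable integer extension dtype; missing cells become pd.NA
df = pd.read_csv(io.StringIO(csv_text), dtype={"a": "Int64", "b": "Int64"})

print(df.dtypes)
print(df["b"].isna().sum())
```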

Pandas pre-v0.24

You cannot have NaN values in an int dtype series. This is unavoidable, because NaN is a float value:

import numpy as np
type(np.nan)  # float
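To make the consequence concrete, a small sketch with made-up values: a Series of whole numbers keeps an integer dtype, but as soon as one NaN is present pandas upcasts the whole Series to float64, which is exactly why a plain int dtype cannot hold the missing cells.

```python
import numpy as np
import pandas as pd

print(type(np.nan))  # <class 'float'>

# Whole numbers with no gaps keep an integer dtype...
ints = pd.Series([1, 2, 3])
# ...but a single NaN forces the entire Series to float64
with_gap = pd.Series([1, 2, np.nan])

print(ints.dtype, with_gap.dtype)
```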

Your best bet is to read these columns in as float instead. If you can then replace the NaN values with a filler value such as 0 or -1, you can process accordingly and convert to int:

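import pandas as pd

int_cols = ['col1', 'col2', 'col3']  # integer columns that were read in as float
df[int_cols] = df[int_cols].fillna(-1)  # replace NaN with a filler value
df[int_cols] = df[int_cols].apply(pd.to_numeric, downcast='integer')  # float -> smallest integer dtype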
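An end-to-end sketch of the same pre-0.24 workflow, using a hypothetical in-memory CSV and assuming -1 never occurs in the real data. It swaps the to_numeric downcast for a plain astype("int64") so the resulting dtype is predictable; either cast works once every value is a whole number.

```python
import io

import pandas as pd

# Hypothetical sample data with a gap in column "b"
csv_text = "a,b\n1,10\n2,\n3,30\n"
int_cols = ["a", "b"]

# Step 1: read the integer columns as float so NaN can be represented
df = pd.read_csv(io.StringIO(csv_text), dtype={col: float for col in int_cols})

# Step 2: replace NaN with a sentinel (assumed here to be -1, a value absent from the real data)
df = df.fillna({col: -1 for col in int_cols})

# Step 3: every value is now a whole number, so casting back to int64 is lossless
df = df.astype({col: "int64" for col in int_cols})

print(df)
```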

The alternative, keeping a mix of int and float values in one series, results in a series of dtype object. This is not recommended.

Answer 2

Answer by Frayal

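import pandas as pd

clean_pdf_type = pd.read_csv('table_updated.csv', usecols=col_names)  # no dtype: integer columns with NA are inferred as float64
clean_pdf_type = clean_pdf_type.fillna(0).astype(col_types)  # replace NA with 0, then apply the intended types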

As said in the comments: don't specify the types when reading, replace the NA values (here with 0), and then cast to the types you want.
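A self-contained version of this answer, assuming hypothetical stand-ins for the question's col_names and col_types and the same small CSV as above. One design caveat worth keeping in mind: filling with 0 makes the former gaps indistinguishable from genuine zeros, so a sentinel such as -1 (as in the accepted answer) may be safer when 0 is a valid value.

```python
import io

import pandas as pd

csv_text = "a,b\n1,10\n2,\n3,30\n"
col_names = ["a", "b"]            # hypothetical stand-in for the question's col_names
col_types = {"a": int, "b": int}  # hypothetical stand-in for the question's col_types

# Read without dtype: pandas infers float64 for the column that has a gap
raw = pd.read_csv(io.StringIO(csv_text), usecols=col_names)

# Fill every NA with 0, then apply the intended dtypes
clean = raw.fillna(0).astype(col_types)

print(clean.dtypes)
```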
