Python Pandas: convert string to int

Notice: this page is translated from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42719749/

Date: 2020-08-19 22:03:58  Source: igfitidea

Pandas convert string to int

python pandas

Asked by gmarais

I have a large dataframe with ID numbers:

ID.head()
Out[64]: 
0    4806105017087
1    4806105017087
2    4806105017087
3    4901295030089
4    4901295030089

These are all strings at the moment.

I want to convert them to int without using loops. For this I use ID.astype(int).

The problem is that some of my rows contain dirty data that cannot be converted to int, e.g.

ID[154382]
Out[58]: 'CN414149'

How can I (without using loops) remove these kinds of occurrences so that I can use astype with peace of mind?

Answered by jezrael

You need to add the parameter errors='coerce' to the function to_numeric:

ID = pd.to_numeric(ID, errors='coerce')

If ID is a column:

df.ID = pd.to_numeric(df.ID, errors='coerce')

Non-numeric values are converted to NaN, however, so all values become float.

To get int, first convert NaN to some value, e.g. 0, and then cast to int:

df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['4806105017087','4806105017087','CN414149']})
print (df)
              ID
0  4806105017087
1  4806105017087
2       CN414149

print (pd.to_numeric(df.ID, errors='coerce'))
0    4.806105e+12
1    4.806105e+12
2             NaN
Name: ID, dtype: float64

df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
print (df)
              ID
0  4806105017087
1  4806105017087
2              0
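
The question asks to remove the dirty rows rather than replace them with 0. A minimal sketch of that variant, assuming the same sample frame as above (the names converted, mask, bad and clean are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4806105017087', 'CN414149']})

# Coerce once; anything that cannot be parsed becomes NaN
converted = pd.to_numeric(df['ID'], errors='coerce')

# Boolean mask of the rows that failed to convert
mask = converted.isna()

# Inspect the offending values before discarding them
bad = df.loc[mask, 'ID']
print(bad.tolist())

# Keep only the convertible rows, then cast to int64
clean = df.loc[~mask].copy()
clean['ID'] = converted[~mask].astype(np.int64)
print(clean['ID'].dtype)
```

Note that the mask would also flag values that were already missing in the input, not only unparseable strings, so it may be worth looking at bad before dropping anything.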

EDIT: If you use pandas 0.25+, it is possible to use the nullable integer dtype (integer_na):

df.ID = pd.to_numeric(df.ID, errors='coerce').astype('Int64')
print (df)
              ID
0  4806105017087
1  4806105017087
2            NaN
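
With the nullable dtype the unparseable rows stay missing instead of being turned into 0, so they can still be counted or dropped later. A small sketch, again assuming the sample frame from above and a pandas version that supports 'Int64' (the names ids and valid are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4806105017087', 'CN414149']})

# Coerce bad strings to NaN, then move to the nullable integer dtype
ids = pd.to_numeric(df['ID'], errors='coerce').astype('Int64')

print(ids.dtype)         # nullable integer dtype
print(ids.isna().sum())  # number of values that could not be converted

# Dropping the missing entries leaves values that fit a plain int64
valid = ids.dropna().astype('int64')
print(valid.tolist())
```

Because the coerced result passes through float64 before the cast, IDs larger than 2**53 could lose precision; the 13-digit IDs in this example are well below that limit.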