pandas: convert a string column to integer

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39694192/

Date: 2020-09-14 02:05:16  Source: igfitidea

Convert string column to integer

Tags: python, string, pandas, numpy, int

Asked by billboard

I have a dataframe like the one below:

    a   b
0   1   26190
1   5   python
2   5   580

I want column b to hold only integers, but as you can see, the value python cannot be converted to int, so I want to delete the row at index 1. My expected output is:

    a   b
0   1   26190
1   5   580

How can I filter out and remove such rows using pandas in Python?

Answered by jezrael

You can use to_numeric with notnull and filter by boolean indexing:

print (pd.to_numeric(df.b, errors='coerce'))
0    26190.0
1        NaN
2      580.0
Name: b, dtype: float64

print (pd.to_numeric(df.b, errors='coerce').notnull())
0     True
1    False
2     True
Name: b, dtype: bool

df = df[pd.to_numeric(df.b, errors='coerce').notnull()]
print (df)

   a      b
0  1  26190
2  5    580
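
For reference, the steps above can be put together as one self-contained sketch. The sample frame is rebuilt from the question; the names mask and filtered are my own, and the reset_index call is an optional addition (an assumption on my part) to get the 0, 1 index shown in the asker's expected output.

```python
import pandas as pd

# Sample data rebuilt from the question; column b mixes numbers and text.
df = pd.DataFrame({'a': [1, 5, 5], 'b': [26190, 'python', 580]})

# errors='coerce' turns values that cannot be parsed into NaN,
# so notnull() is True only for rows that parsed as numbers.
mask = pd.to_numeric(df['b'], errors='coerce').notnull()

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]

# Optional (not in the original answer): renumber the rows 0..n-1.
filtered = filtered.reset_index(drop=True)
print(filtered)
```

Note that this only filters rows: column b still has dtype object afterwards, because the values themselves were not converted.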

Another solution, from a comment by Boud: use to_numeric with dropna, and finally convert to int with astype:

df.b = pd.to_numeric(df.b, errors='coerce')
df = df.dropna(subset=['b'])
df.b = df.b.astype(int)
print (df)
   a      b
0  1  26190
2  5    580
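
The same three steps can also be written as a single chain that never assigns into a filtered frame, which may help avoid a SettingWithCopyWarning. This is a sketch of an alternative spelling, not the answer's own code; the name converted is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({'a': [1, 5, 5], 'b': [26190, 'python', 580]})

converted = (
    # assign() returns a new frame with b parsed; bad values become NaN.
    df.assign(b=pd.to_numeric(df['b'], errors='coerce'))
      # Drop the rows where b could not be parsed.
      .dropna(subset=['b'])
      # With the NaN rows gone, b can be cast from float to int.
      .astype({'b': int})
)
print(converted)
print(converted.dtypes)
```

The cast to int has to come after dropna, because a plain integer column cannot hold NaN.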


If you need to inspect all rows with bad data, use isnull to select the rows for which to_numeric returns NaN:

print (pd.to_numeric(df.b, errors='coerce').isnull())
0    False
1     True
2    False
Name: b, dtype: bool

print (df[pd.to_numeric(df.b, errors='coerce').isnull()])
   a       b
1  5  python
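
If both views are needed, the mask can be computed once and reused to split the frame. A minimal sketch, assuming the original unfiltered frame from the question; the names is_bad, bad_rows and good_rows are hypothetical.

```python
import pandas as pd

# Original, unfiltered sample data from the question.
df = pd.DataFrame({'a': [1, 5, 5], 'b': [26190, 'python', 580]})

# True where b cannot be parsed as a number.
is_bad = pd.to_numeric(df['b'], errors='coerce').isnull()

bad_rows = df[is_bad]     # rows to inspect or log
good_rows = df[~is_bad]   # rows that are safe to convert

print(bad_rows)
print(good_rows)
```

Keep in mind that values which are already missing (None or NaN) in b are flagged as bad by this check too.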

Answered by conor

This should work:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a' : [1, 5, 5],
                   'b' : [26190, 'python', 580]})
df
   a       b
0  1   26190
1  5  python
2  5     580

# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df['b'] = np.where(df.b.str.contains('[a-z]') == True, np.nan, df.b)
df
   a      b
0  1  26190
1  5    NaN
2  5    580

df = df.dropna()
df
   a      b
0  1  26190
2  5    580

The regex identifies the string values, np.where replaces them with NaN, and df.dropna() then drops those rows from the DataFrame.
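
One caveat: the pattern [a-z] only flags values containing a lowercase ASCII letter, so a value such as ABC would pass through. A stricter variant, sketched below under my own assumptions, keeps only values that look like whole integers. The extra ABC row is hypothetical test data that is not in the question, the name is_int_like is my own, and Series.str.fullmatch needs pandas 1.1 or newer.

```python
import pandas as pd

# Sample data from the question plus one hypothetical uppercase row.
df = pd.DataFrame({'a': [1, 5, 5, 7],
                   'b': [26190, 'python', 580, 'ABC']})

# Cast to str first so the .str accessor also works on the int values,
# then require the whole value to be an optionally signed run of digits.
is_int_like = df['b'].astype(str).str.fullmatch(r'[+-]?\d+')

# copy() makes an independent frame before assigning the converted column.
cleaned = df[is_int_like].copy()
cleaned['b'] = cleaned['b'].astype(int)
print(cleaned)
```

Unlike the to_numeric approach, this check also rejects values such as 12.5, which may or may not be what you want.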