pandas: convert a string column to integer
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39694192/
Convert string column to integer
Asked by billboard
I have a dataframe like the one below:
a b
0 1 26190
1 5 python
2 5 580
I want column b to hold only integers, but as you can see, python is not convertible to int, so I want to delete the row at index 1. My expected output should look like this:
a b
0 1 26190
1 5 580
How can I filter out and remove such rows using pandas in Python?
Answered by jezrael
You can use to_numeric with notnull and filter by boolean indexing:
print (pd.to_numeric(df.b, errors='coerce'))
0 26190.0
1 NaN
2 580.0
Name: b, dtype: float64
print (pd.to_numeric(df.b, errors='coerce').notnull())
0 True
1 False
2 True
Name: b, dtype: bool
df = df[pd.to_numeric(df.b, errors='coerce').notnull()]
print (df)
a b
0 1 26190
2 5 580
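For reference, the filtering steps above can be put together as a self-contained sketch. The sample frame is rebuilt here with column b stored as strings, which is an assumption about the asker's data. The names mask and result, and the final reset_index call (added so the index matches the 0, 1 numbering in the question's expected output), are illustrative additions rather than part of the original answer.

```python
import pandas as pd

# Sample data; assumes column b holds strings, as in the question.
df = pd.DataFrame({'a': [1, 5, 5], 'b': ['26190', 'python', '580']})

# Values that cannot be parsed as numbers become NaN, so notnull()
# is True only for the rows worth keeping.
mask = pd.to_numeric(df['b'], errors='coerce').notnull()

# Keep only the parsable rows and renumber the index from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

Note that this only filters rows: column b still holds the original values, so a conversion such as astype(int) is still needed if an integer dtype is required.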
Another solution, from a comment by Boud: use to_numeric with dropna, and finally convert to int with astype:
df.b = pd.to_numeric(df.b, errors='coerce')
df = df.dropna(subset=['b'])
df.b = df.b.astype(int)
print (df)
a b
0 1 26190
2 5 580
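The same coerce, drop, and cast sequence can also be written as one chain that leaves the original frame untouched. This is a minimal sketch under the assumption that column b holds strings; the name clean and the use of assign and a dict passed to astype are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data; assumes column b holds strings, as in the question.
df = pd.DataFrame({'a': [1, 5, 5], 'b': ['26190', 'python', '580']})

# Coerce b to numbers (NaN for unparsable values), drop those rows,
# then cast the remaining values in b to int.
clean = (
    df.assign(b=pd.to_numeric(df['b'], errors='coerce'))
      .dropna(subset=['b'])
      .astype({'b': int})
)
print(clean)
print(clean.dtypes)
```

One caveat worth keeping in mind: because the values pass through float on the way, a string such as '3.7' would survive the filter and be truncated to 3 by the final cast.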
To see every row with bad data, use isnull on the original (unfiltered) dataframe: it selects the rows where to_numeric returns NaN:
print (pd.to_numeric(df.b, errors='coerce').isnull())
0 False
1 True
2 False
Name: b, dtype: bool
print (df[pd.to_numeric(df.b, errors='coerce').isnull()])
a b
1 5 python
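If the bad rows need to be inspected before they are dropped, the mask can be computed once and reused for both halves of the data. A minimal sketch, assuming column b holds strings; is_bad, bad_rows, and good_rows are names introduced here for illustration.

```python
import pandas as pd

# Sample data; assumes column b holds strings, as in the question.
df = pd.DataFrame({'a': [1, 5, 5], 'b': ['26190', 'python', '580']})

# True where the value in b could not be parsed as a number.
is_bad = pd.to_numeric(df['b'], errors='coerce').isnull()

bad_rows = df[is_bad]    # rows to inspect or log
good_rows = df[~is_bad]  # rows that are safe to convert
print(bad_rows)
```

Values that were already missing in b would also be flagged by this mask, since to_numeric leaves them as NaN.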
Answered by conor
This should work:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a' : [1, 5, 5],
'b' : [26190, 'python', 580]})
df
a b
0 1 26190
1 5 python
2 5 580
df['b'] = np.where(df.b.str.contains('[a-z]') == True, np.nan, df.b)  # np.nan: the np.NaN alias was removed in NumPy 2.0
df
a b
0 1 26190
1 5 NaN
2 5 580
df = df.dropna()
df
a b
0 1 26190
2 5 580
You use the regex to identify the values that contain lowercase letters, convert them to NaN (np.nan) using np.where, then drop them from the df with df.dropna().
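The pattern [a-z] only catches values containing a lowercase letter, so entries such as 'N/A' or 'ABC' would slip through. A stricter variant, sketched below, keeps a row only when the whole value looks like an integer. This swaps in a different check from the one in the answer: it assumes a pandas version that provides Series.str.fullmatch, and the extra 'N/A' row, the pattern, and the names is_int_like and result are hypothetical.

```python
import pandas as pd

# Hypothetical data: mixed ints and strings, including a value
# ('N/A') that the '[a-z]' pattern would not catch.
df = pd.DataFrame({'a': [1, 5, 5, 7], 'b': [26190, 'python', 580, 'N/A']})

# Convert every value to text, then keep rows whose text is an
# optional sign followed only by digits.
is_int_like = df['b'].astype(str).str.fullmatch(r'[+-]?\d+')

# Filter first, then cast the surviving values in b to int.
result = df[is_int_like].astype({'b': int})
print(result)
```

Matching on what a valid value looks like, rather than on what an invalid one looks like, tends to be more robust when the kinds of bad data are not known in advance.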