pandas: convert a string column to integers

Question

Asked by billboard

I have a DataFrame like the one below:

    a   b
0   1   26190
1   5   python
2   5   580

I want column b to hold only integers, but as you can see, python cannot be converted to int, so I want to delete the row at index 1. My expected output is:

    a   b
0   1   26190
1   5   580

How can I filter out and remove such rows using pandas in Python?

Answer 1

Answered by jezrael

You can use to_numeric with notnull and filter by boolean indexing:

print (pd.to_numeric(df.b, errors='coerce'))
0    26190.0
1        NaN
2      580.0
Name: b, dtype: float64

print (pd.to_numeric(df.b, errors='coerce').notnull())
0     True
1    False
2     True
Name: b, dtype: bool

df = df[pd.to_numeric(df.b, errors='coerce').notnull()]
print (df)

   a      b
0  1  26190
2  5    580

Another solution, from Boud's comment: use to_numeric with dropna, then convert to int with astype:

df.b = pd.to_numeric(df.b, errors='coerce')
df = df.dropna(subset=['b'])
df.b = df.b.astype(int)
print (df)
   a      b
0  1  26190
2  5    580
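
The outputs above keep the original index labels (0 and 2), while the question asked for 0 and 1. A minimal self-contained sketch of the same to_numeric approach that also renumbers the index is shown here; the helper name keep_integer_rows is hypothetical and not part of pandas.

```python
import pandas as pd


def keep_integer_rows(df, column):
    """Drop rows whose `column` value is not numeric, then cast the column to int.

    Hypothetical helper for illustration; not part of pandas.
    """
    numeric = pd.to_numeric(df[column], errors='coerce')
    mask = numeric.notna()
    # Keep only the rows that converted successfully; copy so the
    # original frame is left untouched.
    result = df[mask].copy()
    result[column] = numeric[mask].astype(int)
    # Renumber the index 0..n-1 to match the output asked for in the question.
    return result.reset_index(drop=True)


df = pd.DataFrame({'a': [1, 5, 5], 'b': [26190, 'python', 580]})
clean = keep_integer_rows(df, 'b')
print(clean)
```

Whether to call reset_index depends on whether the original row labels still matter downstream; drop that line to keep them.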

If you need to inspect all rows with bad data, use isnull to select the rows where to_numeric returns NaN:

print (pd.to_numeric(df.b, errors='coerce').isnull())
0    False
1     True
2    False
Name: b, dtype: bool

print (df[pd.to_numeric(df.b, errors='coerce').isnull()])
   a       b
1  5  python
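
One caveat with this check: to_numeric also accepts values such as '3.5', which are numeric but not integers, and a later astype(int) would silently truncate them. The sketch below, which uses made-up sample data with an extra '3.5' row, flags those values as bad too. It is one possible way to tighten the check, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: '3.5' is numeric but not an integer.
df = pd.DataFrame({'a': [1, 5, 5, 7], 'b': [26190, 'python', 580, '3.5']})

numeric = pd.to_numeric(df['b'], errors='coerce')
# True only where the value parsed as a number and has no fractional part.
# NaN comparisons evaluate to False, so unparseable rows are excluded as well.
is_whole = numeric.notna() & (numeric % 1 == 0)

bad_rows = df[~is_whole]
print(bad_rows)
```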

Answer 2

Answered by conor

This should work:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a' : [1, 5, 5],
                   'b' : [26190, 'python', 580]})
df
   a       b
0  1   26190
1  5  python
2  5     580

# np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
df['b'] = np.where(df.b.str.contains('[a-z]') == True, np.nan, df.b)
df
   a      b
0  1  26190
1  5    NaN
2  5    580

df = df.dropna()
df
   a      b
0  1  26190
2  5    580

The regex identifies the string values, np.where replaces them with np.nan, and df.dropna() then drops those rows from the DataFrame.
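
Note that the pattern '[a-z]' only catches values containing a lowercase letter, so a value such as 'N/A' would slip through. A stricter variant, sketched below under the assumption that only plain base-10 integers should survive, matches the whole value against an integer pattern instead. The 'N/A' row is made-up sample data.

```python
import pandas as pd

# Hypothetical sample data: 'N/A' has no lowercase letters, so '[a-z]' misses it.
df = pd.DataFrame({'a': [1, 5, 5, 7], 'b': [26190, 'python', 580, 'N/A']})

# Convert every value to text first so the .str methods work on the mixed column,
# then keep rows whose text is an optional sign followed only by digits.
is_int_text = df['b'].astype(str).str.fullmatch(r'[+-]?\d+')

df_clean = df[is_int_text].copy()
df_clean['b'] = df_clean['b'].astype(int)
print(df_clean)
```

Series.str.fullmatch requires pandas 1.1 or later.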
