How to remove all non-numeric characters from all the values in a particular column of a Pandas dataframe?

Question

Asked by ag14

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

     A       B           C
1   red78   square    235
2   green   circle    123
3   blue45  triangle  657

I tried the following, but I get the error "expected string or buffer":

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create dataframe:

import pandas as pd

# DataFrame.set_value was removed in pandas 1.0, so the frame is built directly.
dfObject = pd.DataFrame(
    {'A': ['red78', 'green', 'blue45'],
     'B': ['square', 'circle', 'triangle'],
     'C': ['big235', 'small123', 'big657']},
    index=[1, 2, 3],
)

Answer 1

Answered by EdChum

Use str.extract and pass a regex pattern to extract just the numeric parts:

In[40]:
dfObject['C'] = dfObject['C'].str.extract(r'(\d+)', expand=False)
dfObject

Out[40]: 
        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657

If needed you can cast to int:

dfObject['C'] = dfObject['C'].astype(int)
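
Note that str.extract returns NaN for any row that contains no digits, and astype(int) then fails on those rows. A minimal sketch of one way around this, assuming missing values are acceptable in the result (the sample Series below is made up for illustration):

```python
import pandas as pd

# Hypothetical sample: the second value contains no digits at all.
s = pd.Series(['big235', 'small', 'big657'])

digits = s.str.extract(r'(\d+)', expand=False)  # rows without a match become NaN

# to_numeric converts the digit strings to numbers and keeps NaN as missing;
# the nullable 'Int64' dtype can hold integers alongside missing values.
numbers = pd.to_numeric(digits, errors='coerce').astype('Int64')
print(numbers)
```

Whether a missing value or a default such as 0 is the better choice depends on what the column is used for afterwards.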

Answer 2

Answered by Scott Boston

You can use .str.replace with a regex:

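# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching.
dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]", '', regex=True)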

Output:

        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657
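
Keep in mind that the pattern [a-zA-Z] removes only ASCII letters, so spaces, punctuation, and accented letters stay in the result. A small sketch of the difference, using made-up values and assuming a pandas version where str.replace accepts the regex keyword:

```python
import pandas as pd

# Hypothetical values that contain separators as well as letters.
s = pd.Series(['big 235', 'small_123', 'big-657!'])

# Removing letters only leaves the separators behind.
letters_removed = s.str.replace(r'[a-zA-Z]', '', regex=True)

# Removing everything that is not 0-9 leaves only the digits.
non_digits_removed = s.str.replace(r'[^0-9]+', '', regex=True)

print(letters_removed.tolist())     # [' 235', '_123', '-657!']
print(non_digits_removed.tolist())  # ['235', '123', '657']
```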

Answer 3

Answered by Wiktor Stribiżew

To remove all non-digit characters from strings in a Pandas column, use str.replace with the \D+ or [^0-9]+ pattern:

dfObject['C'] = dfObject['C'].str.replace(r'\D+', '', regex=True)

Or, since in Python 3 \D is fully Unicode-aware by default and thus does not match non-ASCII digits (such as ۱۲۳۴۵۶۷۸۹), you should consider:

dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '', regex=True)

So,

import re
print(re.sub(r'\D+', '', '1۱۲۳۴۵۶۷۸۹0'))      # => 1۱۲۳۴۵۶۷۸۹0
print(re.sub(r'[^0-9]+', '', '1۱۲۳۴۵۶۷۸۹0'))  # => 10
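
Another option, offered here as a sketch rather than as part of the original answer, is to keep \D but pass re.ASCII so that \d and \D consider only 0-9. Series.str.replace accepts a flags argument for string patterns; the sample values below are made up:

```python
import re
import pandas as pd

# '1', three Extended Arabic-Indic digits (U+06F1 to U+06F3), then '0'.
text = '1' + '\u06f1\u06f2\u06f3' + '0'

# By default \D does not match Unicode digits, so nothing is removed.
print(re.sub(r'\D+', '', text) == text)          # True

# With re.ASCII only 0-9 count as digits, so the other characters are removed.
print(re.sub(r'\D+', '', text, flags=re.ASCII))  # 10

s = pd.Series([text, 'big235'])
cleaned = s.str.replace(r'\D+', '', regex=True, flags=re.ASCII)
print(cleaned.tolist())                          # ['10', '235']
```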

Answer 4

Answered by jpp

You can also do this via a lambda function with str.isdigit:

import pandas as pd

df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'Ron 722']})

df['Name'] = df['Name'].map(lambda x: ''.join([i for i in x if i.isdigit()]))

#   Name
# 0    5
# 1    8
# 2  722
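
One caveat: str.isdigit is Unicode-aware as well, so characters such as the superscript '²' count as digits. If only 0-9 should survive, a variant along these lines may be safer (the helper name keep_ascii_digits and the sample data are made up for this sketch):

```python
import string
import pandas as pd

def keep_ascii_digits(value):
    """Keep only the characters 0-9 from a string."""
    return ''.join(ch for ch in value if ch in string.digits)

df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'x²722']})

# str.isdigit treats the superscript two as a digit and keeps it.
print(''.join(ch for ch in 'x²722' if ch.isdigit()))  # ²722

df['Name'] = df['Name'].map(keep_ascii_digits)
print(df['Name'].tolist())                            # ['5', '8', '722']
```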

Answer 5

Answered by MEdwin

Two years on, and in the hope that it helps others: you were actually very close to the answer. I have kept your logic and made it work. Basically, you define a function that does the cleanup and then apply it to column C.

import pandas as pd
import re

df = pd.DataFrame({
    'A': ['red78', 'green', 'blue45'],
    'B': ['square', 'circle', 'triangle'],
    'C': ['big235', 'small123', 'big657']
})

def remove_chars(s):
    return re.sub('[^0-9]+', '', s) 

df['C'] = df['C'].apply(remove_chars)
df

Result below:

        A         B    C
0   red78    square  235
1   green    circle  123
2  blue45  triangle  657
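
The original attempt failed because re.sub('[^0-9]', '', dfOutput.imgID) is evaluated immediately with the whole Series as its third argument, while apply expects a function that it can call once per value. The same kind of error appears if the column contains non-string values such as NaN. A hedged sketch that guards against that case, with a hypothetical helper name and made-up data:

```python
import re
import pandas as pd

def remove_non_digits(value):
    """Strip everything except 0-9 from strings; pass other values through."""
    # Leave missing or non-string values untouched instead of raising.
    if not isinstance(value, str):
        return value
    return re.sub(r'[^0-9]+', '', value)

# Hypothetical column with a missing entry in the middle.
s = pd.Series(['big235', None, 'small123'])
cleaned = s.apply(remove_non_digits)
print(cleaned.tolist())
```

Passing the function itself (remove_non_digits, without parentheses) is what lets apply call it once for each value.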
