pandas: How to remove non-alphanumeric characters from strings in a Python DataFrame column?

Question

Asked by TheSaint321

I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, for example:

df['strings'] = ["a#bc1!","a(b$c"]

Run code:

print(df['strings']): ['abc','abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this didn't work, and I feel there should be a more efficient way to do this with a regex. Any help would be greatly appreciated.

Answer 1

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace(r'[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object
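
For context, the attempt in the question leaves the column unchanged because Series.replace, without regex=True, compares each entry against the whole cell value rather than looking inside the string. The sketch below illustrates the difference; the Series `s` and its third value "#" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the third value exists only to show a whole-cell match.
s = pd.Series(["a#bc1!", "a(b$c", "#"])

# Without regex=True, Series.replace matches entire cell values,
# so only the cell that is exactly "#" changes.
whole_cell = s.replace(["#", "!"], "")

# With regex=True, the pattern is applied inside each string.
substrings = s.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(whole_cell.tolist())
print(substrings.tolist())
```

Either way the call returns a new Series, so the result has to be assigned back to df['strings'] to keep it.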

To retain alphanumeric characters (not just letters, as your expected output suggests), use \W instead, which also leaves underscores in place:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object
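
A minimal, self-contained sketch comparing the patterns side by side, assuming a pandas version where str.replace accepts the regex keyword (on pandas 2.0 and later it has to be passed explicitly, since literal matching is the default). The third sample string, "a_b-2", is an added assumption to show how underscores are treated.

```python
import pandas as pd

# First two values come from the question; "a_b-2" is an illustrative addition.
df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c", "a_b-2"]})

# Keep ASCII letters only.
letters_only = df["strings"].str.replace(r"[^a-zA-Z]", "", regex=True)

# Keep ASCII letters and digits.
alnum_ascii = df["strings"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

# \W removes everything except word characters, so underscores survive.
word_chars = df["strings"].str.replace(r"\W", "", regex=True)

# str.replace returns a new Series; assign it back to keep the cleaned values.
df["strings"] = alnum_ascii
print(df)
```

If underscores should go as well, the explicit character class is the safer choice of the two.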

Answer 2

Since you wrote alphanumeric, you need to add 0-9 in the regex. But maybe you only wanted alphabetic...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace(r'[^a-zA-Z0-9]', '', regex=True)

This is essentially the same as what COLDSPEED wrote.

Answer 3

You can also use Python's re module directly:

import re

regex = re.compile('[^a-zA-Z]')

l = ["a#bc1!","a(b$c"]

print([regex.sub('', i) for i in l])

['abc', 'abc']
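
The list comprehension above works on a plain list. One way to apply the same compiled pattern to a DataFrame column is Series.map, sketched below. The helper name `strip_non_alnum`, the `clean` column, and the missing value in the sample data are assumptions for illustration, and the pattern keeps digits as well as letters.

```python
import re

import pandas as pd

# Compile once and reuse for every cell.
non_alnum = re.compile(r"[^a-zA-Z0-9]")


def strip_non_alnum(value):
    """Remove non-alphanumeric characters; leave non-string cells (e.g. missing values) untouched."""
    return non_alnum.sub("", value) if isinstance(value, str) else value


# Hypothetical sample data with one missing value.
df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c", None]})
df["clean"] = df["strings"].map(strip_non_alnum)
print(df)
```

For large columns the vectorized str.replace is usually the more idiomatic choice; this version is mainly useful when the cleaning logic needs more than a single substitution.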