pandas: How to remove non-alphanumeric characters from strings in a DataFrame column in Python?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/46241120/

Date: 2020-09-14 04:28:06  Source: igfitidea

How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

Tags: python, regex, pandas, dataframe

Asked by TheSaint321

I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, for example:

df['strings'] = ["a#bc1!","a(b$c"]

Desired result:

print(df['strings'])  # ['abc', 'abc']

I've tried:

df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")

But this didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be greatly appreciated.

Answered by cs95

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace(r'[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object


To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object 
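
A minimal sketch of why the attempt in the question had no effect and how the two patterns above differ. It assumes a pandas version where `str.replace` accepts the `regex` keyword; the third sample string (`'x_y z'`) is an illustration added here, not part of the original question. `Series.replace` without `regex=True` only swaps cells whose entire value equals one of the listed items, while `.str.replace` operates inside each string. Note also that `\W` leaves underscores in place, because `\w` includes `_`.

```python
import pandas as pd

# The third value is an added illustration, not from the question.
df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c', 'x_y z']})

# Series.replace without regex=True matches whole cell values only,
# so a list of punctuation characters leaves these strings untouched.
unchanged = df['strings'].replace(['#', '!', '(', '$'], '')

# .str.replace with regex=True substitutes inside each string.
ascii_alnum = df['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

# \W keeps the underscore, because \w includes it.
word_chars = df['strings'].str.replace(r'\W', '', regex=True)

print(unchanged.tolist())    # ['a#bc1!', 'a(b$c', 'x_y z']
print(ascii_alnum.tolist())  # ['abc1', 'abc', 'xyz']
print(word_chars.tolist())   # ['abc1', 'abc', 'x_yz']
```

If underscores should go as well, the explicit character class `[^a-zA-Z0-9]` is the safer choice of the two.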

Answered by StefanK

Since you wrote alphanumeric, you need to add 0-9 to the regex. But maybe you only wanted alphabetic characters...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace(r'[^a-zA-Z0-9]', '', regex=True)

But it's basically what COLDSPEED wrote.
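
The snippet in this answer shows no output and does not store the result. As a small sketch (assuming a pandas version where `str.replace` accepts `regex=True`), `str.replace` returns a new Series, so the cleaned values have to be assigned back to the column to change the DataFrame:

```python
import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# str.replace returns a new Series; assign it back to update the column.
ded['strings'] = ded['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(ded['strings'].tolist())  # ['abc1', 'abc']
```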

Answered by lapinktheitroada

You can also use a compiled regex from the re module:

import re

regex = re.compile('[^a-zA-Z]')

l = ["a#bc1!","a(b$c"]

print([regex.sub('', i) for i in l])

['abc', 'abc']
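
The pattern in this answer also drops digits. A possible variant that keeps them, as a sketch only: `NON_ALNUM` and `strip_non_alnum` are hypothetical names introduced here for illustration, not part of the original answer.

```python
import re

# Hypothetical names for illustration; the pattern keeps ASCII letters and digits.
NON_ALNUM = re.compile(r'[^a-zA-Z0-9]')

def strip_non_alnum(text):
    """Return text with every character outside a-z, A-Z, 0-9 removed."""
    return NON_ALNUM.sub('', text)

values = ["a#bc1!", "a(b$c"]
cleaned = [strip_non_alnum(v) for v in values]

print(cleaned)  # ['abc1', 'abc']
```

The same helper could be applied to a DataFrame column with something like `df['strings'].map(strip_non_alnum)`, though the vectorized `.str.replace` shown in the other answers is usually the more idiomatic route in pandas.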