pandas: How to remove non-alphanumeric characters from strings within a DataFrame column in Python?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/46241120/
Asked by TheSaint321
I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, for example:
df['strings'] = ["a#bc1!","a(b$c"]
Desired output after running the code:
print(df['strings'])  # ['abc', 'abc']
I've tried:
df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")
But this didn't work, and I feel there should be a more efficient way to do this with a regex. Any help would be greatly appreciated.
Answered by cs95
Use str.replace.
df
strings
0 a#bc1!
1 a(b$c
df.strings.str.replace(r'[^a-zA-Z]', '', regex=True)
0 abc
1 abc
Name: strings, dtype: object
To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:
df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object
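One detail worth illustrating: \W removes "non-word" characters, and the underscore counts as a word character, so it survives. The sketch below compares \W with an explicit [^a-zA-Z0-9] class. The third sample string is an assumption added only to show the underscore case, and regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data; the third string is added here only to show underscore handling.
df = pd.DataFrame({"strings": ["a#bc1!", "a(b$c", "a_b-2"]})

# \W strips everything except letters, digits, and the underscore.
word_chars = df["strings"].str.replace(r"\W", "", regex=True)

# An explicit character class strips the underscore as well.
alnum_only = df["strings"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)

print(word_chars.tolist())  # ['abc1', 'abc', 'a_b2']
print(alnum_only.tolist())  # ['abc1', 'abc', 'ab2']
```

If underscores should also be dropped, the explicit class is probably the safer choice.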
Answered by StefanK
Since you wrote alphanumeric, you need to add 0-9 to the regex. But maybe you only wanted alphabetic characters...
import pandas as pd
ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})
ded.strings.str.replace(r'[^a-zA-Z0-9]', '', regex=True)
But this is basically what COLDSPEED wrote.
Answered by lapinktheitroada
You can also use the re module directly:
import re
regex = re.compile('[^a-zA-Z]')
l = ["a#bc1!","a(b$c"]
print([regex.sub('', i) for i in l])
['abc', 'abc']
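A variant of the same idea that keeps digits, wrapped in a small function so it can be reused. This is only a sketch: the names NON_ALNUM and strip_non_alnum are hypothetical, and the third sample string is added for illustration.

```python
import re

# Compile once; matches any character that is not an ASCII letter or digit.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def strip_non_alnum(text):
    """Return text with every character outside a-z, A-Z, 0-9 removed."""
    return NON_ALNUM.sub("", text)


values = ["a#bc1!", "a(b$c", "x_y 9"]
cleaned = [strip_non_alnum(v) for v in values]
print(cleaned)  # ['abc1', 'abc', 'xy9']
```

A function like this could also be applied to a pandas column element by element, though the vectorized str.replace shown in the other answers is usually more convenient there.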