Original URL: http://stackoverflow.com/questions/51287850/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to find special characters from Python Data frame
Asked by SPy
I need to find special characters across an entire data frame.
In the data frame below, some columns contain special characters. How can I find which columns contain them?
I want to display a message for each column that contains special characters.
Accepted answer by rafaelc
You can set up an alphabet of valid characters, for example:
import string
alphabet = string.ascii_letters+string.punctuation
This evaluates to:
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
Then use:
df.col.str.strip(alphabet).astype(bool).any()
For example,
import pandas as pd
df = pd.DataFrame({'col1':['abc', 'hello?'], 'col2': ['?éG', '?']})
     col1 col2
0     abc  ?éG
1  hello?    ?
Then, with the alphabet above:
df.col1.str.strip(alphabet).astype(bool).any()
False
df.col2.str.strip(alphabet).astype(bool).any()
True
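To answer the question for the whole frame rather than one column at a time, the same strip-based check can be run over every column. The following is a minimal sketch, not part of the original answer: the helper name `columns_with_special_chars` and the sample data are hypothetical, and it assumes "special" means anything outside `string.ascii_letters + string.punctuation`, so digits and spaces inside strings are flagged too. Non-string cells (numbers, missing values) are skipped.

```python
import string

import pandas as pd

# Assumption: anything outside ASCII letters and punctuation counts as "special".
alphabet = string.ascii_letters + string.punctuation


def columns_with_special_chars(df, alphabet=alphabet):
    """Return the names of columns with a string cell containing characters outside `alphabet`."""
    flagged = []
    for name in df.columns:
        # Stripping valid characters from both ends leaves an empty string
        # only when every character of the cell is in the alphabet.
        if any(value.strip(alphabet) for value in df[name] if isinstance(value, str)):
            flagged.append(name)
    return flagged


# Hypothetical sample data for illustration.
df = pd.DataFrame({"col1": ["abc", "hello?"], "col2": ["abc", "é"], "col3": [1, 2]})
for name in columns_with_special_chars(df):
    print(f"Column {name!r} contains special characters")
```

Only `col2` is reported here, because `é` is not in the alphabet while `?` is.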
The term "special characters" can be tricky, because it depends on your interpretation. For example, you might or might not consider # to be a special character. Also, some languages (such as Portuguese) use characters like é, while others (such as English) do not.
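Because the definition depends on interpretation, it can help to see exactly which characters a given alphabet rejects before deciding whether the alphabet is right. This is a rough sketch under that assumption: `special_chars_in` and `DEFAULT_ALPHABET` are hypothetical names, not part of the answer or of pandas.

```python
import string

# Assumption: the caller decides what counts as "valid"; this default mirrors
# the alphabet used in the answer (ASCII letters plus punctuation).
DEFAULT_ALPHABET = string.ascii_letters + string.punctuation


def special_chars_in(values, alphabet=DEFAULT_ALPHABET):
    """Return the sorted distinct characters in `values` that are not in `alphabet`."""
    allowed = set(alphabet)
    found = set()
    for value in values:
        if isinstance(value, str):  # skip numbers and missing values
            found.update(ch for ch in value if ch not in allowed)
    return sorted(found)


# Any iterable of cell values works, including a pandas column such as df["col2"].
print(special_chars_in(["abc", "hello?"]))  # []
print(special_chars_in(["caf\u00e9 1", None, 3]))
```

Passing a wider alphabet, for example `DEFAULT_ALPHABET + string.digits + " "`, stops digits and spaces from being reported.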
Answer by Plinus
To remove unwanted characters from dataframe columns, use regex:
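import re
def strip_character(value):
    # Remove everything except letters, digits, spaces and the listed punctuation.
    # The hyphen is escaped; the unescaped "+-=" was read as a range that also covered 0-9.
    r = re.compile(r'[^a-zA-Z0-9 !@#$%&*_+\-=|:";<>,./()\[\]{}\']')
    return r.sub('', value)
df[resultCol] = df[dataCol].apply(strip_character)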
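The answer leaves `dataCol` and `resultCol` undefined, so here is a small self-contained sketch of the same idea using pandas' vectorized `Series.str.replace` instead of `apply`. The column names `raw` and `clean` and the sample values are hypothetical. The pattern keeps digits, as the answer's pattern does.

```python
import pandas as pd

# Hypothetical column names and data for illustration.
df = pd.DataFrame({"raw": ["caf\u00e9", "a\u2022b", "plain text!"]})

# Negated character class: every character NOT listed here is removed.
# Assumption: digits should be kept, so 0-9 is listed explicitly.
pattern = r"[^a-zA-Z0-9 !@#$%&*_+\-=|:\";<>,./()\[\]{}']"

df["clean"] = df["raw"].str.replace(pattern, "", regex=True)
print(df["clean"].tolist())  # ['caf', 'ab', 'plain text!']
```

Unlike the strip-based check in the accepted answer, this modifies the data, so it is worth writing to a new column first and comparing it with the original.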