pandas: How to find special characters in a Python DataFrame

Question

Asked by SPy

I need to find special characters in the entire DataFrame.

In my DataFrame, some columns contain special characters. How can I find which columns contain them?

I want to display text for each column that contains special characters.

Answer 1

Accepted answer by rafaelc

You can set up an alphabet of valid characters, for example:

import string
alphabet = string.ascii_letters+string.punctuation

This evaluates to:

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Then, to check a column named col, just use:

df.col.str.strip(alphabet).astype(bool).any()
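Why this works: str.strip(chars) removes characters from both ends only while they belong to chars, and stops at the first character that does not. The stripped string is therefore empty exactly when every character is in the alphabet, and astype(bool) turns any non-empty remainder into True. Here is a plain-Python sketch of the same check. The helper name has_special is hypothetical and used only for illustration.

```python
import string

# Characters considered "valid"; anything else counts as special.
alphabet = string.ascii_letters + string.punctuation

def has_special(text: str) -> bool:
    """Return True if text contains at least one character outside `alphabet`.

    str.strip(alphabet) peels valid characters off both ends and stops at the
    first character not in the set, so the remainder is empty only when every
    character is valid.
    """
    return bool(text.strip(alphabet))

print(repr("hello?".strip(alphabet)))  # '' -> only valid characters
print(repr("aéb".strip(alphabet)))     # 'é' -> é is not in the alphabet
print(repr("a b".strip(alphabet)))     # ' ' -> space is not in the alphabet
```

With this alphabet, digits and whitespace also count as special, because neither string.digits nor spaces are included.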

For example,

import pandas as pd
df = pd.DataFrame({'col1':['abc', 'hello?'], 'col2': ['?éG', '?']})


    col1    col2
0   abc     ?éG
1   hello?  ?

Then, with the above alphabet,

df.col1.str.strip(alphabet).astype(bool).any()
False
df.col2.str.strip(alphabet).astype(bool).any()
True
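The question asks which columns contain special characters, and the check above handles one column at a time. The following is a minimal sketch that applies it to every column and prints the offending values. It assumes the columns hold text. NaN is filled with an empty string first because str.strip leaves NaN unchanged and astype(bool) would otherwise count it as True. The names mask and flags are illustrative.

```python
import string

import pandas as pd

alphabet = string.ascii_letters + string.punctuation

df = pd.DataFrame({'col1': ['abc', 'hello?'], 'col2': ['?éG', '?']})

# Per-cell flags: True where characters remain after stripping valid ones
mask = df.apply(lambda s: s.fillna('').astype(str).str.strip(alphabet).astype(bool))

# Column-level answer: does any cell in the column contain a special character?
flags = mask.any()
print(flags)

# Display the offending text for every flagged column
for col in flags[flags].index:
    print(col, df.loc[mask[col], col].tolist())
```

For the example frame, only col2 is flagged, and the value reported for it is '?éG'.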

The term "special characters" can be very tricky, because it depends on your interpretation. For example, you might or might not consider # to be a special character. Also, some languages (such as Portuguese) use accented characters like é, while others (such as English) do not.
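Because the definition depends on your use case, the alphabet is the setting to adjust. The sketch below compares three alphabets on the same data. The extra characters accepted in the last variant ('éã') are only an example, not a recommendation.

```python
import string

import pandas as pd

# Base alphabet from the answer: ASCII letters and punctuation
base = string.ascii_letters + string.punctuation

# Variant 1: also accept digits and spaces
with_digits = base + string.digits + ' '

# Variant 2: additionally accept a few accented letters (example choice)
with_accents = with_digits + 'éã'

s = pd.Series(['order 42', 'café', 'naïve'])

results = {}
for name, alpha in [('base', base), ('with_digits', with_digits), ('with_accents', with_accents)]:
    # True where the value contains a character outside the chosen alphabet
    results[name] = s.str.strip(alpha).astype(bool).tolist()
    print(name, results[name])
```

With the base alphabet, all three values are flagged. Adding digits and spaces clears 'order 42', and accepting é clears 'café'. 'naïve' is still flagged because ï was not added.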

Answer 2

Answer by Plinus

To remove unwanted characters from dataframe columns, use regex:

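import re

# Any single character NOT in the allowed set. "\-" is a literal hyphen: unescaped,
# "+-=" is a range from + to = that also keeps digits (add 0-9 if you want that).
UNWANTED = re.compile(r'[^a-zA-Z !@#$%&*_+\-=|\:";<>,./()[\]{}\']')

def strip_character(text):
    return UNWANTED.sub('', text)

df[resultCol] = df[dataCol].apply(strip_character)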
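The same character class can also be used to find special characters instead of deleting them. Series.str.contains flags each cell that has at least one character outside the allowed set, and Series.str.findall lists which characters matched. This is a sketch under these assumptions: the class is written with an escaped hyphen so that +\-= means three literal characters, and the sample data is made up. Note that ? and ~ are not in this allowed set, so this answer's notion of "special" differs from the alphabet in the accepted answer.

```python
import pandas as pd

# Any single character outside the allowed set (letters, space, common punctuation)
UNWANTED = r'[^a-zA-Z !@#$%&*_+\-=|\:";<>,./()[\]{}\']'

df = pd.DataFrame({'col1': ['abc', 'a+b'], 'col2': ['café', 'x~y']})

# True where a cell contains at least one unwanted character
found = df.apply(lambda s: s.astype(str).str.contains(UNWANTED))
print(found.any())                        # per-column summary
print(df['col2'].str.findall(UNWANTED))  # the characters that matched
```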
