pandas: How to remove accents from values in columns?
Note: this page is taken from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/37926248/
Asked by Marius
How do I change the accented characters to plain alphabet letters? This is my dataframe:
In [56]: cities
Out[56]:
Table Code Country Year City Value
240 Åland Islands 2014.0 MARIEHAMN 11437.0 1
240 Åland Islands 2010.0 MARIEHAMN 5829.5 1
240 Albania 2011.0 Durrës 113249.0
240 Albania 2011.0 TIRANA 418495.0
240 Albania 2011.0 Durrës 56511.0
I want it to look like this:
In [56]: cities
Out[56]:
Table Code Country Year City Value
240 Aland Islands 2014.0 MARIEHAMN 11437.0 1
240 Aland Islands 2010.0 MARIEHAMN 5829.5 1
240 Albania 2011.0 Durres 113249.0
240 Albania 2011.0 TIRANA 418495.0
240 Albania 2011.0 Durres 56511.0
Accepted answer by Blind0ne
Answer by EdChum
The pandas method is to use the vectorised str.normalize combined with str.encode and str.decode:
In [60]:
df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
Out[60]:
0 Aland Islands
1 Aland Islands
2 Albania
3 Albania
4 Albania
Name: Country, dtype: object
So to do this for all columns with str (object) dtype:
In [64]:
cols = df.select_dtypes(include=['object']).columns  # np.object was removed in NumPy 1.24; the string 'object' works across versions
df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
df
Out[64]:
Table Code Country Year City Value
0 240 Aland Islands 2014.0 MARIEHAMN 11437.0 1
1 240 Aland Islands 2010.0 MARIEHAMN 5829.5 1
2 240 Albania 2011.0 Durres 113249.0
3 240 Albania 2011.0 TIRANA 418495.0
4 240 Albania 2011.0 Durres 56511.0
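For a reproducible version of the above, here is a minimal sketch that builds a small frame from the sample values in the question and applies the same normalize/encode/decode chain to every string column. The helper name `strip_accents_frame` and the reduced three-column frame are illustrative assumptions, not part of the original answer, and the sketch assumes the selected columns hold only strings. Note that characters with no Unicode decomposition (for example 'ø' or 'ß') are dropped entirely by `errors='ignore'`.

```python
import pandas as pd


def strip_accents_frame(df):
    """Return a copy of df with accents removed from all string columns."""
    out = df.copy()
    # Pick the columns that hold strings (hypothetical selection rule for this sketch)
    cols = [c for c in out.columns if pd.api.types.is_string_dtype(out[c])]
    for col in cols:
        out[col] = (
            out[col]
            .str.normalize("NFKD")                 # split 'Å' into 'A' + combining ring
            .str.encode("ascii", errors="ignore")  # drop the non-ASCII combining marks
            .str.decode("utf-8")                   # turn the bytes back into text
        )
    return out


cities = pd.DataFrame({
    "Country": ["Åland Islands", "Albania"],
    "Year": [2014.0, 2011.0],
    "City": ["MARIEHAMN", "Durrës"],
})
clean = strip_accents_frame(cities)
print(clean)
```

The numeric `Year` column is left untouched because only string columns are selected.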
Answer by Caio Andrian
With a pandas Series, for example:
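import unidecode  # third-party package (pip install Unidecode)
def remove_accents(a):
    # bytes (e.g. a Python 2 str) must be decoded to text first
    text = a.decode('utf-8') if isinstance(a, bytes) else a
    return unidecode.unidecode(text)
df['column'] = df['column'].apply(remove_accents)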
In this case the values are decoded and then transliterated to ASCII.
Answer by advance512
This is for Python 2.7. For converting to ASCII you might want to try:
import unicodedata
unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii','ignore')
'Durres Aland Islands'
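On Python 3 the `encode` call above returns `bytes` rather than `str`. As a hedged alternative, the sketch below uses a slightly different technique: it decomposes the text with NFKD and then filters out combining marks instead of encoding to ASCII. The function name `strip_combining_marks` is hypothetical. Unlike the ASCII round trip, this keeps characters that have no decomposition (such as 'ø') rather than dropping them.

```python
import unicodedata


def strip_combining_marks(text):
    """Remove combining marks after NFKD decomposition; other characters are kept."""
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


result = strip_combining_marks("Durrës Åland Islands")
print(result)  # Durres Aland Islands
```

It can be applied to a column with `df['column'].apply(strip_combining_marks)`, assuming the column holds only strings.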