pandas: How to remove accents from values in columns?

Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/37926248/

How to remove accents from values in columns?

python pandas dataframe

Asked by Marius

How do I change the special characters to the usual alphabet letters? This is my dataframe:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Åland Islands   2014.0      MARIEHAMN   11437.0 1
240         Åland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durrës      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durrës      56511.0 

I want it to look like this:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Aland Islands   2014.0      MARIEHAMN   11437.0 1
240         Aland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durres      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durres      56511.0 

Accepted answer by Blind0ne

Use this code:

df['Country'] = df['Country'].str.replace(u"Å", "A")
df['City'] = df['City'].str.replace(u"ë", "e")

Of course, you would then need to do this for every special character and every column.
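If the per-character replacement approach is kept, one way to avoid a separate line for each character is a single translation table applied to each column. The following is a minimal sketch, not the answerer's code: the sample frame and the two-entry mapping are illustrative assumptions, and the mapping would need to be extended for whatever characters actually occur in the data.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "Country": ["Åland Islands", "Albania"],
    "City": ["MARIEHAMN", "Durrës"],
})

# Illustrative mapping only; add every accented character your data contains
table = str.maketrans({"Å": "A", "ë": "e"})

# Apply the same table to each text column
for col in ["Country", "City"]:
    df[col] = df[col].str.translate(table)

print(df)
```

An explicit table keeps full control over each substitution, at the cost of having to list every character by hand.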

Answer by EdChum

The pandas method is to use the vectorised str.normalize combined with str.encode and str.decode:

In [60]:
df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')

Out[60]:
0    Aland Islands
1    Aland Islands
2          Albania
3          Albania
4          Albania
Name: Country, dtype: object
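To see why this chain works, it can help to look at a single string with the standard library. The sketch below (the sample text is an assumption) shows that NFKD normalization splits an accented letter into a base letter plus a combining mark, and that encoding to ASCII with errors="ignore" then discards the mark.

```python
import unicodedata

text = "Åland Durrës"
decomposed = unicodedata.normalize("NFKD", text)

# The first accented letter is now two code points: base letter + combining mark
print([unicodedata.name(ch) for ch in decomposed[:2]])
# ['LATIN CAPITAL LETTER A', 'COMBINING RING ABOVE']

# Non-ASCII code points (here, the combining marks) are dropped
ascii_text = decomposed.encode("ascii", errors="ignore").decode("ascii")
print(ascii_text)  # Aland Durres
```

Note that characters without a Unicode decomposition, such as "ø" or "ß", are not transliterated by this approach; they are simply removed.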

So to do this for all str dtypes:

In [64]:
cols = df.select_dtypes(include=[object]).columns  # np.object was removed in NumPy 1.24; use the builtin object
df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
df

Out[64]:
   Table Code        Country    Year       City      Value
0         240  Aland Islands  2014.0  MARIEHAMN  11437.0 1
1         240  Aland Islands  2010.0  MARIEHAMN  5829.5  1
2         240        Albania  2011.0     Durres   113249.0
3         240        Albania  2011.0     TIRANA   418495.0
4         240        Albania  2011.0     Durres    56511.0
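The same idea can be wrapped in a small reusable helper. This is a sketch under assumptions: the function name strip_accents and the sample frame are hypothetical, and it picks text columns with pandas.api.types.is_string_dtype rather than select_dtypes, so that it also covers columns stored with a dedicated string dtype.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

def strip_accents(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with accents removed from every string column (hypothetical helper)."""
    out = frame.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            out[col] = (
                out[col]
                .str.normalize("NFKD")                 # split letters from accents
                .str.encode("ascii", errors="ignore")  # drop the non-ASCII marks
                .str.decode("ascii")                   # bytes back to text
            )
    return out

# Hypothetical sample data mirroring the question's columns
cities = pd.DataFrame({
    "Country": ["Åland Islands", "Albania"],
    "Year": [2014.0, 2011.0],
    "City": ["MARIEHAMN", "Durrës"],
})
clean = strip_accents(cities)
print(clean)
```

Working on a copy leaves the original frame untouched, and numeric columns such as Year pass through unchanged.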

Answer by Caio Andrian

With a pandas Series, for example:

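import unidecode  # third-party package (pip install Unidecode)

def remove_accents(a):
    # Python 3 str is already Unicode; on Python 2 byte strings, call a.decode('utf-8') first
    return unidecode.unidecode(a)

df['column'] = df['column'].apply(remove_accents)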

In this case, unidecode transliterates the text to ASCII.

Answer by advance512

This is for Python 2.7. For converting to ASCII you might want to try:

import unicodedata

unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii','ignore')
'Durres Aland Islands'
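On Python 3, encode() returns a bytes object rather than a str, so the snippet above would produce b'Durres Aland Islands'. A possible Python 3 adaptation, with to_ascii as a hypothetical helper name, decodes the result back to text:

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Drop accents via NFKD decomposition, discarding non-ASCII code points."""
    normalized = unicodedata.normalize("NFKD", text)
    # encode() yields bytes in Python 3, so decode back to str
    return normalized.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Durrës Åland Islands"))  # Durres Aland Islands
```

A plain function like this can also be passed to Series.apply when the vectorised .str methods are not convenient.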