Unable to remove unicode char from column names in pandas under Python 2.x

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/28535067/

Date: 2020-09-13 22:57:06  Source: igfitidea


Tags: python, pandas, unicode, character-encoding, python-2.x

Question by

I have read a CSV file into a pandas DataFrame and am trying to remove the unicode prefix "u" from the column names, but with no luck.


fl.columns
Index([ u'time', u'contact', u'address'], dtype='object')

headers=[ 'time', 'contact', 'address']
fl=pandas.read_csv('file.csv',header=None,names=headers)

Still doesn't work:


fl.columns
Index([ u'time', u'contact', u'address'], dtype='object')

Even renaming doesn't work:


fl.rename(columns=lambda x: x.replace(x, x.encode('ascii', 'ignore')), inplace=True)
fl.columns
Index([ u'time', u'contact', u'address'], dtype='object')

Can anybody please tell me why this is happening and how to fix it? Thanks.


Answer by paulo.filip3

If you really need to remove the u (it is only a display issue), you can use the following very dirty trick:


from pandas import compat

compat.PY3 = True  # dirty hack: tell pandas it is running on Python 3 so reprs drop the u prefix

df.columns
Index(['time', 'contact', 'address'], dtype='object')
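You can check that the prefix is only cosmetic without pandas. The following is a minimal sketch that assumes the labels are the ones shown in the question. Under Python 2, the u'' marker comes from repr() of a unicode object and is not part of the text itself. The labels therefore still compare equal to plain string literals.

```python
# Column labels as pandas reports them in the question.
columns = [u'time', u'contact', u'address']

# For ASCII text, Python 2 unicode objects compare equal to plain str
# literals, so lookups by plain names still match these labels.
assert columns == ['time', 'contact', 'address']

# Python 3.3+ accepts the u prefix, but every str is already Unicode,
# so repr() shows no prefix at all.
print(repr(u'time'))
```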

Answer by elPastor

I had this issue today and used:

df['var'] = df['var'].astype(str)

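The astype(str) call above converts the values in a column. To convert the column names the question asks about, you can apply the same call to the column index itself. The following is a minimal sketch that uses a small hypothetical frame in place of the question's fl. On Python 2, astype(str) encodes labels to byte strings, so non-ASCII labels may raise UnicodeEncodeError, just as with str().

```python
import pandas as pd

# Hypothetical stand-in for the question's `fl` DataFrame.
df = pd.DataFrame([[1, 2, 3]], columns=[u'time', u'contact', u'address'])

# Index.astype(str) converts every column label to str.
df.columns = df.columns.astype(str)
print(list(df.columns))  # ['time', 'contact', 'address']
```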

Answer by ashok suthar

I was facing a similar issue while building an ML pipeline: my features list contained Unicode names.


features


[u'Customer_id', u'Age',.....]

One way around it is the str() function: create a new list by applying str to each value.


features_new= [str(x) for x in features]

Now the features_new list will not contain any Unicode strings. Let me know how it works.

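One caveat: under Python 2, str(x) encodes with the default ASCII codec, so a name such as u'caf\xe9' raises UnicodeEncodeError. The following version-aware sketch uses a hypothetical helper, to_plain_names. On Python 2 it drops non-ASCII characters explicitly. On Python 3, where str is already Unicode, it leaves the names unchanged. The sketch assumes the names are unicode objects when running on Python 2.

```python
import sys


def to_plain_names(names):
    """Return names as the interpreter's native str type (hypothetical helper).

    Python 3: str is already Unicode, so str() just returns each name as str.
    Python 2: encode each unicode name to ASCII bytes, silently dropping
    non-ASCII characters instead of raising UnicodeEncodeError.
    """
    if sys.version_info[0] >= 3:
        return [str(name) for name in names]
    return [name.encode('ascii', 'ignore') for name in names]


features = [u'Customer_id', u'Age']
features_new = to_plain_names(features)
print(features_new)
```

For a DataFrame, assign the result back with df.columns = to_plain_names(df.columns).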

Answer by Dean Hu

Here is one way to remove non-ASCII characters from column names:


df.columns = [strip_non_ascii(x) for x in df.columns]

The following is the strip_non_ascii function, which removes non-ASCII characters:


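def strip_non_ascii(string):
    ''' Returns the string without non ASCII characters'''
    # Keep code points 1-126 only: drops NUL (0), DEL (127) and everything above
    stripped = (c for c in string if 0 < ord(c) < 127)
    # Under Python 2, joining unicode characters returns unicode;
    # wrap the result in str() if a byte string is needed.
    return ''.join(stripped)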
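The strip_non_ascii function deletes accented letters outright, so u'caf\xe9' becomes 'caf'. If you would rather keep the base letter, a common alternative is NFKD normalization from the standard unicodedata module, followed by an ASCII encode with errors='ignore'. This is a different technique from the answer above. The following is a minimal sketch written for Python 3, and the name to_ascii_name is hypothetical. Under Python 2, the final .decode('ascii') would turn the result back into unicode, so drop it there. Characters with no ASCII decomposition, such as 'ß', are still dropped.

```python
import unicodedata


def to_ascii_name(name):
    """Transliterate a column name to ASCII where possible (hypothetical helper).

    NFKD splits accented letters into a base letter plus a combining mark
    (e.g. 'é' -> 'e' + U+0301); encoding to ASCII with errors='ignore'
    then drops the marks and any other non-ASCII characters.
    """
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print([to_ascii_name(c) for c in [u'caf\xe9', u'na\xefve', u'time']])
# ['cafe', 'naive', 'time']
```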