Notice: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/30765820/

Time: 2020-08-19 08:56:07. Source: igfitidea

python pandas read_excel returns UnicodeDecodeError on describe()

Tags: python, excel, pandas, unicode

Asked by hsinger

I love pandas, but I am having real problems with Unicode errors. read_excel() returns the dreaded Unicode error:

import pandas as pd
df=pd.read_excel('tmp.xlsx',encoding='utf-8')
df.describe()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 259: ordinal not in range(128)

I figured out that the original Excel file had a non-breaking space (&nbsp;, U+00A0) at the end of many cells, probably to avoid conversion of long digit strings to float.
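
For illustration, the byte 0xc2 in the traceback is consistent with a non-breaking space: encoded as UTF-8, U+00A0 becomes the two bytes 0xC2 0xA0, and the ASCII codec rejects the first of them. A minimal sketch in Python 3 syntax (the variable names are made up for this example and do not come from the question):

```python
# A non-breaking space, as it might appear at the end of an Excel cell.
nbsp = u'\u00a0'

# Encoded as UTF-8 it takes two bytes, the first of which is 0xc2.
encoded = nbsp.encode('utf-8')
print(repr(encoded))

# Decoding those bytes as ASCII fails, because 0xc2 is outside range(128).
try:
    encoded.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)
```

This only shows where a 0xc2 byte can come from; it does not reproduce the exact code path inside describe() on pandas 0.16.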

One way around this is to strip the cells, but there must be something better.

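# Strip surrounding whitespace from text columns only;
# the .str accessor raises an error on numeric columns.
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].str.strip()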
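
A self-contained sketch of the same stripping idea, run on a small in-memory DataFrame instead of tmp.xlsx. The helper name strip_text_cells and the sample data are hypothetical, and the sketch assumes Python 3, where str.strip() treats U+00A0 as whitespace:

```python
import pandas as pd


def strip_text_cells(frame):
    """Return a copy of frame with whitespace stripped from string cells."""
    cleaned = frame.copy()
    for col in cleaned.columns:
        # Only touch values that are actually strings; numbers pass through.
        cleaned[col] = cleaned[col].map(
            lambda value: value.strip() if isinstance(value, str) else value
        )
    return cleaned


# Hypothetical data: long digit strings followed by a non-breaking space.
sample = pd.DataFrame({
    "account": ["12345678901234567890\u00a0", "98765432109876543210\u00a0"],
    "amount": [1.5, 2.5],
})

cleaned = strip_text_cells(sample)
print(cleaned["account"].tolist())
```

Checking each value with isinstance avoids the error that the .str accessor raises on numeric columns, at the cost of a slower Python-level loop over the cells.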

I am using Anaconda 2.2.0 (win64) with pandas 0.16.

Answered by skytaker

Try this method suggested here:

import sys  # needed for sys.getfilesystemencoding()
df = pd.read_excel('tmp.xlsx', encoding=sys.getfilesystemencoding())

Answered by ihightower

Hope this helps someone.

I had this error:

UnicodeDecodeError: 'ascii' codec can't decode byte ....

after reading an Excel file with df = pd.read_excel... and trying to assign a new column to the dataframe like this: df['new_col'] = 'foo bar'

After closer inspection, I found the problem: there were some 'nan' columns in the dataframe due to missing column headers. After dropping the 'nan' columns with the following code, everything else was OK.

df = df.dropna(axis=1,how='all')
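
To show what that call does, here is a small sketch on made-up data (the column names, including "Unnamed: 2", are assumptions for illustration, not taken from the original spreadsheet). With axis=1 and how='all', only columns in which every value is missing are dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one fully populated column, one partly missing,
# and one that is entirely NaN (as an empty, header-less column might be).
frame = pd.DataFrame({
    "name": ["a", "b"],
    "value": [1.0, np.nan],
    "Unnamed: 2": [np.nan, np.nan],
})

# Drop only the columns whose values are all missing.
trimmed = frame.dropna(axis=1, how="all")
print(list(trimmed.columns))
```

The partly missing "value" column survives, so this is a fairly conservative cleanup.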