Note: This page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18645401/

Time: 2020-08-19 11:19:10  Source: igfitidea

Python pandas to_excel 'utf8' codec can't decode byte

Tags: python, excel, utf-8, pandas

Asked by Wizuriel

I'm trying to do some data work in Python pandas and I'm having trouble writing out my results. I read my data in from a CSV file and have been exporting each script's results as its own CSV file, which works fine. Lately, though, I've tried exporting everything to one Excel file with multiple worksheets, and a few of the sheets give me an error:

"'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte"

I have no idea how to even start finding the characters that could be causing problems when exporting to Excel. I'm not sure why it exports to CSV just fine, though :(
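One way to start hunting for the offending characters is to scan the raw bytes of the CSV for lines that are not valid UTF-8. This is a minimal sketch that does not come from the original thread. The function find_non_utf8_lines and the sample bytes are hypothetical, and the function reports only the first bad byte on each line.

```python
def find_non_utf8_lines(raw: bytes):
    """Yield (line_number, byte_offset, bad_byte_hex) for lines that are not valid UTF-8.

    Only the first undecodable byte of each line is reported.
    """
    for lineno, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte UTF-8 could not decode
            yield lineno, exc.start, f"0x{line[exc.start]:02x}"


# Hypothetical CSV content saved as Latin-1 ("Jérôme" contains 0xE9 and 0xF4)
raw = b"Event ID,First Name\n1,Ana\n2,J\xe9r\xf4me\n"

for lineno, offset, byte in find_non_utf8_lines(raw):
    print(f"line {lineno}, byte {offset}: {byte}")  # line 3, byte 3: 0xe9
```

For a real file, read the bytes in binary mode first, for example `with open(path, "rb") as f: raw = f.read()`, where `path` is a placeholder, and then pass `raw` to the function.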

Relevant lines:

import pandas as pd
from pandas import ExcelWriter

data = pd.read_csv(input)
writer = ExcelWriter(output)  # output is just the filename
fundraisers.to_excel(writer, "fundraisers")
locations.to_excel(writer, "locations")  # error
locations.to_csv(outputcsv)  # works
writer.save()

Printing the head of the offending DataFrame:

Event ID    Constituent ID  Email Address   First Name  \   Last Name
f       1       A       A       1
F       4       L       R       C
M       1       1       A       D
F       4       A       A       G
M       2       0       R       G
M       3       O       O       H
M       2       T       E       H
M       2       A       A       H
M       2       M       M       K
F       3       J       E       K
Location ID raised  raised con  raised email
a   0   0   0
a   8   0   0
o   0   0   0
o   0   0   0
o   0   0   0
t   5   0   0
o   1   0   0
o   6   a   0
o   6   0   0
d   0   0   0

Looking at the Excel sheet, I do actually get a partial printout. Everything from the First Name column onward is blank, but Event ID, Constituent ID and Email Address all print.

Edit: Trying to read the CSV in as UTF-8 fails, but reading it in as Latin-1 works. Is there a way to specify the to_excel encoding? Or to decode my DataFrame and re-encode it as UTF-8?
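Decoding the same bytes both ways shows why UTF-8 fails where Latin-1 works. This is a minimal sketch that does not come from the thread, and the sample bytes are hypothetical. In Latin-1, byte 0xE9 is "é". In UTF-8, the same byte announces a three-byte sequence, so the ordinary ASCII byte after it is rejected as an invalid continuation byte. Python 3 spells the codec name as 'utf-8' in the message.

```python
# Hypothetical Latin-1 bytes for "Jérôme": 0xE9 is "é", 0xF4 is "ô"
raw = b"J\xe9r\xf4me"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the decode error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


print(try_decode(raw, "utf-8"))    # error: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
print(try_decode(raw, "latin-1"))  # Jérôme
```

Latin-1 maps every possible byte to a character, so decoding with it never raises an error. It only gives the right text if the file really is Latin-1.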

Accepted answer by Wizuriel

Managed to solve this.

I made a function that goes through my string columns and decodes/encodes them into UTF-8, and it now works.

def changeencode(data, cols):
    # Decode each listed column from Latin-1 (ISO-8859-1), then re-encode it as UTF-8
    for col in cols:
        data[col] = data[col].str.decode('iso-8859-1').str.encode('utf-8')
    return data
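The function above appears to target Python 2, where pandas string columns held byte strings. In Python 3, text read by read_csv is already str. Only columns that still contain bytes need decoding, and the result can stay as str. This is a hedged Python 3 sketch under that assumption. The name change_encode and the sample data are hypothetical.

```python
import pandas as pd


def change_encode(data: pd.DataFrame, cols) -> pd.DataFrame:
    """Decode Latin-1 (ISO-8859-1) byte strings in `cols` into Unicode str."""
    for col in cols:
        # Series.str.decode turns each bytes value into a Python str
        data[col] = data[col].str.decode("iso-8859-1")
    return data


# Hypothetical column that still holds raw Latin-1 bytes
df = pd.DataFrame({"First Name": [b"J\xe9r\xf4me", b"Ana"]})
df = change_encode(df, ["First Name"])
print(df["First Name"].tolist())  # ['Jérôme', 'Ana']
```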

Answer by Jorge Tornero

I don't know when it's going to be released, but you can try my GitHub repository:

https://github.com/jtornero/pandas

You can clone it and build pandas from source. The issue is almost solved, and it works like this:

import pandas

sampleList = ['Miño', '1', '2', 'señora']
dataframe = pandas.DataFrame(sampleList)
ew = pandas.ExcelWriter('./test.xls', encoding='utf-8')
dataframe.to_excel(ew)
ew.save()

Cheers

Jorge Tornero

Answer by user3570953

Actually, there is a way to force UTF-8 encoding by passing a parameter to ExcelWriter:

import pandas

ew = pandas.ExcelWriter('test.xlsx', options={'encoding': 'utf-8'})
sampleList = ['Miño', '1', '2', 'señora']
dataframe = pandas.DataFrame(sampleList)
dataframe.to_excel(ew)
ew.save()

Answer by Zenadix

In my case, the problem was that I was initially reading the CSV file with the wrong encoding (ASCII instead of cp1252). Therefore, when pandas tried to write it to an Excel file, it found some characters it couldn't decode.

I solved it by specifying the correct encoding when reading the CSV file.

data = pd.read_csv(fname, encoding='cp1252')
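The choice of encoding matters most for bytes 0x80 to 0x9F, which text saved in cp1252 often uses for curly quotes and the euro sign. This small standard-library sketch uses hypothetical bytes to show the difference. ASCII rejects those bytes. Latin-1 turns them into invisible C1 control characters. cp1252 gives the intended symbols.

```python
# Hypothetical cp1252 bytes: 0x93/0x94 are curly double quotes, 0x80 is the euro sign
raw = b"\x93Gala\x94 ticket,\x805"

for encoding in ("ascii", "latin-1", "cp1252"):
    try:
        # repr() makes invisible control characters visible as \x.. escapes
        print(encoding, "->", repr(raw.decode(encoding)))
    except UnicodeDecodeError as exc:
        print(encoding, "-> error:", exc.reason)
```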

Answer by billmanH

The simplest thing is to load your DataFrame as UTF-8. Then ExcelWriter will save it without a problem.

data = pd.read_csv(path,encoding='utf-8')
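Reading as UTF-8 only works if the file really is UTF-8. For a file like the one in the question, the edit notes that reading as UTF-8 fails. One hedged option is to try UTF-8 first and fall back to a legacy encoding. The helper read_csv_with_fallback is hypothetical, and using cp1252 as the fallback is an assumption.

```python
import io

import pandas as pd


def read_csv_with_fallback(data: bytes, encodings=("utf-8", "cp1252")) -> pd.DataFrame:
    """Hypothetical helper: parse CSV bytes with the first encoding that decodes cleanly."""
    last_error = None
    for encoding in encodings:
        try:
            # A fresh BytesIO per attempt, so every try starts at byte 0
            return pd.read_csv(io.BytesIO(data), encoding=encoding)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error or ValueError("no encodings to try")


# Hypothetical sample: the same text saved once as UTF-8 and once as Latin-1
utf8_bytes = "First Name\nJérôme\n".encode("utf-8")
latin1_bytes = "First Name\nJérôme\n".encode("latin-1")

print(read_csv_with_fallback(utf8_bytes)["First Name"][0])    # Jérôme
print(read_csv_with_fallback(latin1_bytes)["First Name"][0])  # Jérôme (via cp1252)
```

UTF-8 goes first because cp1252 often decodes UTF-8 files without raising an error, silently producing mojibake such as "Ã©" for "é". The stricter encoding should therefore get the first try.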

Answer by Joe Banks

Similar to what @Zenadix said, reading the CSVs in as UTF-8 allowed ExcelWriter to write without an error.

import pandas as pd

df = pd.read_csv('path', encoding='utf-8')

...

with pd.ExcelWriter('new_path') as writer:
    df.to_excel(writer, sheet_name='Foo')