Python 3 CSV file giving UnicodeDecodeError: 'utf-8' codec can't decode byte error when I print
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/21504319/
Asked by HLH
I have the following code in Python 3, which is meant to print out each line in a csv file.
import csv
with open('my_file.csv', 'r', newline='') as csvfile:
    lines = csv.reader(csvfile, delimiter = ',', quotechar = '|')
    for line in lines:
        print(' '.join(line))
But when I run it, it gives me this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
I looked through the csv file, and it turns out that if I take out a single ñ (little n with a tilde on top), every line prints out fine.
My problem is that I've looked through a bunch of different solutions to similar problems, but I still have no idea how to fix this, what to decode/encode, etc. Simply taking out the ñ character in the data is NOT an option.
Accepted answer by unutbu
We know the file contains the byte b'\x96' since it is mentioned in the error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
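The same details can also be read from the exception object itself, which helps when you want to see the bytes around the failure. The following is only a minimal sketch: the sample bytes are a hypothetical stand-in for the file contents, which in practice you would obtain by opening the file in binary mode ('rb') and calling read().

```python
# Hypothetical sample data standing in for the raw bytes of the csv file.
raw = b"a\x96o,2014\n"

bad_byte = None
position = None
context = None

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first undecodable byte;
    # exc.object is the bytes object that was being decoded.
    position = exc.start
    bad_byte = exc.object[exc.start:exc.start + 1]
    context = exc.object[max(0, exc.start - 10):exc.start + 10]

print("bad byte:", bad_byte)
print("position:", position)
print("context:", context)
```

Looking at the surrounding bytes usually makes it easier to guess which character was intended.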
Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:
import pkgutil
import encodings
import os

def all_encodings():
    # Collect every codec module in the encodings package, plus all alias targets.
    modnames = set([modname for importer, modname, ispkg in pkgutil.walk_packages(
        path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

text = b'\x96'
for enc in all_encodings():
    try:
        msg = text.decode(enc)
    except Exception:
        # Skip codecs that cannot decode this byte (or are not text codecs).
        continue
    if msg == 'ñ':
        print('Decoding {t} with {enc} is {m}'.format(t=text, enc=enc, m=msg))
which yields
Decoding b'\x96' with mac_roman is ñ
Decoding b'\x96' with mac_farsi is ñ
Decoding b'\x96' with mac_croatian is ñ
Decoding b'\x96' with mac_arabic is ñ
Decoding b'\x96' with mac_romanian is ñ
Decoding b'\x96' with mac_iceland is ñ
Decoding b'\x96' with mac_turkish is ñ
Therefore, try changing
with open('my_file.csv', 'r', newline='') as csvfile:
to one of those encodings, such as:
with open('my_file.csv', 'r', encoding='mac_roman', newline='') as csvfile:
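To check the idea end to end, here is a small sketch that writes a throwaway csv file in Mac Roman and reads it back with the encoding passed to open(). The sample rows and the temporary file are assumptions made up for illustration; the delimiter and quotechar are the ones from the question.

```python
import csv
import os
import tempfile

# Hypothetical sample: two csv rows stored as Mac Roman bytes (ñ becomes 0x96).
sample = "name,city\nPeña,Logroño\n".encode("mac_roman")

# Write the bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Passing the encoding explicitly lets the byte 0x96 decode to ñ.
    with open(path, "r", encoding="mac_roman", newline="") as csvfile:
        rows = list(csv.reader(csvfile, delimiter=",", quotechar="|"))
finally:
    os.remove(path)

for row in rows:
    print(" ".join(row))
```

If the rest of the file looks wrong with mac_roman, one of the other listed encodings may be a better match for where the file came from.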
Answer by MA1
with open('my_file.csv', 'r', newline='', encoding='utf-8') as csvfile:
Try opening the file as shown above.
Answer by Timothy C. Quinn
For others who hit the same error shown in the subject, watch out for the file encoding of your csv file. It's possible that it is not utf-8. I just noticed that LibreOffice created a utf-16 encoded file for me today without prompting me, although I could not reproduce this.
If you try to open a utf-16 encoded document using open(... encoding='utf-8'), you will get the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
To fix this, either specify the 'utf-16' encoding or change the encoding of the csv file.
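A byte 0xff at position 0 is consistent with a little-endian UTF-16 byte order mark (BOM). As a rough sketch, the first few bytes of a file can be compared against the BOM constants in the standard codecs module. The helper name guess_encoding_from_bom is made up for this example, and the check is only a heuristic: files without a BOM fall through to the default, and a UTF-16 file whose first character is NUL would be mistaken for UTF-32.

```python
import codecs

def guess_encoding_from_bom(head: bytes, default: str = "utf-8") -> str:
    """Return an encoding name based on a leading byte order mark, if any."""
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default

# Hypothetical sample: csv text encoded as UTF-16 (Python prepends a BOM).
data = "a,b\n1,2\n".encode("utf-16")
encoding = guess_encoding_from_bom(data[:4])
print(encoding)
print(data.decode(encoding))
```

For a real file, read the first four bytes in binary mode and pass the returned name as the encoding argument of open().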
Answer by Sir Markpo
with open('my_file.csv', 'r', newline='', encoding='ISO-8859-1') as csvfile:
The ñ character in this file is not encoded as valid UTF-8. To get rid of the error, you may use ISO-8859-1 encoding instead. For more details about this encoding, you may refer to the link below: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html
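One caveat is worth checking before relying on this: ISO-8859-1 assigns a code point to every possible byte, so decoding with it never raises an error, but that does not guarantee the result is the character the file's author meant. A quick sketch comparing how the byte from the question decodes under three encodings (the choice of encodings to compare is mine, for illustration):

```python
byte = b"\x96"

# Decode the same byte under several single-byte encodings and compare.
decoded = {}
for enc in ("iso-8859-1", "cp1252", "mac_roman"):
    text = byte.decode(enc)
    decoded[enc] = text
    print(f"{enc}: {text!r} (U+{ord(text):04X})")
```

Only mac_roman yields ñ here; ISO-8859-1 yields an invisible control character and cp1252 yields an en dash, so it pays to inspect the decoded text rather than stopping at the first encoding that raises no error.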
Answer by Sheshan Gamage
I also faced this issue with Python 3, and it was resolved by using utf-16 as the encoding:
with open('data.csv', newline='', encoding='utf-16') as csvfile:
Answer by Osama Subhani
A much simpler solution is to open the csv file in Notepad and select "Save As" in the "File" dropdown list. Set "Save as type" to "All files (*.*)". Select "UTF-8 Encoding" in the Encoding dropdown list and add the ".csv" extension to the file name.
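If you would rather do the same conversion in Python, a sketch along these lines may work, assuming you already know the file's current encoding. The helper name reencode_to_utf8, the file names, and the sample data are all made up for this example; the source encoding used here is mac_roman only because it matches the accepted answer.

```python
import os
import tempfile

def reencode_to_utf8(src_path: str, dst_path: str, src_encoding: str) -> None:
    """Read src_path as src_encoding and write the same text to dst_path as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Demo on a throwaway file containing hypothetical Mac Roman data.
with tempfile.TemporaryDirectory() as tmpdir:
    src_path = os.path.join(tmpdir, "my_file.csv")
    dst_path = os.path.join(tmpdir, "my_file_utf8.csv")
    with open(src_path, "wb") as f:
        f.write("Peña,1\n".encode("mac_roman"))
    reencode_to_utf8(src_path, dst_path, "mac_roman")
    with open(dst_path, "rb") as f:
        converted = f.read()

print(converted)
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.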
Answer by Mauricio
Easy: just open it in Excel or OpenOffice Calc, use text to columns, select the comma (,) as the separator, and then save the file as .csv. It took me a day and several hours of searching on Google, but in the end I figured it out.

