Python 3 CSV file giving UnicodeDecodeError: 'utf-8' codec can't decode byte error when I print

Question

Asked by HLH

I have the following code in Python 3, which is meant to print out each line in a csv file.

import csv
with open('my_file.csv', 'r', newline='') as csvfile:
    lines = csv.reader(csvfile, delimiter = ',', quotechar = '|')
    for line in lines:
        print(' '.join(line))

But when I run it, it gives me this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

I looked through the csv file, and it turns out that if I take out a single ñ (little n with a tilde on top), every line prints out fine.

My problem is that I've looked through a bunch of different solutions to similar problems, but I still have no idea how to fix this, what to decode/encode, etc. Simply taking out the ñ character in the data is NOT an option.

Answer 1

Accepted answer by unutbu

We know the file contains the byte b'\x96' since it is mentioned in the error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

import pkgutil
import encodings
import os

def all_encodings():
    modnames = set([modname for importer, modname, ispkg in pkgutil.walk_packages(
        path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

text = b'\x96'
for enc in all_encodings():
    try:
        msg = text.decode(enc)
    except Exception:
        continue
    if msg == 'ñ':
        print('Decoding {t} with {enc} is {m}'.format(t=text, enc=enc, m=msg))

which yields

Decoding b'\x96' with mac_roman is ñ
Decoding b'\x96' with mac_farsi is ñ
Decoding b'\x96' with mac_croatian is ñ
Decoding b'\x96' with mac_arabic is ñ
Decoding b'\x96' with mac_romanian is ñ
Decoding b'\x96' with mac_iceland is ñ
Decoding b'\x96' with mac_turkish is ñ

Therefore, try changing

with open('my_file.csv', 'r', newline='') as csvfile:

to one of those encodings, such as:

with open('my_file.csv', 'r', encoding='mac_roman', newline='') as csvfile:
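
As a quick check of the reasoning above, the single byte from the error message can be decoded directly with a few candidate encodings. This is only a minimal sketch: the helper name `decode_candidates` and the short candidate list are illustrative assumptions, not part of the original answer.

```python
def decode_candidates(raw, candidates):
    """Return {encoding: decoded text} for each candidate that decodes raw without error."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence; skip it.
            continue
    return results


# The byte reported in the traceback, tried against a few common encodings.
results = decode_candidates(b'\x96', ['utf-8', 'mac_roman', 'cp1252'])
for enc, text in results.items():
    print(enc, repr(text))
```

Here utf-8 is rejected, mac_roman gives ñ, and cp1252 gives an en dash. More than one encoding can accept the same byte, so a decode that succeeds is not proof of the right encoding; the decoded text also has to match what the file is expected to contain.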

Answer 2

Answered by MA1

with open('my_file.csv', 'r', newline='', encoding='utf-8') as csvfile:

Try opening the file as shown above.

Answer 3

Answered by Timothy C. Quinn

For others who hit the same error shown in the subject, watch out for the file encoding of your csv file. It's possible it is not utf-8. I just noticed that LibreOffice created a utf-16 encoded file for me today without prompting me, although I could not reproduce this.

If you try to open a utf-16 encoded document using open(... encoding='utf-8'), you will get the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

To fix this, either specify the 'utf-16' encoding or change the encoding of the csv file.
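
The byte 0xff at position 0 in that error is consistent with a UTF-16 byte order mark (BOM). One way to check for this before choosing an encoding is to look at the first few bytes of the file. The sketch below assumes the file starts with a BOM, which is not guaranteed; the function name `sniff_bom` is hypothetical.

```python
import codecs


def sniff_bom(raw):
    """Guess an encoding name from a leading byte order mark, or return None."""
    # UTF-32 marks are checked first because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, 'utf-32'),
        (codecs.BOM_UTF32_BE, 'utf-32'),
        (codecs.BOM_UTF8, 'utf-8-sig'),
        (codecs.BOM_UTF16_LE, 'utf-16'),
        (codecs.BOM_UTF16_BE, 'utf-16'),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return None


# Python's 'utf-16' codec writes a BOM when encoding, so this sample starts with one.
sample = 'a,b\r\n'.encode('utf-16')
print(sniff_bom(sample))
print(sniff_bom(b'a,b\r\n'))
```

For a real file, the bytes would come from something like open(path, 'rb').read(4). A result of None only means no BOM was found; it says nothing about which encoding the file actually uses.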

Answer 4

Answered by Sir Markpo

with open('my_file.csv', 'r', newline='', encoding='ISO-8859-1') as csvfile:

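The ñ character in this file is not stored as valid UTF-8 (the file was written in a different encoding). To avoid the error, you may use ISO-8859-1 encoding instead. Note that ISO-8859-1 accepts every byte value, so the error goes away, but byte 0x96 is then decoded as a control character rather than ñ. For more details about this encoding, you may refer to the link below: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html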
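
To see what ISO-8859-1 does with the byte from the question, the same bytes can be decoded two ways. This is a small sketch; the sample word is a made-up example and not taken from the asker's file.

```python
# Hypothetical sample: a word containing the byte 0x96 from the error message.
sample = b'Espa\x96a'

# ISO-8859-1 maps each of the 256 byte values to a code point, so it never raises.
as_latin1 = sample.decode('ISO-8859-1')
# mac_roman is the encoding suggested in the accepted answer.
as_mac_roman = sample.decode('mac_roman')

print(repr(as_latin1))
print(repr(as_mac_roman))
```

With ISO-8859-1 the byte becomes the control character U+0096, while mac_roman yields ñ. Decoding without an exception is therefore not enough to confirm that the text was read correctly.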

Answer 5

Answered by Sheshan Gamage

I also faced this issue with Python 3, and it was resolved by using utf-16 as the encoding:

with open('data.csv', newline='', encoding='utf-16') as csvfile:

Answer 6

Answered by Osama Subhani

A much simpler solution is to open the csv file in Notepad and select "Save As" in the "File" menu. Set "Save as type" to "All files (*.*)", select "UTF-8" in the "Encoding" dropdown list, and add the ".csv" extension to the file name.
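
The same re-save can also be done from Python when an editor is not convenient. The sketch below assumes the source encoding is already known (mac_roman is used here only because the accepted answer suggests it); the function name `convert_to_utf8` and the file names are hypothetical.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding):
    """Rewrite src_path as UTF-8 at dst_path, decoding it with source_encoding."""
    # newline='' leaves line endings untouched, which matters for CSV data.
    with open(src_path, 'r', encoding=source_encoding, newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        # Copy in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: src.read(64 * 1024), ''):
            dst.write(chunk)


# Demo on a throwaway file containing the byte 0x96.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'legacy.csv')
    dst = os.path.join(tmp, 'utf8.csv')
    with open(src, 'wb') as f:
        f.write(b'name\r\nEspa\x96a\r\n')
    convert_to_utf8(src, dst, 'mac_roman')
    with open(dst, 'rb') as f:
        converted = f.read()

print(converted.decode('utf-8'))
```

After conversion, the file can be opened with encoding='utf-8'. If the wrong source encoding is passed, the output will be valid UTF-8 but contain the wrong characters, so it is worth inspecting the result.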

Answer 7

Answered by Mauricio

Easy: just open it in Excel or OpenOffice Calc, use Text to Columns, select the comma (,) as the delimiter, and then save the file as .csv. It took me a day and several hours of searching on Google, but in the end I figured it out.
