Python: "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19699367/

Date: 2020-08-19 14:24:09  Source: igfitidea

"for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Tags: python, python-3.x, character-encoding

Asked by SujitS

Here is my code,

for line in open('u.item'):
    # read each line
    pass

Whenever I run this code, it gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I tried to solve this by adding an extra parameter to open(); the code then looks like this:

for line in open('u.item', encoding='utf-8'):
    # read each line
    pass

But it gives the same error again. What should I do? Please help.

Accepted answer by SujitS

As suggested by Mark Ransom, I found the right encoding for this problem. The encoding was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding="ISO-8859-1") solves the problem.
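A minimal sketch of why this change works. The sample file written here is a made-up stand-in for u.item (the real file is not reproduced), and the names sample_path, utf8_failed and lines are illustrative only:

```python
import os
import tempfile

# Hypothetical stand-in for u.item: one line containing the byte 0xe9.
sample_bytes = "1|Café Society (1995)\n".encode("ISO-8859-1")

with tempfile.NamedTemporaryFile(delete=False, suffix=".item") as tmp:
    tmp.write(sample_bytes)
    sample_path = tmp.name

# Reading as UTF-8 fails: 0xe9 followed by a space is not a valid UTF-8 sequence.
try:
    with open(sample_path, encoding="utf-8") as f:
        for line in f:
            pass
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading as ISO-8859-1 succeeds: every byte value maps to a character there.
with open(sample_path, encoding="ISO-8859-1") as f:
    lines = [line.rstrip("\n") for line in f]

os.remove(sample_path)
print(utf8_failed, len(lines))
```

Because ISO-8859-1 can decode any byte, the absence of an error does not by itself prove the encoding is the right one; it is worth checking that the decoded text looks correct.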

Answer by Mark Ransom

Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.

In the Windows-1252 encoding, for example, the byte 0xe9 would be the character é.
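To illustrate the point about 0xe9, the following sketch decodes that single byte under a few encodings. The choice of encodings is only an example, not a recommendation for any particular file:

```python
raw = b"\xe9"

# Both Windows-1252 (cp1252) and ISO-8859-1 map 0xe9 to the same character.
cp1252_char = raw.decode("cp1252")
latin1_char = raw.decode("ISO-8859-1")

# In UTF-8, 0xe9 can only start a three-byte sequence, so alone it cannot be decoded.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(cp1252_char == latin1_char, utf8_ok)
```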

Answer by user6832484

If anyone is looking for this, here is an example for converting a CSV file in Python 3:

import csv
from sys import argv

try:
    # argv[1] is the path of the CSV file given on the command line
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    pass
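A slightly fuller sketch of the same idea, using a with block so the file is closed once reading finishes. The helper name read_rows and the sample data are made up for illustration; passing newline='' follows the csv module's documented recommendation for files handed to csv.reader:

```python
import csv
import os
import tempfile


def read_rows(path, encoding="ISO-8859-1"):
    """Return all rows of a comma-separated file decoded with the given encoding."""
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle, delimiter=",", quotechar='"'))


# Hypothetical sample data written as ISO-8859-1 bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write('id,title\n1,"Café, The"\n'.encode("ISO-8859-1"))
    csv_path = tmp.name

rows = read_rows(csv_path)
os.remove(csv_path)
print(len(rows))
```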

Answer by Shashank

Try reading the file with pandas:

import pandas as pd
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')  # m_cols: your list of column names

Answer by Jeril

If you are using Python 2, the following will be the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    # do something
    pass

This is needed because the built-in open() in Python 2 does not accept an encoding parameter; passing one gives the following error:

TypeError: 'encoding' is an invalid keyword argument for this function
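For code that has to run on both Python 2 and Python 3, io.open is a convenient choice: in Python 3 it is an alias for the built-in open, so the io.open call behaves the same there. A small sketch, with a made-up temporary file standing in for u.item:

```python
import io
import os
import tempfile

# Hypothetical stand-in for u.item, written as ISO-8859-1 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(u"Caf\xe9\n".encode("ISO-8859-1"))
    io_path = tmp.name

# io.open accepts encoding= on Python 2 and Python 3 alike.
with io.open(io_path, encoding="ISO-8859-1") as f:
    io_lines = [line.rstrip(u"\n") for line in f]

os.remove(io_path)
print(len(io_lines))
```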

Answer by Ryoji Kuwae Neto

This also worked for me. ISO 8859-1 is going to save a lot of trouble, hahaha, mainly if you are using speech recognition APIs.

Example:

例子:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

Answer by xtluo

Sometimes calling open(filepath) when filepath is not actually a file produces the same error, so first make sure the file you're trying to open exists:

import os
assert os.path.isfile(filepath)

Hope this helps.

Answer by Ozcar Nguyen

You could resolve the problem with:

for line in open(your_file_path, 'rb'):
    pass  # each line is a bytes object here

'rb' opens the file in binary mode, so no decoding takes place and each line is a bytes object. Hope this helps!
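The bytes read this way still have to be decoded before they can be treated as text. A sketch, assuming the data turns out to be ISO-8859-1 as in the accepted answer; the temporary file is a made-up stand-in for the real one:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Caf\xe9 one\nplain two\n")
    bin_path = tmp.name

decoded = []
with open(bin_path, "rb") as f:
    for raw_line in f:
        # raw_line is bytes; nothing has been decoded yet, so no UnicodeDecodeError here.
        assert isinstance(raw_line, bytes)
        decoded.append(raw_line.decode("ISO-8859-1").rstrip("\n"))

os.remove(bin_path)
print(len(decoded))
```

Decoding line by line like this makes it possible to handle bad lines individually, for example by wrapping the decode call in a try block.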

Answer by Ayesha Siddiqa

This works:

open('filename', encoding='latin-1')

or:

open('filename', encoding="ISO-8859-1")

Answer by FaridLU

You can try it this way:

open('u.item', encoding='utf8', errors='ignore')
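
Be aware that errors='ignore' silently drops every byte that cannot be decoded, so characters go missing from the text. The sketch below compares it with errors='replace', which substitutes U+FFFD instead; the sample file is made up for illustration:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Caf\xe9 Society\n")
    err_path = tmp.name

# errors='ignore' discards the undecodable byte 0xe9 entirely.
with open(err_path, encoding="utf8", errors="ignore") as f:
    ignored = f.read()

# errors='replace' leaves a visible marker (U+FFFD) where decoding failed.
with open(err_path, encoding="utf8", errors="replace") as f:
    replaced = f.read()

os.remove(err_path)
print(len(ignored), len(replaced))
```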