Python "for line in..." causes UnicodeDecodeError: 'utf-8' codec can't decode byte

Question

Asked by SujitS

Here is my code:

for line in open('u.item'):
    pass  # read each line

Whenever I run this code, it gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I tried to solve this by adding an extra parameter to open(), so the code looks like this:

for line in open('u.item', encoding='utf-8'):
    pass  # read each line

But it gives the same error again. What should I do? Please help.

Answer 1

Accepted answer by SujitS

As suggested by Mark Ransom, I found the right encoding for this file. The encoding was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding="ISO-8859-1") solves the problem.
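A minimal sketch of that fix, assuming the file really is ISO-8859-1 encoded. The helper name read_lines and the use of a with block are illustrative additions, not part of the original answer.

```python
def read_lines(path, encoding="ISO-8859-1"):
    """Yield each line of a text file, decoded with the given encoding."""
    # The with block closes the file even if reading stops part-way through.
    with open(path, encoding=encoding) as f:
        for line in f:
            # Strip only the trailing newline; keep the other whitespace.
            yield line.rstrip("\n")
```

It could be used as `for line in read_lines('u.item'): ...`. Passing the encoding explicitly also avoids depending on the platform default.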

Answer 2

Answered by Mark Ransom

Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.

In Windows-1252 encoding, for example, the byte 0xe9 would be the character é.
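The point about 0xe9 can be checked directly on a bytes object. This is a small sketch; the sample bytes are made up for illustration and are not taken from u.item.

```python
# Hypothetical sample: 0xe9 followed by a space is not valid UTF-8.
raw = b"caf\xe9 au lait"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # Same error class as in the question.
    utf8_ok = False

# Both single-byte encodings map 0xe9 to the character é.
as_cp1252 = raw.decode("windows-1252")
as_latin1 = raw.decode("iso-8859-1")
```

Here utf8_ok ends up False, while both single-byte decodes succeed. Note that ISO-8859-1 can decode any byte sequence, so a successful decode alone does not prove the encoding is correct; check that the resulting text looks right.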

Answer 3

Answered by user6832484

If someone is looking for this, here is an example for converting a CSV file in Python 3:

import csv
from sys import argv

try:
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    pass

Answer 4

Answered by Shashank

Try this to read the file using pandas:

import pandas as pd
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')  # m_cols: your list of column names

Answer 5

Answered by Jeril

If you are using Python 2, the following is the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    pass  # do something with line

This is needed because the built-in open() in Python 2 does not accept an encoding parameter; passing one raises the following error:

TypeError: 'encoding' is an invalid keyword argument for this function

Answer 6

Answered by Ryoji Kuwae Neto

This also worked for me. ISO 8859-1 is going to save a lot of trouble, hahaha, mainly when using speech recognition APIs.

Example:


file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

Answer 7

Answered by xtluo

Sometimes calling open(filepath) when filepath is not actually a file produces the same error, so first make sure the file you're trying to open exists:

import os
assert os.path.isfile(filepath)

Hope this helps.

Answer 8

Answered by Ozcar Nguyen

You could resolve the problem with:


for line in open(your_file_path, 'rb'):
    pass  # line is a bytes object here

'rb' opens the file in binary mode, so each line is a bytes object rather than a str and no decoding takes place. Hope this helps!
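Binary mode only postpones the decoding question: the bytes still have to be decoded before they can be treated as text. Below is a hedged sketch of one way to do that, trying UTF-8 first and falling back to ISO-8859-1. The function names and the choice of fallback encodings are assumptions for illustration, not part of the original answer.

```python
def decode_line(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode a bytes line with the first candidate encoding that succeeds."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the line")


def read_decoded(path):
    """Read a file in binary mode and yield each line as decoded text."""
    with open(path, "rb") as f:
        for raw in f:
            yield decode_line(raw).rstrip("\r\n")
```

Because ISO-8859-1 accepts every byte value, it acts as a catch-all here. If the file is in some other encoding, the fallback will produce wrong characters instead of raising an error.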

Answer 9

Answered by Ayesha Siddiqa

This works:


open('filename', encoding='latin-1')

or:


open('filename', encoding="ISO-8859-1")

Answer 10

Answered by FaridLU

You can try it this way:

open('u.item', encoding='utf8', errors='ignore')
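Be aware that errors='ignore' silently drops every byte that cannot be decoded, so data is lost without any warning. The short sketch below compares it with errors='replace' on made-up sample bytes (not taken from u.item).

```python
# Hypothetical sample containing one byte (0xe9) that is invalid as UTF-8.
raw = b"caf\xe9 menu"

# 'ignore' drops the offending byte entirely.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD, leaving a visible marker of the damage.
replaced = raw.decode("utf-8", errors="replace")
```

Here ignored is "caf menu" and replaced is "caf\ufffd menu". If the characters matter, opening the file with its actual encoding is usually the safer choice.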
