Python error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Question

Asked by pie

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools

An error occurred when running "process.py" from the repository above.

python tools/process.py --input_dir data --operation resize --output_dir data2/resize
data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What is the cause of the error? The Python version is 3.5.2.

Answer 1

Answered by Alfe

Python tries to convert a byte array (a bytes object, which it assumes to be a UTF-8-encoded string) to a Unicode string (str). This conversion is a decoding according to UTF-8 rules. While decoding, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (namely the 0xff at position 0).

Since you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I propose rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier of open() states that the file is to be treated as binary, so contents remains a bytes object. No decoding is attempted this way.
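A minimal sketch of the difference between the two modes. The temporary file and its contents are hypothetical stand-ins for data/0.jpg; the only fact relied on is that JPEG files begin with the bytes 0xFF 0xD8. The encoding is passed explicitly so the behaviour does not depend on the platform default.

```python
import os
import tempfile

# Hypothetical stand-in for data/0.jpg: JPEG data starts with 0xFF 0xD8.
jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 8

with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(jpeg_like)
    path = tmp.name

try:
    # Text mode: Python decodes the bytes as UTF-8 and fails on 0xff.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: no decoding is attempted, so the raw bytes come back.
    with open(path, "rb") as f:
        contents = f.read()
finally:
    os.remove(path)

print(type(contents).__name__, text_mode_error is not None)
```

Binary mode is the natural choice whenever the file is an image or other non-text data, as appears to be the case in the question.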

Answer 2

Answered by Nitish Kumar Pal

This solution strips out (ignores) the undecodable characters and returns the string without them. Use it only if you need to strip them, not convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you will just lose some characters. If you don't care about them (in my case they seemed to be extra characters originating from the bad formatting and programming of the clients connecting to my socket server), then it's an easy, direct solution.
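To make the data loss concrete, here is a small sketch comparing two of the standard error handlers on a made-up byte string (the sample bytes are an assumption for illustration only):

```python
# Two bytes (0xff and 0xfe) that are never valid in UTF-8, mixed with ASCII.
raw = b"\xffhello \xfeworld"

ignored = raw.decode("utf-8", errors="ignore")    # silently drops bad bytes
replaced = raw.decode("utf-8", errors="replace")  # marks them with U+FFFD

print(repr(ignored))
print(repr(replaced))
```

errors='replace' keeps a visible marker where data was lost, which can be easier to debug than silently dropped bytes.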

Answer 3

Answered by tattmoney76

I had an issue similar to this and ended up using UTF-16 to decode. My code is below.

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode before stripping: in Python 3, bytes.rstrip() with a str argument raises TypeError
contents = contents.decode("utf-16").rstrip("\r\n")
contents = contents.split("\r\n")

This reads the file contents as raw bytes, decodes them as UTF-16, and then splits the result into lines.

Answer 4

Answered by Ramineni Ravi Teja

Use the ISO-8859-1 encoding to solve the issue.
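A short sketch of why this makes the error go away, using made-up sample bytes: ISO-8859-1 maps every byte value to a code point, so decoding never raises. That does not mean the result is the intended text.

```python
# Hypothetical sample: the UTF-16 little-endian encoding of "hi" with a BOM.
raw = b"\xff\xfeh\x00i\x00"

# Every byte becomes exactly one character, so this cannot fail...
as_latin1 = raw.decode("ISO-8859-1")

# ...but the result is 6 characters of mojibake rather than the text "hi".
print(len(as_latin1), repr(as_latin1[0]))
```

The same applies when opening a file with open(path, encoding="ISO-8859-1"): the read succeeds, but the text is only correct if the file really is ISO-8859-1.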

Answer 5

Answered by Peter Ogden

I came across this thread while hitting the same error. After some research, I can confirm that this error happens when you try to decode a UTF-16 file as UTF-8.

With UTF-16, the first character (2 bytes in UTF-16) is a Byte Order Mark (BOM), which is used as a decoding hint and doesn't appear as a character in the decoded string. This means the first byte will be either FE or FF, and the second byte will be the other.
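The BOM check described above can be sketched as follows. guess_utf16_bom is a hypothetical helper name, not a standard-library function; the BOM constants come from the codecs module.

```python
import codecs


def guess_utf16_bom(data):
    """Return 'utf-16' if data starts with a UTF-16 BOM, else None.

    Hypothetical helper. It checks only the first two bytes, so a UTF-32
    little-endian BOM (FF FE 00 00) would also match.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


# Build a sample with an explicit little-endian BOM (FF FE).
sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
encoding = guess_utf16_bom(sample)
text = sample.decode(encoding)  # the 'utf-16' codec consumes the BOM

# JPEG data starts with FF D8, which is not a BOM.
not_text = guess_utf16_bom(b"\xff\xd8\xff\xe0")

print(encoding, repr(text), not_text)
```

A leading 0xFF is therefore only a hint: it is consistent with UTF-16, but also with binary formats such as JPEG.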

Heavily edited after I found out the real answer

Answer 6

Answered by pradeep karunathilaka

Use only

base64.b64decode(a)

instead of

base64.b64decode(a).decode('utf-8')
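A brief sketch of why the extra decode fails, assuming the Base64 payload is binary data rather than text. The payload bytes here are made up for illustration; the name a follows the answer.

```python
import base64

# Hypothetical binary payload (not valid UTF-8), Base64-encoded for transport.
payload = b"\xff\xd8\xff\xe0"
a = base64.b64encode(payload)

# b64decode already returns the original bytes; nothing more is needed.
decoded = base64.b64decode(a)

# Decoding those bytes as UTF-8 only makes sense when the payload is text.
try:
    decoded.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(decoded == payload, decode_failed)
```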

Answer 7

Answered by Juan Navarrete

If you are on a Mac, check for a hidden file, .DS_Store. After removing the file, my program worked.
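If deleting the file is not convenient, one alternative is to skip hidden files when listing the input directory. This is only a sketch: visible_files is a hypothetical helper, and the directory contents are created just for the demonstration.

```python
import os
import tempfile


def visible_files(directory):
    """List regular files whose names do not start with a dot (hypothetical helper)."""
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )


with tempfile.TemporaryDirectory() as d:
    # Create one hidden file and one ordinary file for the demonstration.
    for name in (".DS_Store", "0.jpg"):
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"\x00")
    names = visible_files(d)

print(names)
```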

Answer 8

Answered by Saif Faidi

If you are receiving data from a serial port, make sure you are using the right baud rate (and the other settings). Decoding as UTF-8 with the wrong configuration produces the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

To check your serial port configuration on Linux, use: stty -F /dev/ttyUSBX -a

Answer 9

Answered by Rex131xO

Check the path of the file to be read. My code kept giving me errors until I changed the path to the present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Answer 10

Answered by Minh Triet

It simply means that the wrong encoding was chosen to read the file.

On Mac, use file -I file.txt to find the correct encoding. On Linux, use file -i file.txt.
