Python UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

Notice: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28165639/

Time: 2020-08-19 02:50:06  Source: igfitidea

python encoding pickle

Question by Haoyu

I use Python 3.4 on 64-bit Windows 7. I ran the following code:

      6   """ load single batch of cifar """
      7   with open(filename, 'r') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']

The error message is: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence

I changed line 7 to:

      6   """ load single batch of cifar """
      7   with open(filename, 'r',encoding='utf-8') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']

The error message became: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The traceback finally points to decode(self, input, final) in Python34\lib\codecs.py:

    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

I then changed the code to:

      6   """ load single batch of cifar """
      7   with open(filename, 'rb') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']
     10     Y = datadict['labels']

This time the error is: UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

What is the problem, and how can I solve it?

Answer by Martijn Pieters

Pickle files are binary data files, so you always have to open the file in 'rb' mode when loading. Don't try to use a text mode here.
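To see why text mode cannot work, the sketch below pickles a small hypothetical dict with protocol 2 and tries to decode the result as text, the way a file opened with 'r' would. Every protocol 2 or later pickle starts with the PROTO opcode, byte 0x80, which matches the byte reported at position 0 in the first two errors. `io.BytesIO` stands in for a real file opened with 'rb'; the dict contents are made up.

```python
import io
import pickle

# Hypothetical sample data, pickled with protocol 2.
payload = pickle.dumps({'data': [1, 2, 3]}, protocol=2)

# Protocol 2+ streams start with the PROTO opcode (0x80) followed by the
# protocol number, so byte 0 is 0x80, as in both error messages.
header = payload[:2]

# Text mode would try to decode these bytes; both codecs reject them.
errors = {}
for codec in ('gbk', 'utf-8'):
    try:
        payload.decode(codec)
    except UnicodeDecodeError as exc:
        errors[codec] = exc.reason

# A binary stream, which is what open(filename, 'rb') gives you, loads fine.
restored = pickle.load(io.BytesIO(payload))
print(header, errors, restored)
```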

You are trying to load a Python 2 pickle that contains string data. You'll have to tell pickle.load() how to convert that data to Python 3 strings, or to leave it as bytes.

The default is to try to decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.
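A minimal sketch of what the encoding argument changes. Python 3's pickler does not write the SHORT_BINSTRING opcode that Python 2 uses for short 8-bit strings, so the stream below is assembled by hand; its content (b'ab\x8b') is a made-up example, not data from the CIFAR files. With the default 'ASCII' the load fails the same way as in the question, while 'latin1' and 'bytes' both succeed.

```python
import pickle

# Hand-assembled stream of the kind Python 2 writes for an 8-bit str:
# PROTO 2, SHORT_BINSTRING (length 3, bytes b'ab\x8b'), STOP.
PY2_STR_PICKLE = b'\x80\x02U\x03ab\x8b.'


def load_with(encoding):
    """Unpickle PY2_STR_PICKLE with the given encoding; return the value or the error."""
    try:
        return pickle.loads(PY2_STR_PICKLE, encoding=encoding)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = load_with('ASCII')    # the documented default: 0x8b is not ASCII
latin1_result = load_with('latin1')  # decodes to a str, one character per byte
bytes_result = load_with('bytes')    # keeps the 8-bit string as a bytes object

for name, result in [('ASCII', ascii_result), ('latin1', latin1_result), ('bytes', bytes_result)]:
    print(f'{name:>6}: {result!r}')
```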

Setting the encoding to latin1 allows you to import the data directly:

    import pickle

    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')

It appears that it is the numpy array data that is causing the problems here, as all strings in the set use ASCII characters only.
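Presumably latin1 is the right choice for raw array data because it maps each byte value 0 to 255 to the code point with the same number, so decoding never fails and can be undone exactly. A small sketch of that round trip, using a hypothetical buffer of four packed 32-bit floats as a stand-in for array data (the values are made up):

```python
import struct

# Hypothetical stand-in for the raw buffer of a numeric array:
# four little-endian 32-bit floats, which contain bytes above 0x7f.
raw = struct.pack('<4f', 1.0, -2.5, 3.25, 100.0)

# latin1 maps byte N to code point N, so decoding cannot fail...
as_text = raw.decode('latin1')

# ...and re-encoding with latin1 recovers the original bytes exactly.
recovered = as_text.encode('latin1')
values = struct.unpack('<4f', recovered)

# The same holds for every possible byte value, not just these.
all_bytes = bytes(range(256))
lossless = all_bytes.decode('latin1').encode('latin1') == all_bytes
print(values, lossless)
```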

The alternative would be to use encoding='bytes', but then all the filenames and top-level dictionary keys are bytes objects, and you'd have to decode those or prefix all your key literals with b.
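A sketch of the encoding='bytes' route, using a hand-assembled Python 2 style pickle of {'data': 1} (a made-up stand-in for a CIFAR batch dict). It shows both options from the answer: indexing with a bytes literal, or decoding the top-level keys once. The `decode_keys` helper is hypothetical, not part of the pickle module.

```python
import pickle

# Hand-assembled Python 2 style stream for {'data': 1}:
# PROTO 2, EMPTY_DICT, SHORT_BINSTRING 'data', BININT1 1, SETITEM, STOP.
PY2_DICT_PICKLE = b'\x80\x02}U\x04dataK\x01s.'

raw_dict = pickle.loads(PY2_DICT_PICKLE, encoding='bytes')

# Option 1: use bytes literals for the keys.
value_via_bytes_key = raw_dict[b'data']


# Option 2: decode the top-level keys once, then use normal str keys.
def decode_keys(d, codec='ascii'):
    """Hypothetical helper: return a copy of d with bytes keys decoded to str."""
    return {(k.decode(codec) if isinstance(k, bytes) else k): v for k, v in d.items()}


clean_dict = decode_keys(raw_dict)
print(raw_dict, value_via_bytes_key, clean_dict)
```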

Answer by varuscn

If you open the file as UTF-8, you need to write: open(file_name, 'r', encoding='UTF-8'). If you open the file as GBK, you need to use: open(file_name, 'rb'). Hope this solves your problem!
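A minimal sketch of the two cases in this answer, assuming a small hypothetical text file saved once as UTF-8 and once as GBK (file names and contents are made up). Opening with 'rb' returns bytes, so GBK text still has to be decoded explicitly; a pickle, by contrast, should be passed to pickle.load() as bytes without any decoding.

```python
import os
import tempfile

text = 'GBK 文本 example'

with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, 'utf8.txt')  # hypothetical file names
    gbk_path = os.path.join(tmp, 'gbk.txt')

    # Create the sample files: the same text saved in two encodings.
    with open(utf8_path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(gbk_path, 'wb') as f:
        f.write(text.encode('gbk'))

    # UTF-8 text file: text mode with an explicit encoding.
    with open(utf8_path, 'r', encoding='UTF-8') as f:
        from_utf8 = f.read()

    # GBK file opened in binary mode: you get bytes and decode them yourself.
    # (open(gbk_path, 'r', encoding='gbk') would also work for plain text.)
    with open(gbk_path, 'rb') as f:
        raw = f.read()
    from_gbk = raw.decode('gbk')

print(from_utf8, from_gbk)
```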