Python UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/28165639/
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence
Asked by Haoyu
I use Python 3.4 on a 64-bit Windows 7 system. I ran the following code:
      6     """ load single batch of cifar """
      7     with open(filename, 'r') as f:
----> 8         datadict = pickle.load(f)
      9         X = datadict['data']
The error message is: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence
I changed line 7 to:
      6     """ load single batch of cifar """
      7     with open(filename, 'r', encoding='utf-8') as f:
----> 8         datadict = pickle.load(f)
      9         X = datadict['data']
The error message became: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
The traceback ends in decode(self, input, final) in Python34\lib\codecs.py:
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]
I then changed the code to:
      6     """ load single batch of cifar """
      7     with open(filename, 'rb') as f:
----> 8         datadict = pickle.load(f)
      9         X = datadict['data']
     10         Y = datadict['labels']
This time the error is: UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
What is the problem, and how do I solve it?
Answered by Martijn Pieters
Pickle files are binary data files, so you always have to open the file with the 'rb' mode when loading. Don't try to use a text mode here.
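To illustrate why text mode cannot work, here is a minimal sketch. It assumes a small sample dictionary and a temporary file (the helper names write_pickle, text_mode_fails and load_binary are made up for this example). A pickle written with protocol 2 or later starts with the byte 0x80, which is exactly the byte that the 'gbk' and 'utf-8' codecs reject at position 0 in the question.

```python
import os
import pickle
import tempfile


def write_pickle(obj):
    """Pickle obj to a temporary file (binary mode) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f, protocol=2)
    return path


def text_mode_fails(path):
    """Return True if reading the file as UTF-8 text raises UnicodeDecodeError."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError:
        return True
    return False


def load_binary(path):
    """Load a pickle the correct way: from a file opened in 'rb' mode."""
    with open(path, "rb") as f:
        return pickle.load(f)


path = write_pickle({"data": [1, 2, 3]})  # hypothetical sample data
try:
    with open(path, "rb") as f:
        first_byte = f.read(1)  # the pickle PROTO opcode
    print(first_byte)
    print(text_mode_fails(path))
    print(load_binary(path))
finally:
    os.remove(path)
```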
You are trying to load a Python 2 pickle that contains string data. You'll have to tell pickle.load() how to convert that data to Python 3 strings, or to leave them as bytes.
The default is to try and decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.
Setting the encoding to latin1 allows you to import the data directly:
with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1')
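The difference between the default and latin1 can be reproduced without the CIFAR file. The sketch below uses a hand-written protocol-2 pickle of the one-byte Python 2 string '\x8b'. The byte string PY2_PICKLE and the helper load_default are assumptions made for this illustration, not part of the original data set.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\x8b' (assumed sample):
# PROTO 2, SHORT_BINSTRING of length 1, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x01\x8bq\x00."


def load_default(data):
    """Load with the default encoding ('ASCII'); return the exception on failure."""
    try:
        return pickle.loads(data)
    except UnicodeDecodeError as exc:
        return exc


# The default ASCII decoding cannot handle byte 0x8b.
print(type(load_default(PY2_PICKLE)).__name__)

# latin1 maps every byte to a code point, so decoding succeeds.
print(repr(pickle.loads(PY2_PICKLE, encoding="latin1")))
```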
It appears that it is the numpy array data that is causing the problems here as all strings in the set use ASCII characters only.
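One reason latin1 is a convenient choice for binary payloads such as array buffers is that it maps each byte value 0 to 255 to exactly one code point, so decoding never fails and the original bytes can be recovered. The following is a small check of that property. It is independent of pickle and numpy, and the helper names are made up for this example.

```python
def latin1_round_trip(raw):
    """Decode bytes as latin1 and encode them back again."""
    return raw.decode("latin1").encode("latin1")


def ascii_decodes(raw):
    """Return True if the bytes can be decoded as ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


ALL_BYTES = bytes(range(256))

# Every possible byte value survives a latin1 round trip unchanged.
print(latin1_round_trip(ALL_BYTES) == ALL_BYTES)

# ASCII only covers 0x00-0x7F, so arbitrary binary data is rejected.
print(ascii_decodes(ALL_BYTES))
```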
The alternative would be to use encoding='bytes', but then all the filenames and top-level dictionary keys are bytes objects, and you'd have to decode those or prefix all your key literals with b.
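As a rough sketch of what the encoding='bytes' route involves, the example below loads a hand-written Python 2 style pickle of a one-entry dictionary and then converts the top-level keys back to str. The sample bytes PY2_DICT_PICKLE and the helper decode_keys are assumptions for illustration only.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 dict {'data': '\x8b'}
# (assumed sample): EMPTY_DICT, two SHORT_BINSTRINGs, SETITEM, STOP.
PY2_DICT_PICKLE = b"\x80\x02}q\x00U\x04dataq\x01U\x01\x8bq\x02s."


def decode_keys(mapping, encoding="ascii"):
    """Return a copy of mapping with top-level bytes keys decoded to str."""
    return {
        key.decode(encoding) if isinstance(key, bytes) else key: value
        for key, value in mapping.items()
    }


# With encoding='bytes', both keys and values come back as bytes objects.
raw = pickle.loads(PY2_DICT_PICKLE, encoding="bytes")
print(raw)

# Either index with b'data', or decode the keys once after loading.
fixed = decode_keys(raw)
print(fixed["data"])
```

Only the top level is converted here. Nested dictionaries or lists of filenames would need the same treatment.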
Answered by varuscn
If the file is UTF-8 encoded, open it with open(file_name, 'r', encoding='UTF-8'). If the file is GBK encoded, open it with open(file_name, 'rb'). Hope this solves your problem!