Python: 'utf-8' codec can't decode byte 0x80
Original source: http://stackoverflow.com/questions/36825972/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
'utf-8' codec can't decode byte 0x80
Asked by Ehab AlBadawy
I'm trying to download a BVLC-trained model and I'm stuck with this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte
I think it's because of the following function (complete code)
# Closure-d function for checking SHA1.
def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
    with open(filename, 'r') as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1
Any idea how to fix this?
Answered by Martijn Pieters
You are opening a file that is not UTF-8 encoded, while the default encoding for your system is set to UTF-8.
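To make the failure concrete, here is a minimal sketch. It assumes a throwaway temporary file whose contents are made up for illustration and are not taken from the question. Byte 0x80 is a UTF-8 continuation byte, so it can never begin a character. Decoding therefore fails in text mode, while binary mode returns the bytes untouched.

```python
import os
import tempfile

# Hypothetical sample data: five ASCII bytes followed by 0x80,
# which is not a valid UTF-8 start byte.
payload = b"caffe" + b"\x80\x02"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode: Python decodes the bytes as UTF-8 and fails on 0x80.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: no decoding happens, so the raw bytes come back unchanged.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(text_mode_error)
print(raw == payload)
```

The encoding is passed explicitly here only so the sketch behaves the same on every platform. In the question, the UTF-8 default came from the system locale.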
Since you are calculating a SHA1 hash, you should read the data as binary instead. The hashlib functions require you pass in bytes:
with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1
Note the addition of b in the file mode.
See the open() documentation:
mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)
and from the hashlib module documentation:
You can now feed this object with bytes-like objects (normally bytes) using the update() method.
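The update() method mentioned in that quote also makes it possible to hash a large model file without loading it into memory at once. The following is only a sketch of that idea, not code from the original answer. The helper name sha1_of_file, its chunk_size parameter, and the sample data are all hypothetical.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in binary chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage on a throwaway file holding non-UTF-8 bytes (made-up sample data).
sample = b"\x80\xb8" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name
try:
    chunked_digest = sha1_of_file(sample_path, chunk_size=64)
finally:
    os.remove(sample_path)

# The chunked digest matches hashing the whole byte string in one call.
print(chunked_digest == hashlib.sha1(sample).hexdigest())
```

Feeding the data through update() in pieces gives the same digest as a single call on the concatenated bytes, so for small files the one-line f.read() version is equivalent.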
Answered by DSM
You didn't open the file in binary mode, so f.read() tries to decode it as a UTF-8-encoded text file, and that decoding fails. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it, and then read it, as a binary file.
>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte
but
>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
Answered by 4F2E4A2E
Since there is not a single hint in the documentation or the source code, I have no clue why, but using the b character (I guess for binary) totally works (tf-version: 1.1.0):
image_data = tf.gfile.FastGFile(filename, 'rb').read()