Python UnicodeDecodeError: 'utf-8' codec can't decode byte error
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/24632298/
Question by user1641071
I'm trying to get a response from urllib and decode it to a readable format. The text is in Hebrew and also contains characters like { and /.
The encoding declaration at the top of my file is:
# -*- coding: utf-8 -*-
The raw response bytes are:
b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
Now I'm trying to decode it using:
data = data.decode()
and I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Answer by Martijn Pieters
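Your problem is that the data is not UTF-8. You have UTF-16-encoded data: the leading bytes \xff\xfe are the UTF-16 little-endian byte order mark (BOM), which is why the UTF-8 codec fails at position 0. Decode it as UTF-16 instead: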
>>> data = b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
>>> data.decode('utf16')
'{ \r\n"id" : "1404830064696",\r\n"title" : "פיקוד העורף התרעה במרחב ",\r\n"data" : []\r\n}\r\n\r\n'
>>> import json
>>> json.loads(data.decode('utf16'))
{'title': 'פיקוד העורף התרעה במרחב ', 'id': '1404830064696', 'data': []}
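When the codec is not known in advance, one option is to look for a byte order mark before decoding. The sketch below uses the standard library's `codecs` BOM constants; the helper name `decode_with_bom` is hypothetical, and falling back to UTF-8 when no BOM is present is an assumption, not something the answer prescribes.

```python
import codecs


def decode_with_bom(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: choose a codec from a leading BOM, else use `default`."""
    # UTF-32 LE's BOM (ff fe 00 00) begins with UTF-16 LE's BOM (ff fe),
    # so the UTF-32 checks must come first.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, codec in boms:
        if raw.startswith(bom):
            # These codecs consume the BOM and use it to pick the byte order.
            return raw.decode(codec)
    # No BOM found: assume the default codec.
    return raw.decode(default)
```

Applied to the bytes from the question, this picks `utf-16` because of the leading `\xff\xfe`. A UTF-16 LE text whose first character is U+0000 would be misread as UTF-32 by this check; that edge case is ignored here.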
If you loaded this from a website with urllib.request, the Content-Type header should contain a charset parameter telling you this; if response is the returned urllib.request response object, then use:
codec = response.info().get_content_charset('utf-8')
This defaults to UTF-8 when no charset parameter has been set, which is the appropriate default for JSON data.
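A small sketch of wiring that codec into the decode step. It relies on the fact that `response.info()` returns an `http.client.HTTPMessage`, which is a subclass of `email.message.Message`, so a plain `Message` can stand in for the headers here without touching the network. The function name `decode_body` is hypothetical.

```python
from email.message import Message


def decode_body(headers: Message, body: bytes) -> str:
    """Hypothetical helper: decode an HTTP body using its Content-Type charset."""
    # get_content_charset() returns the charset parameter lowercased,
    # or the fallback ('utf-8' here) when the header or parameter is missing.
    codec = headers.get_content_charset('utf-8')
    return body.decode(codec)
```

With a live `urllib.request` response this would be called as `decode_body(response.info(), response.read())`; the URL and any error handling are left out of the sketch.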
Alternatively, use the requests library to load the JSON response; it handles decoding automatically (including the UTF codec autodetection specific to JSON responses).
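The same kind of autodetection is also available in the standard library, which may help when adding requests is not an option. This is not how requests implements it internally; it simply relies on `json.loads()` accepting bytes (Python 3.6 and later) and detecting UTF-8, UTF-16 or UTF-32 itself before parsing. The byte string is a shortened stand-in for the response in the question.

```python
import json

# Shortened stand-in for the question's data: a UTF-16 LE BOM followed by
# the UTF-16 LE encoding of '{"id":1}'.
raw = b'\xff\xfe{\x00"\x00i\x00d\x00"\x00:\x001\x00}\x00'

# json.loads() accepts bytes and detects the UTF-8/16/32 encoding itself,
# so no explicit .decode() call is needed.
parsed = json.loads(raw)
```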
One further note: the PEP 263 source code codec comment is used only to interpret your source code, including string literals. It has nothing to do with the encodings of external sources (files, network data, etc.).
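A minimal reproduction of that distinction: the coding declaration governs how the string literal is read from the source file, but bytes produced elsewhere still have to be decoded with the codec they were actually written in. Simulating the network payload with `str.encode("utf-16")` is an assumption made only for illustration.

```python
# -*- coding: utf-8 -*-
# The declaration above only affects how Python reads this file's source text.
text = "פיקוד העורף"  # a str literal, read from the source file as UTF-8

# Simulate bytes arriving from the network in UTF-16 (with a BOM).
payload = text.encode("utf-16")

try:
    payload.decode("utf-8")  # wrong codec: fails on the BOM's first byte
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared after the except block

roundtrip = payload.decode("utf-16")  # right codec: recovers the original text
```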
Answer by Aaron Lelevier
I got this error in Django with Python 3.4. I was trying to get this to work with django-rest-framework.
This is the code that fixed the UnicodeDecodeError: 'utf-8' codec can't decode byte error for me.
This is the passing test:
import os
from os.path import join, dirname
import uuid

from rest_framework.test import APITestCase


class AttachmentTests(APITestCase):

    def setUp(self):
        # Project root is three directories above this test module.
        self.base_dir = dirname(dirname(dirname(__file__)))
        self.image = join(self.base_dir, "source/test_in/aaron.jpeg")
        self.image_filename = os.path.split(self.image)[1]

    def test_create_image(self):
        id = str(uuid.uuid4())
        # Open the image in binary mode ('rb') so Python never tries to decode it as text.
        with open(self.image, 'rb') as data:
            # data = data.read()
            post_data = {
                'id': id,
                'filename': self.image_filename,
                # Pass the open file object; the test client sends it as multipart form data.
                'file': data
            }
            response = self.client.post("/api/admin/attachments/", post_data)
            self.assertEqual(response.status_code, 201)
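The answer does not say which change fixed the error. One plausible explanation, offered here as an assumption, is the binary file mode: JPEG data starts with the byte 0xff, which is never valid at the start of a UTF-8 sequence, so reading the image in text mode raises the same UnicodeDecodeError. The sketch below demonstrates this with the standard library only (no Django); the header bytes and the helper name `read_both_ways` are illustrative.

```python
import os
import tempfile

# First bytes of a JPEG file (SOI marker, then the start of an APP0 marker).
JPEG_HEADER = b"\xff\xd8\xff\xe0"


def read_both_ways(path):
    """Hypothetical helper: return (bytes read in binary mode, text-mode error or None)."""
    with open(path, "rb") as f:  # binary mode: raw bytes, no decoding
        raw = f.read()
    try:
        with open(path, "r", encoding="utf-8") as f:  # text mode: decodes as UTF-8
            f.read()
        text_error = None
    except UnicodeDecodeError as exc:
        text_error = exc
    return raw, text_error


# Write the sample bytes to a temporary file, read it both ways, then clean up.
fd, path = tempfile.mkstemp(suffix=".jpeg")
with os.fdopen(fd, "wb") as f:
    f.write(JPEG_HEADER)
raw, text_error = read_both_ways(path)
os.remove(path)
```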