Python UnicodeDecodeError: 'utf-8' codec can't decode byte error

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/24632298/

Time: 2020-08-19 04:54:30  Source: igfitidea

Tags: python, encoding, utf-8, urllib

Asked by user1641071

I'm trying to get a response from urllib and decode it to a readable format. The text is in Hebrew and also contains characters like { and /.

The encoding declaration at the top of my source file is:

# -*- coding: utf-8 -*-

The raw bytes are:

b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'

Now I'm trying to decode it using:

data = data.decode()

and I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Answered by Martijn Pieters

Your problem is that this data is not UTF-8. You have UTF-16 encoded data (the leading b'\xff\xfe' is a UTF-16 byte order mark), so decode it as such:

>>> data = b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
>>> data.decode('utf16')
'{ \r\n"id" : "1404830064696",\r\n"title" : "פיקוד העורף התרעה במרחב ",\r\n"data" : []\r\n}\r\n\r\n'
>>> import json
>>> json.loads(data.decode('utf16'))
{'title': 'פיקוד העורף התרעה במרחב ', 'id': '1404830064696', 'data': []}
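The two leading bytes are what give the encoding away here. As a rough sketch (not part of the original answer), the byte order mark can be checked with the constants in the standard codecs module; the helper name guess_codec_from_bom and the shortened sample bytes are made up for illustration:

```python
import codecs

# BOM prefixes mapped to codec names that consume the BOM while decoding.
# UTF-32 is listed first because its little-endian BOM begins with the
# UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_codec_from_bom(raw, default="utf-8"):
    """Return a codec name based on a leading BOM, or `default` if none is found."""
    for bom, codec_name in _BOMS:
        if raw.startswith(bom):
            return codec_name
    return default


# Shortened, hypothetical sample in the same shape as the data in the question.
raw = b'\xff\xfe{\x00"\x00i\x00d\x00"\x00:\x001\x00}\x00'
text = raw.decode(guess_codec_from_bom(raw))
```

This only helps when a BOM is present; data without one still needs the charset from the server or some other out-of-band hint.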

If you loaded this from a website with urllib.request, the Content-Type header should contain a charset parameter telling you this; if response is the returned urllib.request response object, then use:

codec = response.info().get_content_charset('utf-8')

This defaults to UTF-8 when no charset parameter has been set, which is the appropriate default for JSON data.
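Put together, the flow might look like the sketch below. The function names decode_json_body and fetch_json are hypothetical, and the headers object is assumed to be what response.info() returns (an email.message.Message subclass), so the decoding step can be tried without a network connection:

```python
import json
from email.message import Message
from urllib.request import urlopen


def decode_json_body(raw, headers):
    """Decode `raw` using the charset from the Content-Type header, then parse it."""
    # Falls back to UTF-8 when the header carries no charset parameter.
    charset = headers.get_content_charset("utf-8")
    return json.loads(raw.decode(charset))


def fetch_json(url):
    """Fetch `url` and parse the body as JSON (requires network access)."""
    with urlopen(url) as response:
        return decode_json_body(response.read(), response.info())


# Offline demonstration with a hand-built header object.
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-16"
body = '{"title": "שלום"}'.encode("utf-16")
parsed = decode_json_body(body, headers)
```

If the server sends the wrong charset, or none at all for UTF-16 data, this will still fail; in that case the bytes themselves have to be inspected.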

Alternatively, use the requests library to load the JSON response; it handles decoding automatically (including UTF-codec autodetection specific to JSON responses).
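For readers who cannot add a third-party dependency, the standard library offers a comparable autodetection: since Python 3.6, json.loads accepts bytes and detects UTF-8, UTF-16 or UTF-32 on its own. This is a side note rather than something from the original answer, and the sample payload is made up:

```python
import json

# Hypothetical payload, encoded as UTF-16 with a BOM like the data in the question.
raw = '{"title": "שלום", "data": []}'.encode("utf-16")

# No explicit .decode() call: json.loads works out the UTF encoding from the bytes.
parsed = json.loads(raw)
```

This assumes Python 3.6 or newer; on older versions json.loads only accepts str.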

One further note: the PEP 263 source code codec comment is used only to interpret your source code, including string literals. It has nothing to do with the encodings of external sources (files, network data, etc.).
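A small sketch of that distinction, with made-up variable names: the coding comment lets the Hebrew literal be written in the source file, but bytes arriving from elsewhere are still decoded with whatever codec is passed to decode().

```python
# -*- coding: utf-8 -*-
# The comment above only tells Python how to read this source file.

greeting = "שלום"  # string literal, interpreted using the source encoding

# Simulate bytes arriving from an external source in a different encoding.
external_bytes = greeting.encode("utf-16")

try:
    external_bytes.decode()  # defaults to UTF-8 regardless of the coding comment
    default_decode_failed = False
except UnicodeDecodeError:
    default_decode_failed = True

# Naming the real encoding explicitly recovers the text.
roundtrip = external_bytes.decode("utf-16")
```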

Answered by Aaron Lelevier

I got this error in Django with Python 3.4. I was trying to get this to work with django-rest-framework.

This is the code that fixed the UnicodeDecodeError: 'utf-8' codec can't decode byte error.

This is the passing test:

import os
from os.path import join, dirname
import uuid
from rest_framework.test import APITestCase

class AttachmentTests(APITestCase):

    def setUp(self):
        self.base_dir = dirname(dirname(dirname(__file__)))

        self.image = join(self.base_dir, "source/test_in/aaron.jpeg")
        self.image_filename = os.path.split(self.image)[1]

    def test_create_image(self):
        id = str(uuid.uuid4())
        with open(self.image, 'rb') as data:
            # data = data.read()
            post_data = {
                'id': id,
                'filename': self.image_filename,
                'file': data
            }

            response = self.client.post("/api/admin/attachments/", post_data)

            self.assertEqual(response.status_code, 201)
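The answer does not spell out why this version passes. One plausible reading is that the image is opened in binary mode ('rb') and handed over as a file object, so nothing ever tries to decode the JPEG bytes as text. The standalone sketch below illustrates only that underlying point; it uses a temporary file containing the JPEG magic bytes and does not involve Django or django-rest-framework:

```python
import os
import tempfile

# First bytes of a typical JPEG file; 0xff is never a valid UTF-8 start byte.
jpeg_header = b"\xff\xd8\xff\xe0"

with tempfile.NamedTemporaryFile(suffix=".jpeg", delete=False) as tmp:
    tmp.write(jpeg_header)
    path = tmp.name

try:
    # Binary mode returns the bytes untouched.
    with open(path, "rb") as fh:
        binary_content = fh.read()

    # Text mode tries to decode the bytes and fails on the first one.
    try:
        with open(path, encoding="utf-8") as fh:
            fh.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc
finally:
    os.remove(path)
```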