Python UnicodeDecodeError: 'utf-8' codec can't decode byte error

Question

Asked by user1641071

I'm trying to get a response from urllib and decode it to a readable format. The text is in Hebrew and also contains characters like { and /

The encoding declaration at the top of the file is:

# -*- coding: utf-8 -*-

The raw byte string is:

b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'

Now I'm trying to decode it using:

 data = data.decode()

and I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Answer 1

Answered by Martijn Pieters

Your problem is that the data is not UTF-8. You have UTF-16 encoded data; decode it as such:

>>> data = b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
>>> data.decode('utf16')
'{ \r\n"id" : "1404830064696",\r\n"title" : "פיקוד העורף התרעה במרחב ",\r\n"data" : []\r\n}\r\n\r\n'
>>> import json
>>> json.loads(data.decode('utf16'))
{'title': 'פיקוד העורף התרעה במרחב ', 'id': '1404830064696', 'data': []}
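The leading bytes \xff\xfe in the data are a UTF-16 little-endian byte order mark (BOM), which is how the encoding can be recognised. As a minimal sketch, assuming you have to guess the codec from the bytes alone, a BOM check could look like this. The helper name sniff_bom and the sample payload are hypothetical, not part of the original answer.

```python
import codecs

# BOMs are checked with UTF-32 first, because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw, default="utf-8"):
    """Hypothetical helper: pick a codec name from a leading BOM, if any."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return codec
    return default


# Hypothetical sample: a BOM followed by UTF-16 little-endian text.
text = '{"title": "\u05e4\u05d9\u05e7\u05d5\u05d3"}'
sample = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

codec = sniff_bom(sample)
decoded = sample.decode(codec)  # the utf-16 codec consumes the BOM
print(codec, decoded == text)
```

A BOM is only a hint. Data without one falls back to the default here, so a declared charset should take precedence when one is available.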

If you loaded this from a website with urllib.request, the Content-Type header should contain a charset parameter telling you this. If response is the urllib.request response object that was returned, use:

codec = response.info().get_content_charset('utf-8')

This defaults to UTF-8 when no charset parameter has been set, which is the appropriate default for JSON data.
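Putting the header lookup and the decode step together might look like the sketch below. The function names decode_body and fetch_text are hypothetical. The offline demonstration builds the header object by hand with email.message.Message, which is the base class of the object that response.info() returns.

```python
import urllib.request
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes with the charset declared in the Content-Type header."""
    codec = headers.get_content_charset(default)
    return raw.decode(codec)


def fetch_text(url):
    """Fetch a URL and decode its body using the declared charset (not called below)."""
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read(), response.info())


# Offline demonstration with a hand-built header, so no network is needed.
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-16"
body = b"\xff\xfe{\x00}\x00"  # BOM followed by "{}" in UTF-16 little-endian
print(decode_body(body, headers))
```

If the server declares the wrong charset or none at all, the decode can still fail, so a fallback such as a BOM check may be worth adding.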

Alternatively, use the requests library to load the JSON response; it handles decoding automatically (including UTF codec autodetection specific to JSON responses).
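The same kind of autodetection is available without third-party packages. This sketch uses the standard library instead of requests: since Python 3.6, json.loads accepts bytes encoded as UTF-8, UTF-16 or UTF-32 and works out which one it is. The payload is a shortened, made-up stand-in for the response in the question.

```python
import json

# UTF-16 little-endian bytes with a BOM, spelling out {"id":1}
raw = b'\xff\xfe{\x00"\x00i\x00d\x00"\x00:\x001\x00}\x00'

# No explicit decode step: json.loads detects the UTF variant from the bytes.
parsed = json.loads(raw)
print(parsed)
```

This only covers the UTF family. Bytes in any other encoding still need an explicit decode before parsing.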

One further note: the PEP 263 source code codec comment is used only to interpret your source code, including string literals. It has nothing to do with the encodings of external sources (files, network data, etc.).
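To illustrate the distinction, the sketch below compiles a tiny made-up module from bytes, so that its coding comment is honoured, and then decodes unrelated run-time bytes. The module source and the variable names are hypothetical.

```python
# Source bytes that declare Latin-1; the byte 0xE9 is "é" in Latin-1.
source = b"# -*- coding: latin-1 -*-\ntext = '\xe9'\n"
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
print(namespace["text"] == "\u00e9")  # the literal was read as Latin-1

# Bytes arriving at run time are unaffected by any coding comment.
payload = "\u00e9".encode("utf-16")
try:
    payload.decode()  # bytes.decode() defaults to UTF-8
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed, payload.decode("utf-16") == "\u00e9")
```

The coding comment changed how the string literal was read, but the run-time payload still had to be decoded with the codec it was actually encoded in.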

Answer 2

Answered by Aaron Lelevier

I got this error in Django with Python 3.4 while trying to get this to work with django-rest-framework.

This is the code that fixed the UnicodeDecodeError: 'utf-8' codec can't decode byte error.

This is the passing test:

import os
from os.path import join, dirname
import uuid
from rest_framework.test import APITestCase

class AttachmentTests(APITestCase):

    def setUp(self):
        self.base_dir = dirname(dirname(dirname(__file__)))

        self.image = join(self.base_dir, "source/test_in/aaron.jpeg")
        self.image_filename = os.path.split(self.image)[1]

    def test_create_image(self):
        id = str(uuid.uuid4())
        with open(self.image, 'rb') as data:
            # data = data.read()  # left disabled: the open file object is posted as-is
            post_data = {
                'id': id,
                'filename': self.image_filename,
                'file': data
            }

            response = self.client.post("/api/admin/attachments/", post_data)

            self.assertEqual(response.status_code, 201)
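The answer does not say what originally triggered the error, so the following is an assumption: image bytes were being treated as UTF-8 text somewhere along the way. This standard-library sketch shows the general effect with a temporary file that holds a typical JPEG header. The file and variable names are hypothetical and do not come from the test above.

```python
import os
import tempfile

# First bytes of a typical JPEG file, standing in for a real image.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

with tempfile.NamedTemporaryFile(suffix=".jpeg", delete=False) as tmp:
    tmp.write(jpeg_header)
    path = tmp.name

# Text mode tries to decode the bytes as UTF-8 and fails on the first one.
try:
    with open(path, "r", encoding="utf-8") as handle:
        handle.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = str(exc)

# Binary mode returns the bytes untouched.
with open(path, "rb") as handle:
    binary_content = handle.read()

os.remove(path)
print(text_mode_error)
```

Binary payloads such as images should stay as bytes from the moment they are read until they are uploaded.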
