Serialize base64-encoded data in JSON

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37225035/

Time: 2020-09-03 18:26:07  Source: igfitidea

Tags: json, python-3.x, serialization, base64

Asked by frollo

I'm writing a script to automate data generation for a demo, and I need to serialize some data to JSON. Part of this data is an image, so I encoded it in base64, but when I try to run my script I get:

Traceback (most recent call last):
  File "lazyAutomationScript.py", line 113, in <module>
    json.dump(out_dict, outfile)
  File "/usr/lib/python3.4/json/__init__.py", line 178, in dump
    for chunk in iterable:
  File "/usr/lib/python3.4/json/encoder.py", line 422, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 429, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
  TypeError: b'iVBORw0KGgoAAAANSUhEUgAADWcAABRACAYAAABf7ZytAAAABGdB...
     ...
   BF2jhLaJNmRwAAAAAElFTkSuQmCC' is not JSON serializable

As far as I know, a base64-encoded whatever (a PNG image, in this case) is just a string, so it should pose no problem for serialization. What am I missing?

Answered by spky

You must be careful about the datatypes.

If you read a binary image, you get bytes. If you encode these bytes in base64, you get ... bytes again! (see documentation on b64encode)

The json module can't handle raw bytes; that's why you get the error.
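
To make the type issue concrete, here is a minimal sketch. The sample bytes and the "image" key are made up for illustration and are not from the question; the exact wording of the TypeError message also differs between Python versions.

```python
import json
from base64 import b64encode

# Hypothetical sample bytes standing in for the image file's content.
raw = b"\x89PNG\r\n\x1a\n"

encoded = b64encode(raw)
print(type(encoded))  # <class 'bytes'>

error = None
try:
    # bytes values are rejected by the default JSON encoder
    json.dumps({"image": encoded})
except TypeError as exc:
    error = exc
    print("json.dumps failed:", exc)

# Decoding the base64 bytes to str makes the value serializable.
as_text = json.dumps({"image": encoded.decode("ascii")})
print(as_text)
```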

I have just written an example with comments; I hope it helps:

from base64 import b64encode
from json import dumps

ENCODING = 'utf-8'
IMAGE_NAME = 'spam.jpg'
JSON_NAME = 'output.json'

# first: reading the binary stuff
# note the 'rb' flag
# result: bytes
with open(IMAGE_NAME, 'rb') as open_file:
    byte_content = open_file.read()

# second: base64 encode read data
# result: bytes (again)
base64_bytes = b64encode(byte_content)

# third: decode these bytes to text
# result: str (the base64 bytes decoded as utf-8)
base64_string = base64_bytes.decode(ENCODING)

# optional: doing stuff with the data
# result here: some dict
raw_data = {IMAGE_NAME: base64_string}

# now: encoding the data to json
# result: string
json_data = dumps(raw_data, indent=2)

# finally: writing the json string to disk
# note the 'w' flag, no 'b' needed as we deal with text here
with open(JSON_NAME, 'w') as another_open_file:
    another_open_file.write(json_data)
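
For completeness, the reverse direction might look like the sketch below. It is not part of the original answer: the sample bytes are a hypothetical stand-in for the image, and a temporary directory is used so the snippet runs on its own.

```python
import json
import os
import tempfile
from base64 import b64decode, b64encode

ENCODING = 'utf-8'

# Hypothetical stand-in for the image bytes (not a real JPEG).
original_bytes = b'\xff\xd8\xff\xe0 not a real JPEG'

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, 'output.json')

    # write: bytes -> base64 bytes -> str -> JSON text
    payload = {'spam.jpg': b64encode(original_bytes).decode(ENCODING)}
    with open(json_path, 'w') as json_file:
        json.dump(payload, json_file, indent=2)

    # read: JSON text -> dict with a str value
    with open(json_path) as json_file:
        loaded = json.load(json_file)

# b64decode accepts the ASCII str directly and returns the original bytes
restored_bytes = b64decode(loaded['spam.jpg'])
print(restored_bytes == original_bytes)
```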

Answered by ssubbotin

An alternative solution is to encode the data on the fly with a custom encoder:

import json
from base64 import b64encode

class Base64Encoder(json.JSONEncoder):
    # pylint: disable=method-hidden
    def default(self, o):
        if isinstance(o, bytes):
            return b64encode(o).decode()
        return json.JSONEncoder.default(self, o)

With that defined, you can do:

m = {'key': b'\x9c\x13\xff\x00'}
json.dumps(m, cls=Base64Encoder)

It will produce:

'{"key": "nBP/AA=="}'
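
The same encoder should also work with json.dump and with bytes nested deeper in the structure, which is closer to the call in the question's traceback. The following is only a sketch: the nested dict and its key names are invented for illustration, and io.StringIO stands in for a file opened in text mode.

```python
import io
import json
from base64 import b64encode


class Base64Encoder(json.JSONEncoder):
    # Same idea as the encoder above, repeated so the sketch runs on its own.
    def default(self, o):
        if isinstance(o, bytes):
            return b64encode(o).decode('ascii')
        return super().default(o)


# Hypothetical nested structure; the key names are made up.
out_dict = {'demo': {'items': [{'name': 'logo', 'image': b'\x9c\x13\xff\x00'}]}}

outfile = io.StringIO()  # stands in for open('output.json', 'w')
json.dump(out_dict, outfile, cls=Base64Encoder)
print(outfile.getvalue())
```

Note that default() is only consulted for values the encoder cannot serialize; bytes used as dictionary keys are not converted this way.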

Answered by Filippo Vitale

What am I missing?

The error is yelling that a binary (a bytes object) is not JSON serializable.

from base64 import b64encode

# *binary representation* of the base64 string
assert b64encode(b"binary content")                 == b'YmluYXJ5IGNvbnRlbnQ='

# base64 string
assert b64encode(b"binary content").decode('utf-8') ==  'YmluYXJ5IGNvbnRlbnQ='

The latter is definitely "JSON serializable" because it is the base64 string representation of the binary b"binary content".
