Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30469575/

Time: 2020-08-19 08:28:45  Source: igfitidea

How to pickle and unpickle to portable string in Python 3

Tags: python, python-3.x, serialization, unicode

Asked by Peter Hudec

I need to pickle a Python 3 object to a string which I want to unpickle from an environment variable in a Travis CI build. The problem is that I can't seem to find a way to pickle to a portable string (unicode) in Python 3:

import os, pickle    

from my_module import MyPickleableClass


obj = {'cls': MyPickleableClass, 'other_stuf': '(...)'}

pickled = pickle.dumps(obj)

# raises TypeError: str expected, not bytes
os.environ['pickled'] = pickled

# raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb (...)
os.environ['pickled'] = pickled.decode('utf-8')

pickle.loads(os.environ['pickled'])

Is there a way to serialize complex objects like datetime.datetime to unicode or to some other string representation in Python 3 which I can transfer to a different machine and deserialize?

Update

I have tested the solutions suggested by @kindall, but pickle.dumps(obj, 0).decode() raises a UnicodeDecodeError. The base64 approach works, though it needs an extra decode/encode step. The solution works on both Python 2.x and Python 3.x.

import codecs

# encode returns bytes so it needs to be decoded to string
pickled = codecs.encode(pickle.dumps(obj), 'base64').decode()

type(pickled)  # <class 'str'>

unpickled = pickle.loads(codecs.decode(pickled.encode(), 'base64'))

Accepted answer by kindall

pickle.dumps() produces a bytes object. Expecting these arbitrary bytes to be valid UTF-8 text (the assumption you are making by trying to decode it to a string from UTF-8) is pretty optimistic. It'd be a coincidence if it worked!

One solution is to use the older pickling protocol that uses entirely ASCII characters. This still comes out as bytes, but since it is ASCII-only it can be decoded to a string without stress:

pickled = pickle.dumps(obj, 0).decode()
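
A minimal sketch of why the plain .decode() call can still fail, as reported in the question's update. It assumes the data contains a character in the range U+0080 to U+00FF, which protocol 0 writes as a single raw byte; the sample dictionary is hypothetical, and this is only one possible cause of the error. Decoding with Latin-1 instead of UTF-8 is a swapped-in workaround, not part of the original answer.

```python
import pickle

obj = {"name": "café"}  # hypothetical data with one non-ASCII character

raw = pickle.dumps(obj, 0)

# Protocol 0 output is not guaranteed to be pure ASCII: the 'é' is written
# as the single byte 0xE9, which is not valid UTF-8 on its own.
try:
    raw.decode()  # default codec is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every byte value to exactly one code point,
# so it round-trips arbitrary bytes without loss.
text = raw.decode("latin-1")
restored = pickle.loads(text.encode("latin-1"))
```

The resulting string may contain non-ASCII characters and newlines, so whether it is portable enough depends on where it is stored.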

You could also use some other encoding method to encode a binary-pickled object to text, such as base64:

import codecs
pickled = codecs.encode(pickle.dumps(obj), "base64").decode()

Decoding would then be:

unpickled = pickle.loads(codecs.decode(pickled.encode(), "base64"))
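
Put together, the base64 round trip through an environment variable might look like the sketch below. The sample object and the variable name PICKLED are hypothetical, chosen only for illustration.

```python
import codecs
import datetime
import os
import pickle

# Hypothetical data including a type that JSON cannot handle directly.
obj = {"when": datetime.datetime(2020, 8, 19, 8, 28, 45), "n": 42}

# bytes -> base64 bytes -> ASCII str
pickled = codecs.encode(pickle.dumps(obj), "base64").decode("ascii")

# os.environ only accepts str values; PICKLED is a made-up variable name.
os.environ["PICKLED"] = pickled

# str -> base64 bytes -> original pickle bytes -> object
restored = pickle.loads(
    codecs.decode(os.environ["PICKLED"].encode("ascii"), "base64")
)
```

Note that the "base64" codec inserts a newline after every 76 characters of output; base64.b64encode from the standard library produces a single line if that matters for your CI configuration.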

Using pickle with protocol 0 seems to result in shorter strings than base64-encoding binary pickles (and abarnert's suggestion of hex-encoding is going to be even larger than base64), but I haven't tested it rigorously or anything. Test it with your data and see.
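
One way to run that comparison is sketched below. The sample object is hypothetical, and the relative sizes will depend on your data and on the default pickle protocol of your Python version, so no particular outcome is claimed here.

```python
import codecs
import datetime
import pickle

# Hypothetical sample; substitute the object you actually need to transfer.
obj = {"when": datetime.datetime(2020, 8, 19), "items": list(range(20))}

binary = pickle.dumps(obj)  # default (binary) protocol

sizes = {
    "protocol 0": len(pickle.dumps(obj, 0)),
    "base64": len(codecs.encode(binary, "base64")),
    "hex": len(codecs.encode(binary, "hex")),
}

for name, size in sizes.items():
    print(f"{name}: {size} characters")
```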

Answer by abarnert

If you want to store bytes in the environment, instead of encoded text, that's what environb is for.

This doesn't work on Windows. (As the docs imply, you should check os.supports_bytes_environ if you're on 3.2+ instead of just assuming that Unix does and Windows doesn't…) So for that, you'll need to smuggle the bytes into something that can be encoded no matter what your system encoding is, e.g., using backslash-escape, or even hex. So, for example:

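import codecs, os

# environb keys and values are bytes; a value with an embedded NUL byte is rejected
if os.supports_bytes_environ:
    os.environb[b'pickled'] = pickled
else:
    os.environ['pickled'] = codecs.encode(pickled, 'hex').decode('ascii')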
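
The hex branch only shows the storing side. A minimal sketch of both directions follows; the helper names store_hex and load_hex and the variable name PICKLED_HEX are hypothetical, not part of any library.

```python
import codecs
import os
import pickle


def store_hex(name, obj):
    """Pickle obj and store it in os.environ[name] as hex text."""
    # codecs.encode(..., "hex") returns bytes, so decode to str for os.environ.
    os.environ[name] = codecs.encode(pickle.dumps(obj), "hex").decode("ascii")


def load_hex(name):
    """Read hex text from os.environ[name] and unpickle it."""
    return pickle.loads(codecs.decode(os.environ[name].encode("ascii"), "hex"))


store_hex("PICKLED_HEX", {"answer": 42})
restored = load_hex("PICKLED_HEX")
```

Hex output uses only the characters 0-9 and a-f, so it should survive any system encoding, at the cost of doubling the size of the pickle.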

Answer by abarnert

I think the simplest answer, especially if you don't care about Windows, is to just store the bytes in the environment, as suggested in my other answer.

But if you want something clean and debuggable, you might be happier using something designed as a text-based format.

pickle does have a "plain text" protocol 0, as explained in kindall's answer. It's certainly more readable than protocol 3 or 4, but it's still not something I'd actually want to read.

JSON is much nicer, but it can't handle datetime out of the box. You can come up with your own encoding (the stdlib's json module is extensible) for the handful of types you need to encode, or use something like jsonpickle. It's generally safer, more efficient, and more readable to come up with custom encodings for each type you care about than a general "pack arbitrary types in a turing-complete protocol" scheme like pickle or jsonpickle, but of course it's also more work, especially if you have a lot of extra types.
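
As a rough illustration of a custom encoding for one extra type, the sketch below extends the stdlib json module to handle datetime. The "__datetime__" tag key, the class name DateTimeEncoder, and the function decode_datetime are hypothetical choices made for this example, and datetime.fromisoformat assumes Python 3.7 or later.

```python
import datetime
import json


class DateTimeEncoder(json.JSONEncoder):
    """Encode datetime objects as tagged ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, datetime.datetime):
            return {"__datetime__": o.isoformat()}
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def decode_datetime(d):
    """object_hook that turns tagged dicts back into datetime objects."""
    if "__datetime__" in d:
        return datetime.datetime.fromisoformat(d["__datetime__"])
    return d


obj = {"when": datetime.datetime(2020, 8, 19, 8, 28, 45), "n": 42}

text = json.dumps(obj, cls=DateTimeEncoder)
restored = json.loads(text, object_hook=decode_datetime)
```

Unlike a pickle, the resulting text is ordinary ASCII JSON that can be placed in an environment variable as-is and read by eye when debugging.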

JSON Schema lets you define languages in JSON, similar to what you'd do in XML. It comes with a built-in date-time string format, and the jsonschema library for Python knows how to use it.

YAML has a standard extension repository that includes many types JSON doesn't, including a timestamp. Most of the zillion 'yaml' modules for Python already know how to encode datetime objects to and from this type. If you need additional types beyond what YAML includes, it was designed to be extensible declaratively. And there are libraries that do the equivalent of jsonpickle, defining new types on the fly, if you really need that.

And finally, you can always write an XML language.
