Python pickle protocol choice?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23582489/

Date: 2020-08-19 03:11:54  Source: igfitidea

Tags: python, python-2.7, numpy, pickle

Asked by Cobry

I am using Python 2.7 and trying to pickle an object. I am wondering what the real difference is between the pickle protocols.

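import numpy as np
import pickle

class Data(object):
    def __init__(self):
        # 100 x 37000 x 3 float32 values, about 44.4 MB of raw data
        self.a = np.zeros((100, 37000, 3), dtype=np.float32)

d = Data()
print("data size: ", d.a.nbytes / 1000000.0)
print("highest protocol: ", pickle.HIGHEST_PROTOCOL)

# Binary mode ("wb") because protocols 1 and 2 are binary formats;
# the with-blocks make sure each file is flushed and closed.
with open("noProt", "wb") as f:
    pickle.dump(d, f)  # no protocol given: the default is used
with open("prot0", "wb") as f:
    pickle.dump(d, f, protocol=0)
with open("prot1", "wb") as f:
    pickle.dump(d, f, protocol=1)
with open("prot2", "wb") as f:
    pickle.dump(d, f, protocol=2)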


out >> data size:  44.4
out >> highest protocol:  2

Then I found that the saved files have different sizes on disk:

  • noProt: 177.6MB
  • prot0: 177.6MB
  • prot1: 44.4MB
  • prot2: 44.4MB

I know that prot0 is a human-readable text file, so I don't want to use it. I guess protocol 0 is the one used by default.

I wonder what the difference is between protocols 1 and 2. Is there a reason why I should choose one or the other?

Which is better to use, pickle or cPickle?

Answered by Martijn Pieters

Use the latest protocol that supports the lowest Python version you want to support reading the data. Newer protocol versions support new language features and include optimisations.

From the pickle module data format documentation:

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
  • Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

If a protocol is not specified, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version available will be used.
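
To get a feel for how the protocols listed above differ in practice, here is a minimal sketch that pickles the same object with every protocol the running interpreter supports and reports the size of each result. The `sample` data and the `pickled_sizes` helper are made up for this illustration; they are not part of the pickle module or of the original question.

```python
import pickle

# Made-up sample data: floats with long decimal representations, so the
# text-based protocol 0 has to spell each one out digit by digit.
sample = {"values": [i / 7.0 for i in range(1000)]}


def pickled_sizes(obj):
    """Return {protocol: size in bytes} for every protocol available here."""
    return {
        protocol: len(pickle.dumps(obj, protocol=protocol))
        for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
    }


sizes = pickled_sizes(sample)
for protocol in sorted(sizes):
    print("protocol %d: %d bytes" % (protocol, sizes[protocol]))
```

On this kind of numeric data the text protocol 0 should come out noticeably larger than the binary protocols, which matches the pattern in the file sizes reported in the question.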

So if you want to support loading the pickled data with Python 3.4 or newer, pick protocol 4. If you still need to support Python 2.7, pick protocol 2, especially if you are using custom classes derived from object (new-style classes), which any modern code does these days.
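
The selection rule can be written down as a small lookup. This is only a sketch: `protocol_for` and the `_INTRODUCED` table are hypothetical names invented here, and the version numbers are copied from the documentation excerpt quoted in this answer.

```python
import pickle

# (Python version that introduced the protocol, protocol number),
# taken from the documentation excerpt quoted in this answer.
_INTRODUCED = [((2, 3), 2), ((3, 0), 3), ((3, 4), 4), ((3, 8), 5)]


def protocol_for(min_reader_version):
    """Pick the newest protocol that `min_reader_version` can still read.

    `min_reader_version` is a tuple such as (2, 7) or (3, 4): the oldest
    Python that must be able to load the data. Hypothetical helper.
    """
    best = 1  # protocols 0 and 1 are readable by earlier Pythons as well
    for introduced_in, protocol in _INTRODUCED:
        if min_reader_version >= introduced_in:
            best = protocol
    # Never ask for a protocol the writing interpreter does not have.
    return min(best, pickle.HIGHEST_PROTOCOL)


blob = pickle.dumps({"x": 1}, protocol=protocol_for((2, 7)))
```

With this helper, data that must stay readable on Python 2.7 is written with protocol 2, while a minimum of Python 3.4 selects protocol 4 (provided the writing interpreter supports it).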

However, if you are not exchanging pickled data with other Python versions and do not otherwise need to maintain backwards compatibility with older Python versions, it's easiest to just stick with the highest protocol version you can lay your hands on:

with open("prot2", 'wb') as pfile:
    pickle.dump(d, pfile, protocol=pickle.HIGHEST_PROTOCOL)

pickle.HIGHEST_PROTOCOL will always be the right version for the current Python version. Because this is a binary format, make sure to use 'wb' as the file mode!
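
For completeness, a minimal round trip might look like the sketch below: write with 'wb', read back with 'rb'. The sample dictionary and the file name data.pkl are placeholders chosen for this illustration.

```python
import os
import pickle
import tempfile

payload = {"name": "example", "values": [1.5, 2.5, 3.5]}  # made-up sample data

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Write in binary mode with the newest protocol this interpreter offers.
with open(path, "wb") as pfile:
    pickle.dump(payload, pfile, protocol=pickle.HIGHEST_PROTOCOL)

# Read back in binary mode; the protocol is detected automatically on load.
with open(path, "rb") as pfile:
    restored = pickle.load(pfile)

print(restored == payload)
```

Note that no protocol argument is needed when loading; only the writer has to choose one.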

Python 3 no longer distinguishes between cPickle and pickle; always use pickle when using Python 3. It uses a compiled C extension under the hood.

If you are still using Python 2, then cPickle and pickle are mostly compatible; the differences lie in the API offered. For most use cases, just stick with cPickle; it is faster. Quoting the documentation again:

First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.
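
Code that has to run on both Python 2 and Python 3 often uses a fallback import so that the faster module is picked up where it exists. The following is a common idiom rather than something prescribed by the documentation quoted above, and the sample data is made up.

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: there is no separate cPickle module

data = {"answer": 42}  # made-up sample data

# Protocol 2 is the newest protocol that Python 2 can read.
blob = pickle.dumps(data, protocol=2)
restored = pickle.loads(blob)
print(restored)
```

The rest of the program can then refer to `pickle` without caring which module was actually imported, as long as it does not need to subclass Pickler or Unpickler.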

Answered by patapouf_ai

For people using Python 3, there were five protocols to choose from as of Python 3.5 (versions 0 to 4); Python 3.8 added a sixth (version 5):

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced [doc]:

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7, and the recommended protocol when compatibility with other Python 3 versions is required.
  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.
  • Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

A general rule is that you should use the highest protocol that is still backward compatible with everything that needs to read the data. So if you want it to be backward compatible with Python 2, protocol version 2 is a good choice; if you want it to be backward compatible with all Python versions, version 1 is good. If you do not care about backward compatibility, pickle.HIGHEST_PROTOCOL automatically gives you the highest protocol for your Python version.

Also, in Python 3, importing pickle automatically imports the C implementation.

Another point to note in terms of compatibility is that, by default, protocols 3 and 4 use Unicode encoding of strings, whereas earlier protocols do not. So in Python 3, if you load a file that was pickled in Python 2, you will probably have to specify the encoding explicitly in order to load it properly.
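
As a sketch of what that looks like under Python 3: the byte string below is hand-written to mimic what Python 2 would emit for the 8-bit str 'caf\xe9' with protocol 2 (an assumption made for this illustration, not output captured from a real Python 2 run). The `encoding` argument of `pickle.loads` controls how such strings are decoded.

```python
import pickle

# Assumed Python 2 pickle (protocol 2) of the byte string 'caf\xe9':
# PROTO 2, SHORT_BINSTRING of length 4, BINPUT 0, STOP.
py2_blob = b"\x80\x02U\x04caf\xe9q\x00."

# The default encoding is ASCII, which cannot decode the byte 0xE9.
try:
    pickle.loads(py2_blob)
    default_worked = True
except UnicodeDecodeError:
    default_worked = False

# Decode Python 2 str objects as Latin-1 text ...
text = pickle.loads(py2_blob, encoding="latin1")

# ... or keep them as raw bytes and decode later.
raw = pickle.loads(py2_blob, encoding="bytes")

print(default_worked, repr(text), repr(raw))
```

Which encoding is right depends on how the strings were produced on the Python 2 side; "latin1" maps every byte to a character and so never fails, while "bytes" leaves the decision to the caller.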