_pickle in python3 doesn't work for large data saving

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original question: http://stackoverflow.com/questions/29704139/


_pickle in python3 doesn't work for large data saving

Tags: python, pickle

Asked by Jake0x32

I am trying to use _pickle to save data onto disk. But when calling _pickle.dump, I got an error:


OverflowError: cannot serialize a bytes object larger than 4 GiB

Is this a hard limitation of _pickle? (cPickle for python2)


Answered by Martijn Pieters

Yes, this is a hard-coded limit; from the save_bytes function:


else if (size <= 0xffffffffL) {
    // ...
}
else {
    PyErr_SetString(PyExc_OverflowError,
                    "cannot serialize a bytes object larger than 4 GiB");
    return -1;          /* string too large */
}

The protocol uses 4 bytes to write the size of the object to disk, which means you can only track sizes of up to 2**32 == 4 GiB.

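To make that limit concrete, here is a small illustrative sketch (plain Python arithmetic, not the CPython source) of what a 4-byte length prefix can represent:

# A 4-byte unsigned length field tops out at 0xffffffff,
# i.e. 2**32 - 1 bytes, just under 4 GiB.
max_size = 0xFFFFFFFF
print(max_size == 2**32 - 1)     # True
print(max_size / 2**30)          # ~4.0 (GiB)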

If you can break up the bytes object into multiple objects, each smaller than 4 GiB, you can still save the data to a pickle, of course.

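For example, here is a minimal sketch of that workaround, assuming a hypothetical oversized bytes object big_blob: split it into sub-4 GiB chunks and pickle the list of chunks.

import pickle

CHUNK = 2**31  # 2 GiB per chunk, comfortably under the 4 GiB limit

def split_bytes(blob, chunk_size=CHUNK):
    # Split one large bytes object into a list of smaller bytes objects
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]

# big_blob is a placeholder for the > 4 GiB bytes object to be saved
chunks = split_bytes(big_blob)
with open("data.pkl", "wb") as f:
    pickle.dump(chunks, f)

# Reassemble after loading
with open("data.pkl", "rb") as f:
    restored = b"".join(pickle.load(f))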

Answered by Eric Levieil

Not anymore in Python 3.4, which has PEP 3154 and Pickle 4.0:
https://www.python.org/dev/peps/pep-3154/


But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html


pickle.dump(d, open("file", 'wb'), protocol=4)
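A slightly fuller sketch of the same call, opening the file in binary mode and reading the object back (d stands for whatever large object is being saved):

import pickle

# d is a placeholder for the large object being saved
with open("file", "wb") as f:
    pickle.dump(d, f, protocol=4)   # protocol 4 lifts the 4 GiB limit

with open("file", "rb") as f:
    d = pickle.load(f)              # the protocol is detected automatically on load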

Answered by rts1

There are great answers above for why pickle doesn't work. But it still doesn't work for Python 2.7, which is a problem if you are still on Python 2.7 and want to support large files, especially NumPy (NumPy arrays over 4 GiB fail).


You can use OC serialization, which has been updated to work for data over 4 GiB. There is a Python C extension module available from:


http://www.picklingtools.com/Downloads


Take a look at the Documentation:


http://www.picklingtools.com/html/faq.html#python-c-extension-modules-new-as-of-picklingtools-1-6-0-and-1-3-3


But here's a quick summary: there are ocdumps and ocloads, very much like pickle's dumps and loads:


from pyocser import ocdumps, ocloads

ser = ocdumps(pyobject)   # serialize pyobject into the string ser
pyobject = ocloads(ser)   # deserialize the string ser back into pyobject
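As a usage sketch (assuming pyocser is installed and ocdumps returns a byte string, as the summary above describes), the serialized data can be written to and read from disk like any other bytes:

from pyocser import ocdumps, ocloads

# pyobject is a placeholder for the data structure to serialize
with open("data.oc", "wb") as f:
    f.write(ocdumps(pyobject))        # write the serialized byte string to disk

with open("data.oc", "rb") as f:
    pyobject = ocloads(f.read())      # read it back and deserialize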

The OC serialization is 1.5-2x faster and also works with C++ (if you are mixing languages). It works with all built-in types, but not classes (partly because it is cross-language and it's hard to build C++ classes from Python).
