_pickle in python3 doesn't work for large data saving

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original question: http://stackoverflow.com/questions/29704139/


_pickle in python3 doesn't work for large data saving

Tags: python, pickle

Asked by Jake0x32

I am trying to use _pickle to save data onto disk. But when calling _pickle.dump, I got an error:


OverflowError: cannot serialize a bytes object larger than 4 GiB

Is this a hard limitation of _pickle? (cPickle for python2)


Answered by Martijn Pieters

Yes, this is a hard-coded limit; from the save_bytes function:


else if (size <= 0xffffffffL) {
    // ...
}
else {
    PyErr_SetString(PyExc_OverflowError,
                    "cannot serialize a bytes object larger than 4 GiB");
    return -1;          /* string too large */
}

The protocol uses 4 bytes to write the size of the object to disk, which means you can only track sizes of up to 2**32 == 4 GiB.

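To make that limit concrete, here is a small illustrative sketch (plain Python arithmetic, not the CPython source) of what a 4-byte length prefix can represent:

# A 4-byte unsigned length field tops out at 0xffffffff,
# i.e. 2**32 - 1 bytes, just under 4 GiB.
max_size = 0xFFFFFFFF
print(max_size == 2**32 - 1)     # True
print(max_size / 2**30)          # ~4.0 (GiB)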

If you can break up the bytes object into multiple objects, each smaller than 4 GiB, you can still save the data to a pickle, of course.

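For example, here is a minimal sketch of that workaround, assuming a hypothetical oversized bytes object big_blob: split it into sub-4 GiB chunks and pickle the list of chunks.

import pickle

CHUNK = 2**31  # 2 GiB per chunk, comfortably under the 4 GiB limit

def split_bytes(blob, chunk_size=CHUNK):
    # Split one large bytes object into a list of smaller bytes objects
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]

# big_blob is a placeholder for the > 4 GiB bytes object to be saved
chunks = split_bytes(big_blob)
with open("data.pkl", "wb") as f:
    pickle.dump(chunks, f)

# Reassemble after loading
with open("data.pkl", "rb") as f:
    restored = b"".join(pickle.load(f))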

Answered by Eric Levieil

Not anymore in Python 3.4, which has PEP 3154 and Pickle 4.0:
https://www.python.org/dev/peps/pep-3154/


But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html


pickle.dump(d, open("file", 'wb'), protocol=4)
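A slightly fuller sketch of the same call, opening the file in binary mode and reading the object back (d stands for whatever large object is being saved):

import pickle

# d is a placeholder for the large object being saved
with open("file", "wb") as f:
    pickle.dump(d, f, protocol=4)   # protocol 4 lifts the 4 GiB limit

with open("file", "rb") as f:
    d = pickle.load(f)              # the protocol is detected automatically on load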

Answered by rts1

There are great answers above for why pickle doesn't work. But it still doesn't work for Python 2.7, which is a problem if you are still on Python 2.7 and want to support large files, especially NumPy (NumPy arrays over 4 GiB fail).


You can use OC serialization, which has been updated to work for data over 4 GiB. There is a Python C extension module available from:


http://www.picklingtools.com/Downloads


Take a look at the Documentation:


http://www.picklingtools.com/html/faq.html#python-c-extension-modules-new-as-of-picklingtools-1-6-0-and-1-3-3


But here's a quick summary: there are ocdumps and ocloads, very much like pickle's dumps and loads:


from pyocser import ocdumps, ocloads

ser = ocdumps(pyobject)   # serialize pyobject into the string ser
pyobject = ocloads(ser)   # deserialize the string ser back into pyobject
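As a usage sketch (assuming pyocser is installed and ocdumps returns a byte string, as the summary above describes), the serialized data can be written to and read from disk like any other bytes:

from pyocser import ocdumps, ocloads

# pyobject is a placeholder for the data structure to serialize
with open("data.oc", "wb") as f:
    f.write(ocdumps(pyobject))        # write the serialized byte string to disk

with open("data.oc", "rb") as f:
    pyobject = ocloads(f.read())      # read it back and deserialize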

The OC serialization is 1.5-2x faster and also works with C++ (if you are mixing languages). It works with all built-in types, but not classes (partly because it is cross-language and it's hard to build C++ classes from Python).
