Original question: http://stackoverflow.com/questions/29704139/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me): StackOverflow
_pickle in python3 doesn't work for large data saving
Asked by Jake0x32
I am trying to use _pickle to save data to disk, but when calling _pickle.dump I got an error:
OverflowError: cannot serialize a bytes object larger than 4 GiB
Is this a hard limitation of _pickle (cPickle for Python 2)?
Answered by Martijn Pieters
Yes, this is a hard-coded limit; from the save_bytes function:
else if (size <= 0xffffffffL) {
    // ...
}
else {
    PyErr_SetString(PyExc_OverflowError,
                    "cannot serialize a bytes object larger than 4 GiB");
    return -1;          /* string too large */
}
The protocol uses 4 bytes to write the size of the object to disk, which means you can only track sizes of up to 2³² == 4 GB.
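For reference, the arithmetic behind that limit; a quick interactive check (not part of the original answer):

>>> hex(2**32 - 1)        # the largest value a 4-byte length field can hold
'0xffffffff'
>>> 2**32 // 2**30        # ...i.e. 4 GiB
4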
If you can break up the bytes object into multiple objects, each smaller than 4GB, you can still save the data to a pickle, of course.
Answered by Eric Levieil
Not anymore in Python 3.4, which has PEP 3154 and Pickle 4.0:
https://www.python.org/dev/peps/pep-3154/
But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html
pickle.dump(d, open("file", 'wb'), protocol=4)  # note 'wb': pickle requires binary mode
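For completeness, a round-trip sketch; the file name and sample object are illustrative, and the protocol is detected automatically on load:

import pickle

data = {"blob": b"\x00" * 1024}          # stand-in for a very large object

with open("file", "wb") as f:
    pickle.dump(data, f, protocol=4)     # protocol 4 lifts the 4 GiB limit

with open("file", "rb") as f:
    restored = pickle.load(f)            # no protocol argument needed

assert restored == data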
Answered by rts1
There are great answers above for why pickle doesn't work. But it still doesn't work for Python 2.7, which is a problem if you are still on Python 2.7 and want to support large files, especially NumPy (NumPy arrays over 4 GB fail).
You can use OC serialization, which has been updated to work for data over 4 GB. There is a Python C extension module available from:
http://www.picklingtools.com/Downloads
Take a look at the documentation.
But here's a quick summary: there are ocdumps and ocloads, very much like pickle's dumps and loads:
from pyocser import ocdumps, ocloads

ser = ocdumps(pyobject)      # serialize pyobject into the string ser
pyobject = ocloads(ser)      # deserialize from the string ser back into pyobject
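A small round-trip check, assuming the pyocser module from the download above is installed; only ocdumps and ocloads, the two functions the answer names, are used, and the sample object is illustrative:

from pyocser import ocdumps, ocloads

obj = {"ints": list(range(5)), "blob": b"\xff" * 16}   # built-in types only
assert ocloads(ocdumps(obj)) == obj                    # round-trip check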
OC serialization is 1.5-2x faster and also works with C++ (if you are mixing languages). It works with all built-in types, but not classes (partly because it is cross-language and it's hard to build C++ classes from Python).