Pandas msgpack vs pickle

Disclaimer: this page reproduces a popular StackOverflow question and its answer under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/30651724/

Time: 2020-09-13 23:25:59  Source: igfitidea

python, pandas, msgpack

Question by Alexander

msgpack in Pandas is supposed to be a replacement for pickle.

Per the Pandas docs on msgpack:

This is a lightweight portable binary format, similar to binary JSON, that is highly space efficient, and provides good performance both on the writing (serialization), and reading (deserialization).

I find, however, that its performance does not appear to stack up against pickle.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10000, 100))

>>> %timeit df.to_pickle('test.p')
10 loops, best of 3: 22.4 ms per loop

>>> %timeit df.to_msgpack('test.msg')
10 loops, best of 3: 36.4 ms per loop

>>> %timeit pd.read_pickle('test.p')
100 loops, best of 3: 10.5 ms per loop

>>> %timeit pd.read_msgpack('test.msg')
10 loops, best of 3: 24.6 ms per loop
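%timeit is an IPython magic. Outside IPython, the same "best of 3" measurement can be sketched with the standard library's timeit.repeat. DataFrame.to_msgpack and pd.read_msgpack were deprecated in pandas 0.25 and removed in pandas 1.0, so this sketch cannot call them; as an assumption, it compares pickle against json (a portable text format) on a plain list of float rows, scaled down to 1,000 x 100 to keep the run short. Absolute timings will vary with machine and Python version.

```python
import json
import pickle
import random
import timeit

# Hypothetical stand-in for the question's DataFrame: rows of random floats
# stored as plain lists, since pandas' msgpack I/O no longer exists.
random.seed(0)
rows = [[random.gauss(0.0, 1.0) for _ in range(100)] for _ in range(1_000)]


def best_of(stmt, repeat=3, number=10):
    """Best per-loop time in seconds, like %timeit's 'best of 3'."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number


# Pre-serialized payloads so the load timings measure only deserialization.
pickled = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
encoded = json.dumps(rows)

results = {
    "pickle dumps": best_of(lambda: pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)),
    "json dumps": best_of(lambda: json.dumps(rows)),
    "pickle loads": best_of(lambda: pickle.loads(pickled)),
    "json loads": best_of(lambda: json.loads(encoded)),
}

for name, seconds in results.items():
    print(f"{name:>12}: {seconds * 1e3:.1f} ms per loop")
```

Taking the minimum of several repeats, as %timeit does, reduces the influence of other processes on the measurement.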

Question: Aside from potential security issues with pickle, what are the benefits of msgpack over pickle? Is pickle still the preferred method of serializing data, or do better alternatives currently exist?

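The security issue mentioned in the question comes from the pickle protocol itself: an object's __reduce__ method can tell pickle to call an arbitrary callable when the data is loaded, which is why the Python documentation warns against unpickling data from untrusted sources. A minimal, harmless sketch (the class name RunsOnLoad is hypothetical) makes pickle.loads call len instead of rebuilding the object. A plain msgpack decoder, by contrast, by default only reconstructs data values such as maps, arrays, strings and numbers.

```python
import pickle


class RunsOnLoad:
    """Hypothetical class whose pickled form triggers a function call on load."""

    def __reduce__(self):
        # pickle records "call len('arbitrary')" instead of the object's state.
        # A malicious file could name any importable callable here.
        return (len, ("arbitrary",))


payload = pickle.dumps(RunsOnLoad())
restored = pickle.loads(payload)  # runs len("arbitrary") while loading
print(type(restored).__name__, restored)  # prints: int 9
```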

Answer by MRocklin

Pickle is better for the following:

  1. Numerical data, or anything that uses the buffer protocol such as NumPy arrays (though only if you use a somewhat recent protocol=)
  2. Python-specific objects such as classes, functions, etc. (although for these you should look at cloudpickle)
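Both points can be checked with the standard library alone. In this sketch the Point dataclass is hypothetical, and array.array stands in for a NumPy array as an object that exposes the buffer protocol (an assumption made so the example needs no third-party packages). It shows pickle round-tripping a user-defined class that json rejects, plain pickle refusing a lambda (the gap cloudpickle fills), and a binary protocol storing numbers far more compactly than protocol 0, which writes each float as text.

```python
import array
import json
import pickle
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical user-defined class; pickle stores it by module and class name."""
    x: float
    y: float


# Python-specific objects: pickle restores an equal Point; json has no encoding for it.
point = Point(1.5, -2.0)
restored_point = pickle.loads(pickle.dumps(point))
try:
    json.dumps(point)
    json_error = None
except TypeError as exc:
    json_error = exc

# Plain pickle saves functions by importable name, so an anonymous lambda fails.
try:
    pickle.dumps(lambda v: v + 1)
    lambda_error = None
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    lambda_error = exc

# Numerical buffers: array.array stands in for a NumPy array here.
numbers = array.array("d", (i * 0.1 for i in range(10_000)))
text_size = len(pickle.dumps(numbers, protocol=0))
binary_size = len(pickle.dumps(numbers, protocol=pickle.HIGHEST_PROTOCOL))

print("Point round-trips:", restored_point == point)
print("json error:", json_error)
print("lambda error:", type(lambda_error).__name__)
print(f"protocol 0: {text_size} bytes; protocol {pickle.HIGHEST_PROTOCOL}: {binary_size} bytes")
```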

MsgPack is better for the following:

  1. Cross-language interoperation. It's an alternative to JSON with some improvements.
  2. Performance on text data and Python objects. It is faster than pickle by a decent factor here, under any setting.
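To make the "alternative to JSON" point concrete: msgpack defines its byte layout in a language-neutral specification, so bytes written by one implementation can be read by any conforming decoder in another language. The sketch below is a hypothetical, partial encoder covering nil, booleans, integers, 64-bit floats, strings, arrays and maps. Its name packb only mirrors msgpack.packb from the third-party msgpack package, which (together with msgpack.unpackb) is what you would use in practice.

```python
import json
import struct

# (type byte, struct format) pairs from the msgpack spec, smallest first.
_UINT_FORMATS = [(0xCC, ">B"), (0xCD, ">H"), (0xCE, ">I"), (0xCF, ">Q")]
_INT_FORMATS = [(0xD0, ">b"), (0xD1, ">h"), (0xD2, ">i"), (0xD3, ">q")]
_STR_FORMATS = [(0xD9, ">B"), (0xDA, ">H"), (0xDB, ">I")]
_ARRAY_FORMATS = [(0xDC, ">H"), (0xDD, ">I")]
_MAP_FORMATS = [(0xDE, ">H"), (0xDF, ">I")]


def _header(n, fix_base, fix_limit, formats):
    """Type byte plus length for a str/array/map holding n bytes or items."""
    if n < fix_limit:
        return bytes([fix_base | n])  # "fix" form: length folded into one byte
    for code, fmt in formats:
        if n < 256 ** struct.calcsize(fmt):
            return bytes([code]) + struct.pack(fmt, n)
    raise ValueError("too large for msgpack")


def packb(obj):
    """Hypothetical partial msgpack encoder (no bin, ext or timestamp types)."""
    if obj is None:
        return b"\xc0"
    if obj is True:  # check bools before ints: bool is a subclass of int
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj < 0x80:
            return bytes([obj])  # positive fixint
        if -32 <= obj < 0:
            return struct.pack(">b", obj)  # negative fixint (0xe0-0xff)
        for code, fmt in _UINT_FORMATS if obj >= 0 else _INT_FORMATS:
            try:
                return bytes([code]) + struct.pack(fmt, obj)
            except (struct.error, OverflowError):
                continue  # value does not fit; try the next wider format
        raise OverflowError("integer out of msgpack range")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float 64, big-endian
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _header(len(data), 0xA0, 32, _STR_FORMATS) + data
    if isinstance(obj, (list, tuple)):
        return _header(len(obj), 0x90, 16, _ARRAY_FORMATS) + b"".join(map(packb, obj))
    if isinstance(obj, dict):
        body = b"".join(packb(k) + packb(v) for k, v in obj.items())
        return _header(len(obj), 0x80, 16, _MAP_FORMATS) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


record = {"name": "pandas", "version": 16, "ratio": 0.5,
          "tags": ["python", "msgpack"], "ok": True}
packed = packb(record)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
print(len(packed), "bytes as msgpack vs", len(as_json), "bytes as compact JSON")
# prints: 62 bytes as msgpack vs 80 bytes as compact JSON
```

The type tags and length prefixes make short strings and small integers cheaper than their JSON text, while a float always costs nine bytes (0.5 is three characters in JSON), so the size advantage depends on the data.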

As @Jeff noted above, this blog post may be of interest.
