Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/499593/

Time: 2020-11-03 20:14:44  Source: igfitidea

What's the best serialization method for objects in memcached?

python serialization xml-serialization memcached protocol-buffers

Asked by mb.

My Python application currently uses the python-memcached API to set and get objects in memcached. This API uses Python's native pickle module to serialize and de-serialize Python objects.

This API makes it simple and fast to store nested Python lists, dictionaries and tuples in memcached, and reading these objects back into the application is completely transparent -- it just works.
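
As a minimal sketch of what that transparency looks like, the round trip below calls the pickle module directly, without a memcached server. The `profile` value is made up for illustration and is not from the original application.

```python
import pickle

# Hypothetical cache value: nested lists, dicts and tuples.
profile = {"id": 42, "tags": ["a", "b"], "location": (51.5, -0.12)}

# pickle produces bytes, which is what ends up stored under the cache key.
blob = pickle.dumps(profile, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# The tuple is still a tuple after the round trip.
print(type(restored["location"]).__name__)
print(restored == profile)
```

The price of this convenience is that `blob` is in a Python-specific format.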

But I don't want to be limited to using Python exclusively, and if all the memcached objects are serialized with pickle, then clients written in other languages won't work.

Here are the cross-platform serialization options I've considered:

  1. XML - the main benefit is that it's human-readable, but that's not important in this application. XML also takes a lot of space, and it's expensive to parse.

  2. JSON - seems like a good cross-platform standard, but I'm not sure it retains the character of object types when read back from memcached. For example, according to this post, tuples are transformed into lists when using simplejson; also, it seems like adding elements to the JSON structure could break code written to the old structure.

  3. Google Protocol Buffers - I'm really interested in this because it seems very fast and compact -- at least 10 times smaller and faster than XML; it's not human-readable, but that's not important for this app; and it seems designed to support growing the structure without breaking old code.
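
The type concern in the JSON item can be checked quickly. This sketch uses the standard-library `json` module as a stand-in for simplejson (an assumption; the two share the same basic API), and the `record` value is invented for the example.

```python
import json

# Hypothetical value with a tuple and a dict keyed by an integer.
record = {"point": (1, 2), "counts": {1: "one"}}

decoded = json.loads(json.dumps(record))

# JSON has arrays and string keys only, so both details change on the way back.
print(decoded)  # {'point': [1, 2], 'counts': {'1': 'one'}}
```

Anything that relies on tuples or on non-string dictionary keys has to be converted back by the reader.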

Considering the priorities for this app, what's the ideal object serialization method for memcached?

  1. Cross-platform support (Python, Java, C#, C++, Ruby, Perl)

  2. Handling nested data structures

  3. Fast serialization/de-serialization

  4. Minimum memory footprint

  5. Flexibility to change structure without breaking old code

Accepted answer by mb.

I tried several methods and settled on compressed JSON as the best balance between speed and memory footprint. Python's native Pickle function is slightly faster, but the resulting objects can't be used with non-Python clients.

I'm seeing 3:1 compression so all the data fits in memcache and the app gets sub-10ms response times including page rendering.
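
A minimal sketch of compressed JSON, assuming zlib as the compressor (the answer does not say which one was used). The helper names `dumps_compressed` and `loads_compressed` and the sample `rows` are hypothetical.

```python
import json
import zlib


def dumps_compressed(obj, level=6):
    """Serialize to compact JSON text, then deflate it with zlib."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def loads_compressed(blob):
    """Inverse of dumps_compressed: inflate, then parse the JSON text."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Made-up, repetitive data of the kind that tends to compress well.
rows = [{"id": i, "name": "user%d" % i, "active": True} for i in range(200)]

raw_size = len(json.dumps(rows, separators=(",", ":")).encode("utf-8"))
blob = dumps_compressed(rows)
print(raw_size, len(blob))
```

The actual ratio depends heavily on the data, so the 3:1 figure should be measured rather than assumed. Clients in other languages need to inflate the zlib stream before parsing the JSON.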

Here's a comparison of JSON, Thrift, Protocol Buffers and YAML, with and without compression:

http://bouncybouncy.net/ramblings/posts/more_on_json_vs_thrift_and_protocol_buffers/

Looks like this test got the same results I did with compressed JSON. Since I don't need to pre-define each structure, this seems like the fastest and smallest cross-platform answer.

Answer by gahooa

One major consideration is "do you want to have to specify each structure definition"?

If you are OK with that, then you could take a look at:

  1. Protocol Buffers - http://code.google.com/apis/protocolbuffers/docs/overview.html
  2. Thrift - http://developers.facebook.com/thrift/ (more geared toward services)

Both of these solutions require supporting files to define each data structure.

If you would prefer not to incur the developer overhead of pre-defining each structure, then take a look at:

  1. JSON (via python-cjson in Python, and the native json functions in PHP). Both are very fast if you don't need to transmit binary content (such as images).
  2. YAML - http://www.yaml.org/. Also really fast if you get the right library.

However, I believe that both of these have had issues with transporting binary content, which is why they were ruled out for our usage. Note: YAML may have good binary support, but you will have to check the client libraries -- see here: http://yaml.org/type/binary.html
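
To make the binary limitation concrete: JSON has no bytes type, so a common workaround is to base64-encode binary fields. This is a sketch of that general technique, not of the Extruct format; the function names, the `data_b64` field name and the payload are all hypothetical.

```python
import base64
import json


def encode_with_binary(name, payload):
    """Wrap raw bytes as base64 text so the document stays valid JSON."""
    return json.dumps({
        "name": name,
        "data_b64": base64.b64encode(payload).decode("ascii"),
    })


def decode_with_binary(text):
    """Parse the document and turn the base64 field back into bytes."""
    doc = json.loads(text)
    return doc["name"], base64.b64decode(doc["data_b64"])


payload = bytes([0, 255, 16, 128])  # stand-in for image data
text = encode_with_binary("thumb.png", payload)
name, data = decode_with_binary(text)
print(name, data == payload)
```

Base64 grows the binary part by roughly a third, which matters when memory footprint is a priority.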

At our company, we rolled our own library (Extruct) for cross-language serialization with binary support. We currently have (decently) fast implementations in Python and PHP, although it isn't very human readable due to using base64 on all the strings (binary support). Eventually we will port them to C and use more standard encoding.

Dynamic languages like PHP and Python get really slow if you have too many iterations in a loop or have to look at each character. C on the other hand shines at such operations.

If you'd like to see the implementation of Extruct, please let me know. (contact info at http://blog.gahooa.com/ under "About Me")

Answer by GrosBedo

You might be interested in this link:

http://kbyanc.blogspot.com/2007/07/python-serializer-benchmarks.html

An alternative: MessagePack seems to be the fastest serializer out there. Maybe you can give it a try.

Answer by S.Lott

"Cross-platform support (Python, Java, C#, C++, Ruby, Perl)"

Too bad this criterion is first. The intent behind most languages is to express fundamental data structures and processing differently. That's what makes multiple languages a "problem": they're all different.

A single representation that's good across many languages is generally impossible. There are compromises in richness of the representation, performance or ambiguity.

JSON meets the remaining criteria nicely. Messages are compact and parse quickly (unlike XML). Nesting is handled nicely. Changing structure without breaking code is always iffy -- if you remove something, old code will break. If you change something that was required, old code will break. If you're adding things, however, JSON handles this also.
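
The point about adding versus removing fields can be sketched as follows. `read_user_v1` is a hypothetical "old" reader, and the three payloads are invented for illustration.

```python
import json


def read_user_v1(text):
    """Old reader: only knows about the 'id' and 'name' fields."""
    doc = json.loads(text)
    return doc["id"], doc["name"]


v1 = json.dumps({"id": 7, "name": "ann"})
v2_added = json.dumps({"id": 7, "name": "ann", "email": "ann@example.com"})
v2_removed = json.dumps({"id": 7})

print(read_user_v1(v1))
print(read_user_v1(v2_added))  # the extra field is simply ignored

try:
    read_user_v1(v2_removed)
except KeyError as exc:
    print("old reader broke on missing field:", exc)
```

Readers that use `doc.get("name")` with a default tolerate removals too, at the cost of handling the missing value later.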

I like human-readable. It helps with a lot of debugging and trouble-shooting.

The subtlety of having Python tuples turn into lists isn't an interesting problem. The receiving application already knows the structure being received, and can tweak it up (if it matters.)
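
A sketch of the kind of fix-up meant here, assuming the receiver knows that a hypothetical `point` field is supposed to be a tuple:

```python
import json


def load_point(text):
    """Parse the document and restore the field known to be a tuple."""
    doc = json.loads(text)
    doc["point"] = tuple(doc["point"])
    return doc


text = json.dumps({"label": "origin", "point": (0, 0)})
doc = load_point(text)
print(doc)
```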

Edit on performance.

Parsing the XML and JSON documents from http://developers.de/blogs/damir_dobric/archive/2008/12/27/performance-comparison-soap-vs-json-wcf-implementation.aspx

xmlParse  0.326
jsonParse 0.255

JSON appears to be significantly faster for the same content. I used the Python SimpleJSON and ElementTree modules in Python 2.5.2.
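
A comparison along these lines can be reproduced with `timeit`. This is not the script used for the numbers above: the documents are synthetic, the standard-library `json` module stands in for SimpleJSON, and the code targets Python 3 rather than 2.5.2, so the timings will differ.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Synthetic documents carrying the same 100 records in both formats.
items = [{"id": i, "name": "item%d" % i} for i in range(100)]
json_doc = json.dumps(items)
xml_doc = "<items>%s</items>" % "".join(
    '<item id="%d" name="%s"/>' % (it["id"], it["name"]) for it in items
)


def json_parse():
    return json.loads(json_doc)


def xml_parse():
    return ET.fromstring(xml_doc)


# Total seconds for 200 parses of each document.
print("xmlParse ", timeit.timeit(xml_parse, number=200))
print("jsonParse", timeit.timeit(json_parse, number=200))
```

Results vary with document shape and library version, so it is worth running this against a representative sample of the real cached objects.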

Answer by stickfigure

Hessian meets all of your requirements. There is a Python library here:

https://github.com/bgilmore/mustaine

The official documentation for the protocol can be found here:

http://hessian.caucho.com/

I regularly use it in both Java and Python. It works and doesn't require writing protocol definition files. I couldn't tell you how the Python serializer performs, but the Java version is reasonably efficient:

https://github.com/eishay/jvm-serializers/wiki/
