Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow CC BY-SA, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2627555/

Date: 2020-11-04 01:05:47  Source: igfitidea

How to deserialize an object with PyYAML using safe_load?

Tags: python, deserialization, pyyaml

Question by systempuntoout

Given a snippet like this:

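import yaml

class User(object):
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

user = User('spam', 'eggs')
serialized_user = yaml.dump(user)
# ... serialized_user is sent over the network ...
# Recent PyYAML requires an explicit Loader; yaml.Loader matches the old default
# and can construct arbitrary Python objects, which is what makes it unsafe
deserialized_user = yaml.load(serialized_user, Loader=yaml.Loader)
print("name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname))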

The YAML docs say that it is not safe to call yaml.load with any data received from an untrusted source. So what should I change in my snippet/class to use the safe_load method?
Is it possible?
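For reference, here is a small Python 3 sketch (assuming a recent PyYAML release) of what happens when the snippet's default yaml.dump output is passed to safe_load: the object is tagged !!python/object:..., and the safe loader refuses to construct it.

```python
import yaml
from yaml.constructor import ConstructorError


class User(object):
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname


# The default Dumper tags arbitrary objects as !!python/object:<module>.<class>
serialized_user = yaml.dump(User('spam', 'eggs'))
print(serialized_user)

try:
    yaml.safe_load(serialized_user)
    refused = False
except ConstructorError as exc:
    # SafeLoader has no constructor for python/object tags, so it gives up
    print("safe_load refused:", exc.problem)
    refused = True
```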

Answer by Petr Viktorin

There is another way. From the PyYAML docs:

A python object can be marked as safe and thus be recognized by yaml.safe_load. To do this, derive it from yaml.YAMLObject [...] and explicitly set its class property yaml_loader to yaml.SafeLoader.

You also have to set the yaml_tag property to make it work.

YAMLObject does some metaclass magic to make the object loadable. Note that if you do this, the objects will only be loadable by the safe loader, not with regular yaml.load().
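Roughly speaking, the metaclass registers cls.from_yaml as the constructor for yaml_tag on yaml_loader, and cls.to_yaml as a representer on yaml_dumper. The Python 3 sketch below does an equivalent registration by hand for a plain class, using the module-level yaml.add_constructor and yaml.add_representer functions. The helper names represent_user and construct_user are hypothetical, and unlike YAMLObject (which bypasses __init__ when loading) this version calls __init__ explicitly. It also checks the point above: because the constructor is registered only on SafeLoader, the regular yaml.Loader does not recognise !User.

```python
import yaml
from yaml.constructor import ConstructorError


class User(object):
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname


def represent_user(dumper, user):
    # Emit the object as a plain mapping tagged !User
    return dumper.represent_mapping('!User', {'name': user.name, 'surname': user.surname})


def construct_user(loader, node):
    # Build a dict from the mapping node, then go through the normal __init__
    values = loader.construct_mapping(node, deep=True)
    return User(values['name'], values['surname'])


# Register only on the safe classes, mirroring yaml_loader = yaml.SafeLoader
yaml.add_representer(User, represent_user, Dumper=yaml.SafeDumper)
yaml.add_constructor('!User', construct_user, Loader=yaml.SafeLoader)

serialized_user = yaml.safe_dump(User('spam', 'eggs'))
deserialized_user = yaml.safe_load(serialized_user)
print("name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname))

# The regular loader never had !User registered, so it rejects the tag
try:
    yaml.load(serialized_user, Loader=yaml.Loader)
    regular_loader_ok = True
except ConstructorError:
    regular_loader_ok = False
```

Registering the representer on SafeDumper also lets yaml.safe_dump serialize the object, which a YAMLObject subclass with the default yaml_dumper (yaml.Dumper) cannot do.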

Working example:

import yaml

class User(yaml.YAMLObject):
    yaml_loader = yaml.SafeLoader
    yaml_tag = '!User'

    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

user = User('spam', 'eggs')
serialized_user = yaml.dump(user)

# ... serialized_user is sent over the network ...

deserialized_user = yaml.safe_load(serialized_user)
print("name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname))

The advantage of this approach is that it's pretty easy to do; the disadvantages are that it only works with safe_load and that it clutters your class with serialization-related attributes and a metaclass.
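The "only works with safe_load" part can probably be softened. In PyYAML 5.1 and later, as far as I can tell, the YAMLObject metaclass also accepts a list for yaml_loader and registers the constructor on each loader in it. The Python 3 sketch below assumes that behaviour; the class still carries the serialization attributes and metaclass.

```python
import yaml


class User(yaml.YAMLObject):
    # Assumption: PyYAML 5.1+, where yaml_loader may be a list of loader
    # classes and the metaclass registers the !User constructor on each one
    yaml_loader = [yaml.SafeLoader, yaml.Loader]
    yaml_tag = '!User'

    def __init__(self, name, surname):
        self.name = name
        self.surname = surname


serialized_user = yaml.dump(User('spam', 'eggs'))

from_safe = yaml.safe_load(serialized_user)                  # safe loader
from_regular = yaml.load(serialized_user, Loader=yaml.Loader)  # regular loader
print(from_safe.name, from_regular.surname)
```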

Answer by Benson

It appears that safe_load, by definition, does not let you deserialize your own classes. If you want it to be safe, I'd do something like this:

import yaml

class User(object):
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

    def yaml(self):
        # Serialize only the plain attribute dict (no Python-specific tags)
        return yaml.dump(self.__dict__)

    @staticmethod
    def load(data):
        # safe_load only ever returns plain types (dict, str, ...)
        values = yaml.safe_load(data)
        return User(values["name"], values["surname"])

user = User('spam', 'eggs')
serialized_user = user.yaml()
print("serialized_user:  %s" % serialized_user.strip())

# ... serialized_user is sent over the network ...
deserialized_user = User.load(serialized_user)
print("name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname))

The advantage here is that you have absolute control over how your class is (de)serialized, which means you won't receive arbitrary executable code over the network and run it. The disadvantage is that you have absolute control over how your class is (de)serialized, which means you have to do a lot more work. ;-)
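Part of that extra work can go into validating input, since safe_load only hands back plain data. The Python 3 sketch below follows the same pattern with a hypothetical FIELDS whitelist and hypothetical method names to_yaml_text and from_yaml_text; it rejects documents that are not exactly a mapping of the expected string fields.

```python
import yaml


class User(object):
    # Hypothetical whitelist: the only keys accepted from untrusted YAML
    FIELDS = ('name', 'surname')

    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

    def to_yaml_text(self):
        # Serialize only the whitelisted attributes as a plain mapping
        return yaml.safe_dump({field: getattr(self, field) for field in self.FIELDS})

    @classmethod
    def from_yaml_text(cls, data):
        values = yaml.safe_load(data)
        # Accept exactly a mapping with the expected keys and string values
        if not isinstance(values, dict) or set(values) != set(cls.FIELDS):
            raise ValueError("unexpected YAML structure: %r" % (values,))
        if not all(isinstance(value, str) for value in values.values()):
            raise ValueError("all fields must be strings")
        return cls(**values)


user = User.from_yaml_text(User('spam', 'eggs').to_yaml_text())
print("name: %s, sname: %s" % (user.name, user.surname))

# An extra, unexpected key is refused instead of silently becoming an attribute
try:
    User.from_yaml_text("name: spam\nsurname: eggs\nis_admin: true\n")
    rejected = False
except ValueError as exc:
    print("rejected:", exc)
    rejected = True
```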

Answer by Anthon

If you have many tags and don't want to create objects for all of them, or if you don't care about the actual type returned and only want dotted attribute access, you can catch all undefined tags with the following code:

import yaml
from yaml.constructor import SafeConstructor

class Blob(object):
    def update(self, kw):
        # Expose every mapping key as an attribute
        for k, v in kw.items():
            setattr(self, k, v)

def my_construct_undefined(self, node):
    data = Blob()
    yield data
    value = self.construct_mapping(node)
    data.update(value)

# Tag None is the fallback used for any tag without a registered constructor
SafeConstructor.add_constructor(None, my_construct_undefined)


class User(object):
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

user = User('spam', 'eggs')
serialized_user = yaml.dump(user)
# ... serialized_user is sent over the network ...
deserialized_user = yaml.safe_load(serialized_user)
print("name: %s, sname: %s" % (deserialized_user.name, deserialized_user.surname))

In case you wonder why my_construct_undefined has a yield in the middle: it allows the object to be instantiated separately from the creation of its children. Once the object exists, it can be referred to if it has an anchor and one of its children (or their children) refers back to it through an alias. The actual mechanism first calls the constructor and does a next(x) on the resulting generator to obtain the new, still empty object; the rest of the generator is run afterwards to finalize it.
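The effect of the yield can be seen with a self-referencing document. The Python 3 sketch below registers the catch-all on two hypothetical SafeLoader subclasses, YieldLoader and NoYieldLoader, instead of patching SafeConstructor globally, so plain yaml.safe_load keeps rejecting unknown tags. The !Node tag and the document are made up for illustration. With the yield, the alias *a resolves to the already created Blob; without it, PyYAML reports the node as unconstructable because it is still being built when the alias is reached.

```python
import yaml
from yaml.constructor import ConstructorError


class Blob(object):
    def update(self, kw):
        for k, v in kw.items():
            setattr(self, k, v)


def construct_with_yield(loader, node):
    data = Blob()
    yield data                                   # object exists before its children
    data.update(loader.construct_mapping(node))


def construct_without_yield(loader, node):
    data = Blob()
    data.update(loader.construct_mapping(node))  # children are built first
    return data


# Subclasses keep the catch-all local instead of modifying SafeConstructor
class YieldLoader(yaml.SafeLoader):
    pass


class NoYieldLoader(yaml.SafeLoader):
    pass


YieldLoader.add_constructor(None, construct_with_yield)
NoYieldLoader.add_constructor(None, construct_without_yield)

# A node with an anchor (&a) whose child refers back to it through an alias (*a)
document = "&a !Node {name: spam, me: *a}"

node = yaml.load(document, Loader=YieldLoader)
print(node.name, node.me is node)

try:
    yaml.load(document, Loader=NoYieldLoader)
    failed_without_yield = False
except ConstructorError as exc:
    print("without yield:", exc.problem)
    failed_without_yield = True
```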