Python dataclass from a nested dict

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not the translator). Original StackOverflow question: http://stackoverflow.com/questions/53376099/

Date: 2020-08-19 20:17:38  Source: igfitidea

Tags: python, python-3.x, python-dataclasses

Asked by mbatchkarov

The standard library in 3.7 can recursively convert a dataclass into a dict (example from the docs):

from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert asdict(c) == tmp

I am looking for a way to turn a dict back into a dataclass when there is nesting. Something like C(**tmp) only works if the fields of the dataclass are simple types and not themselves dataclasses. I am familiar with jsonpickle, which, however, comes with a prominent security warning.

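To make the problem concrete, here is a minimal sketch (reusing the Point and C classes from the question) of what C(**tmp) actually produces. The constructor does no conversion, so the nested items stay plain dicts:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

# The call succeeds, but type annotations are not enforced at runtime,
# so the list items are stored exactly as they were passed in.
c = C(**tmp)
print(c)                           # C(mylist=[{'x': 0, 'y': 0}, {'x': 10, 'y': 4}])
print(type(c.mylist[0]).__name__)  # dict
```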

Accepted answer by meowgoesthedog

Below is the CPython implementation of asdict, or specifically, the internal recursive helper function _asdict_inner that it uses:

# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py

def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # [large block of author comments]
        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        # [ditto]
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory),
                          _asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

asdict simply calls the above with some assertions, and dict_factory=dict by default.

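As a small illustration of the dict_factory parameter: the factory is called with a list of (name, value) pairs for each dataclass instance, so any callable that accepts such a list can be substituted. The drop_none helper below is a hypothetical name chosen for this sketch, not part of the standard library:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Point:
    x: int
    y: Optional[int] = None


def drop_none(pairs):
    """Build a dict from (name, value) pairs, skipping None values."""
    return {name: value for name, value in pairs if value is not None}


print(asdict(Point(1)))                          # {'x': 1, 'y': None}
print(asdict(Point(1), dict_factory=drop_none))  # {'x': 1}
```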

How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?


1. Adding type information

My attempt involved creating a custom return wrapper inheriting from dict:

class TypeDict(dict):
    def __init__(self, t, *args, **kwargs):
        super(TypeDict, self).__init__(*args, **kwargs)

        if not isinstance(t, type):
            raise TypeError("t must be a type")

        self._type = t

    @property
    def type(self):
        return self._type

Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclasses:

# only use dict for now; easy to add back later
def _todict_inner(obj):
    if is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _todict_inner(getattr(obj, f.name))
            result.append((f.name, value))
        return TypeDict(type(obj), result)

    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[_todict_inner(v) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_todict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_todict_inner(k), _todict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

Imports:

from dataclasses import dataclass, fields, is_dataclass

# thanks to Patrick Haugh
from typing import *

# deepcopy 
import copy

Functions used:

# copy of the internal function _is_dataclass_instance:
# true for dataclass instances, false for dataclass types
def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)

# the adapted version of asdict
def todict(obj):
    if not is_dataclass_instance(obj):
        raise TypeError("todict() should be called on dataclass instances")
    return _todict_inner(obj)

Tests with the example dataclasses:

c = C([Point(0, 0), Point(10, 4)])

print(c)
cd = todict(c)

print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

print(cd.type)
# <class '__main__.C'>

Results are as expected.


2. Converting back to a dataclass

The recursive routine used by asdict can be re-used for the reverse process, with some relatively minor changes:

def _fromdict_inner(obj):
    # reconstruct the dataclass using the type tag
    if is_dataclass_dict(obj):
        result = {}
        for name, data in obj.items():
            result[name] = _fromdict_inner(data)
        return obj.type(**result)

    # exactly the same as before (without the tuple clause)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_fromdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

Functions used:

def is_dataclass_dict(obj):
    return isinstance(obj, TypeDict)

def fromdict(obj):
    if not is_dataclass_dict(obj):
        raise TypeError("fromdict() should be called on TypeDict instances")
    return _fromdict_inner(obj)

Test:

c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)

print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])

print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])

Again as expected.

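One caveat that may be worth keeping in mind: the type tag lives only on the in-memory wrapper object, so it does not survive a trip through a serialisation format such as JSON. The sketch below uses a stripped-down stand-in for TypeDict and a placeholder Point class (both simplified for illustration) to show what is lost:

```python
import json


class TypeDict(dict):
    """Stripped-down stand-in for the TypeDict wrapper shown earlier."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.type = t


class Point:
    """Placeholder class, used here only as a type tag."""


tagged = TypeDict(Point, {"x": 0, "y": 0})

# json sees an ordinary mapping; the tag is not part of the dict contents.
restored = json.loads(json.dumps(tagged))

print(type(tagged).__name__, tagged.type.__name__)  # TypeDict Point
print(type(restored).__name__)                      # dict
print(hasattr(restored, "type"))                    # False
```

So this approach fits in-process round trips; for persisted data the tag would have to be stored inside the dictionary itself.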

Answer by Konrad Hałas

I'm the author of dacite, a tool that simplifies the creation of data classes from dictionaries.

This library has only one function, from_dict. Here is a quick usage example:

from dataclasses import dataclass
from dacite import from_dict

@dataclass
class User:
    name: str
    age: int
    is_active: bool

data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)

assert user == User(name='john', age=30, is_active=True)

Moreover, dacite supports the following features:

  • nested structures
  • (basic) type checking
  • optional fields (i.e. typing.Optional)
  • unions
  • collections
  • value casting and transformation
  • remapping of field names

... and it's well tested - 100% code coverage!

To install dacite, simply use pip (or pipenv):

$ pip install dacite

Answer by gatopeich

All it takes is a five-liner:

import dataclasses

def dataclass_from_dict(klass, d):
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except Exception:
        return d  # Not a dataclass field

Sample usage:

from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

line = Line(Point(1,2), Point(3,4))
assert line == dataclass_from_dict(Line, asdict(line))

Full code, including conversion to and from JSON, is in this gist: https://gist.github.com/gatopeich/1efd3e1e4269e1e98fae9983bb914f22

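The short function above recurses only into fields whose type is itself a dataclass, so a field such as List[Point] from the question would come back as a list of dicts. The following is a sketch of one way to extend the same idea to lists, assuming Python 3.8+ for typing.get_origin and typing.get_args. The name from_plain is hypothetical and this is not the gist's code:

```python
import dataclasses
from dataclasses import asdict, dataclass
from typing import Any, List, get_args, get_origin, get_type_hints


def from_plain(tp, value):
    """Rebuild an instance of `tp` from data shaped like asdict() output."""
    if dataclasses.is_dataclass(tp) and isinstance(value, dict):
        hints = get_type_hints(tp)  # also resolves string annotations
        return tp(**{k: from_plain(hints[k], v) for k, v in value.items()})
    if get_origin(tp) is list and isinstance(value, list):
        args = get_args(tp)
        item_type = args[0] if args else Any  # bare List carries no item type
        return [from_plain(item_type, v) for v in value]
    return value  # anything else is passed through unchanged


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
assert from_plain(C, asdict(c)) == c
```

Unlike the version with a blanket except clause, an unknown key raises KeyError here instead of silently returning the input dict; whether that is preferable depends on how strict the caller wants to be.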

Answer by tikhonov_a

You can use mashumaro to create a dataclass object from a dict according to its schema. The mixin from this library adds convenient from_dict and to_dict methods to dataclasses:

from dataclasses import dataclass
from typing import List
from mashumaro import DataClassDictMixin

@dataclass
class Point(DataClassDictMixin):
    x: int
    y: int

@dataclass
class C(DataClassDictMixin):
    mylist: List[Point]

p = Point(10, 20)
tmp = {'x': 10, 'y': 20}
assert p.to_dict() == tmp
assert Point.from_dict(tmp) == p

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.to_dict() == tmp
assert C.from_dict(tmp) == c

Answer by Martijn Pieters

If your goal is to produce JSON from and to existing, predefined dataclasses, then just write custom encoder and decoder hooks. Do not use dataclasses.asdict() here; instead, record in JSON a (safe) reference to the original dataclass.

jsonpickle is not safe because it stores references to arbitrary Python objects and passes in data to their constructors. With such references I can get jsonpickle to reference internal Python data structures and create and execute functions, classes and modules at will. But that doesn't mean you can't handle such references safely. Just verify that you only import (not call), and then verify that the object is an actual dataclass type before you use it.

The framework can be made generic enough but still limited only to JSON-serialisable types plus dataclass-based instances:

import dataclasses
import importlib
import sys

def dataclass_object_dump(ob):
    datacls = type(ob)
    if not dataclasses.is_dataclass(datacls):
        raise TypeError(f"Expected dataclass instance, got '{datacls!r}' object")
    mod = sys.modules.get(datacls.__module__)
    if mod is None or not hasattr(mod, datacls.__qualname__):
        raise ValueError(f"Can't resolve '{datacls!r}' reference")
    ref = f"{datacls.__module__}.{datacls.__qualname__}"
    fields = (f.name for f in dataclasses.fields(ob))
    return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': ref}

def dataclass_object_load(d):
    ref = d.pop('__dataclass__', None)
    if ref is None:
        return d
    try:
        modname, hasdot, qualname = ref.rpartition('.')
        module = importlib.import_module(modname)
        datacls = getattr(module, qualname)
        if not dataclasses.is_dataclass(datacls) or not isinstance(datacls, type):
            raise ValueError
        return datacls(**d)
    except (ModuleNotFoundError, ValueError, AttributeError, TypeError):
        raise ValueError(f"Invalid dataclass reference {ref!r}") from None

This uses JSON-RPC-style class hints to name the dataclass, and on loading this is verified to still be a data class with the same fields. No type checking is done on the values of the fields (as that's a whole different kettle of fish).

Use these as the default and object_hook arguments to json.dump[s]() and json.load[s]():

>>> print(json.dumps(c, default=dataclass_object_dump, indent=4))
{
    "mylist": [
        {
            "x": 0,
            "y": 0,
            "__dataclass__": "__main__.Point"
        },
        {
            "x": 10,
            "y": 4,
            "__dataclass__": "__main__.Point"
        }
    ],
    "__dataclass__": "__main__.C"
}
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load)
C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load) == c
True

or create instances of the JSONEncoder and JSONDecoder classes with those same hooks.

Instead of using fully qualified module and class names, you could also use a separate registry to map permissible type names; check against the registry on encoding, and again on decoding to ensure you don't forget to register dataclasses as you develop.

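The registry idea could look roughly like the sketch below. REGISTRY, register, dump_hook and load_hook are hypothetical names introduced for illustration, and using the bare class name as the key is an assumption (a real project might prefer explicit, versioned names). The hooks are attached to JSONEncoder and JSONDecoder instances here, but they work equally well as arguments to json.dumps() and json.loads():

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import List

REGISTRY = {}  # hypothetical mapping: permitted type name -> dataclass


def register(cls):
    """Class decorator that permits `cls` to appear in serialised data."""
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    REGISTRY[cls.__name__] = cls
    return cls


def dump_hook(ob):
    # Checked on encoding, so an unregistered dataclass fails early.
    name = type(ob).__name__
    if REGISTRY.get(name) is not type(ob):
        raise TypeError(f"{type(ob)!r} is not a registered dataclass")
    data = {f.name: getattr(ob, f.name) for f in dataclasses.fields(ob)}
    data["__dataclass__"] = name
    return data


def load_hook(d):
    name = d.pop("__dataclass__", None)
    if name is None:
        return d  # an ordinary JSON object
    if name not in REGISTRY:
        raise ValueError(f"Unknown dataclass name {name!r}")
    return REGISTRY[name](**d)


@register
@dataclass
class Point:
    x: int
    y: int


@register
@dataclass
class C:
    mylist: List[Point]


encoder = json.JSONEncoder(default=dump_hook)
decoder = json.JSONDecoder(object_hook=load_hook)

c = C([Point(0, 0), Point(10, 4)])
text = encoder.encode(c)
restored = decoder.decode(text)
assert restored == c
```

Because only names present in the registry are ever resolved, nothing is imported based on the contents of the JSON document.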

Answer by killjoy

Using no additional modules, you can make use of the __post_init__ function to automatically convert the dict values to the correct type. This function is called after __init__.

from dataclasses import dataclass, asdict


@dataclass
class Bar:
    fee: str
    far: str

@dataclass
class Foo:
    bar: Bar

    def __post_init__(self):
        if isinstance(self.bar, dict):
            self.bar = Bar(**self.bar)

foo = Foo(bar=Bar(fee="La", far="So"))

d = asdict(foo)
print(d)  # {'bar': {'fee': 'La', 'far': 'So'}}
o = Foo(**d)
print(o)  # Foo(bar=Bar(fee='La', far='So'))

This solution has the added benefit of being able to use non-dataclass objects. As long as the object's str() output can be converted back, it's fair game. For example, it can be used to keep str fields as IP4Address internally.

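As a sketch of the non-dataclass case, the example below uses ipaddress.IPv4Address from the standard library as a stand-in for the IP4Address type mentioned above. The Host class and its field names are made up for illustration:

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Union


@dataclass
class Host:
    name: str
    address: Union[IPv4Address, str]

    def __post_init__(self):
        # Accept the string form and normalise it to an IPv4Address.
        if isinstance(self.address, str):
            self.address = IPv4Address(self.address)


host = Host(**{"name": "gateway", "address": "192.168.0.1"})
print(repr(host.address))  # IPv4Address('192.168.0.1')

# Going back to plain data relies on str() being reversible.
as_plain = {"name": host.name, "address": str(host.address)}
assert Host(**as_plain) == host
```

Note that dataclasses.asdict() would leave the IPv4Address object in place, which is why the plain dict is built by hand here.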

Answer by Evg

from validated_dc import ValidatedDC
from dataclasses import dataclass

from typing import List, Union


@dataclass
class Foo(ValidatedDC):
    foo: int


@dataclass
class Bar(ValidatedDC):
    bar: Union[Foo, List[Foo]]


foo = {'foo': 1}
instance = Bar(bar=foo)
print(instance.get_errors())  # None
print(instance)               # Bar(bar=Foo(foo=1))

list_foo = [{'foo': 1}, {'foo': 2}]
instance = Bar(bar=list_foo)
print(instance.get_errors())  # None
print(instance)               # Bar(bar=[Foo(foo=1), Foo(foo=2)])

validated_dc: https://github.com/EvgeniyBurdin/validated_dc

See also a more detailed example: https://github.com/EvgeniyBurdin/validated_dc/blob/master/examples/detailed.py

Answer by Tobias Hermann

undictify is a library which could be of help. Here is a minimal usage example:

import json
from dataclasses import dataclass
from typing import List, NamedTuple, Optional, Any

from undictify import type_checked_constructor


@type_checked_constructor(skip=True)
@dataclass
class Heart:
    weight_in_kg: float
    pulse_at_rest: int


@type_checked_constructor(skip=True)
@dataclass
class Human:
    id: int
    name: str
    nick: Optional[str]
    heart: Heart
    friend_ids: List[int]


tobias_dict = json.loads('''
    {
        "id": 1,
        "name": "Tobias",
        "heart": {
            "weight_in_kg": 0.31,
            "pulse_at_rest": 52
        },
        "friend_ids": [2, 3, 4, 5]
    }''')

tobias = Human(**tobias_dict)

Answer by Zah

Validobj does just that. Compared to other libraries, it provides a simpler interface (just one function at the moment) and emphasizes informative error messages. For example, given a schema like

import dataclasses
from typing import Optional, List


@dataclasses.dataclass
class User:
    name: str
    phone: Optional[str] = None
    tasks: List[str] = dataclasses.field(default_factory=list)

One gets an error like

>>> import validobj
>>> validobj.parse_input({
...      'phone': '555-1337-000', 'address': 'Somewhereville', 'nme': 'Zahari'}, User
... )
Traceback (most recent call last):
...
WrongKeysError: Cannot process value into 'User' because fields do not match.
The following required keys are missing: {'name'}. The following keys are unknown: {'nme', 'address'}.
Alternatives to invalid value 'nme' include:
  - name

All valid options are:
  - name
  - phone
  - tasks

for a typo on a given field.

Answer by NOOBAF

I would like to suggest using the Composite Pattern to solve this. The main advantage is that you can keep adding classes to this pattern and have them behave the same way.

from dataclasses import dataclass
from typing import List


@dataclass
class CompositeDict:
    def as_dict(self):
        retval = dict()
        for key, value in self.__dict__.items():
            if key in self.__dataclass_fields__.keys():
                if type(value) is list:
                    retval[key] = [item.as_dict() for item in value]
                else:
                    retval[key] = value
        return retval

@dataclass
class Point(CompositeDict):
    x: int
    y: int


@dataclass
class C(CompositeDict):
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.as_dict() == tmp

As a side note, you could employ a factory pattern within the CompositeDict class to handle other cases such as nested dicts, tuples and so on, which would save a lot of boilerplate.

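One possible shape for that extension is sketched below. The module-level helper _plain is a hypothetical name, and the choice to handle exactly CompositeDict instances, lists, tuples and dicts is an assumption of this sketch (named tuples, for instance, would need their own branch):

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


def _plain(value):
    """Recursively convert CompositeDict instances inside common containers."""
    if isinstance(value, CompositeDict):
        return value.as_dict()
    if isinstance(value, (list, tuple)):
        return type(value)(_plain(v) for v in value)
    if isinstance(value, dict):
        return {k: _plain(v) for k, v in value.items()}
    return value  # leaf value, returned unchanged


@dataclass
class CompositeDict:
    def as_dict(self):
        return {f.name: _plain(getattr(self, f.name)) for f in fields(self)}


@dataclass
class Point(CompositeDict):
    x: int
    y: int


@dataclass
class Shape(CompositeDict):
    origin: Point                # directly nested dataclass
    corners: Tuple[Point, ...]   # tuple of dataclasses
    labels: Dict[str, Point]     # dict with dataclass values
    tags: List[str]              # list of plain values


shape = Shape(Point(0, 0), (Point(1, 1),), {"a": Point(2, 2)}, ["demo"])
print(shape.as_dict())
```

Compared with the version above, this also covers a dataclass stored directly in a field and lists whose items are plain values.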