Python dataclass from a nested dict
Original question: http://stackoverflow.com/questions/53376099/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by mbatchkarov
The standard library in Python 3.7 can recursively convert a dataclass into a dict (example from the docs):
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: List[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert asdict(c) == tmp
I am looking for a way to turn a dict back into a dataclass when there is nesting. Something like C(**tmp) only works if the fields of the dataclass are simple types and not themselves dataclasses. I am familiar with jsonpickle, which, however, comes with a prominent security warning.
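To make the limitation concrete, here is a minimal sketch that reuses the Point and C classes from the question. Unpacking the dict does not fail, because dataclasses neither validate nor convert field values, but the nested items stay plain dicts:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

# Construction succeeds: the generated __init__ just assigns what it is given.
c = C(**tmp)

# The nested items are still plain dicts, not Point instances.
nested_is_dict = isinstance(c.mylist[0], dict)
equals_original = c == C([Point(0, 0), Point(10, 4)])
print(nested_is_dict, equals_original)  # True False
```

So the round trip silently produces an object that looks right at the top level but does not compare equal to the original.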
Accepted answer by meowgoesthedog
Below is the CPython implementation of asdict, or more specifically the internal recursive helper function _asdict_inner that it uses:
# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # [large block of author comments]
        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        # [ditto]
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory),
                          _asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
asdict simply calls the above after some assertions, with dict_factory=dict by default.
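As a small illustration of the dict_factory parameter, the sketch below passes OrderedDict instead of the default. The Segment class is a made-up example, not part of the original answer:

```python
from collections import OrderedDict
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:  # hypothetical class, for illustration only
    start: Point
    end: Point


seg = Segment(Point(0, 0), Point(10, 4))

plain = asdict(seg)
ordered = asdict(seg, dict_factory=OrderedDict)

# dict_factory is applied at every nesting level, not only at the top.
print(type(plain['start']).__name__)    # dict
print(type(ordered['start']).__name__)  # OrderedDict
```

Because the factory is called for every nested dataclass, it is a natural hook for attaching extra information to each level, which is what the rest of this answer does by hand.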
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict:
class TypeDict(dict):
    def __init__(self, t, *args, **kwargs):
        super(TypeDict, self).__init__(*args, **kwargs)
        if not isinstance(t, type):
            raise TypeError("t must be a type")
        self._type = t

    @property
    def type(self):
        return self._type
Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclasses:
# only use dict for now; easy to add back later
def _todict_inner(obj):
    if is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _todict_inner(getattr(obj, f.name))
            result.append((f.name, value))
        return TypeDict(type(obj), result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[_todict_inner(v) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_todict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_todict_inner(k), _todict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass

# thanks to Patrick Haugh
from typing import *

# deepcopy
import copy
Functions used:
# copy of the internal function _is_dataclass_instance
def is_dataclass_instance(obj):
    # is_dataclass() is also true for dataclass types, so exclude classes
    return is_dataclass(obj) and not isinstance(obj, type)

# the adapted version of asdict
def todict(obj):
    if not is_dataclass_instance(obj):
        raise TypeError("todict() should be called on dataclass instances")
    return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)
print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict can be reused for the reverse process, with some relatively minor changes:
def _fromdict_inner(obj):
    # reconstruct the dataclass using the type tag
    if is_dataclass_dict(obj):
        result = {}
        for name, data in obj.items():
            result[name] = _fromdict_inner(data)
        return obj.type(**result)

    # exactly the same as before (without the tuple clause)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_fromdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Functions used:
def is_dataclass_dict(obj):
    return isinstance(obj, TypeDict)

def fromdict(obj):
    if not is_dataclass_dict(obj):
        raise TypeError("fromdict() should be called on TypeDict instances")
    return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
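One caveat worth spelling out: the type tag lives in a Python attribute, not in the dict's contents, so it only survives while the TypeDict object itself is kept in memory. The sketch below uses a cut-down stand-in for TypeDict and a placeholder Point class to show that a JSON round trip drops the tag:

```python
import json


class TypeDict(dict):
    """Minimal stand-in for the TypeDict wrapper defined above."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.type = t


class Point:  # placeholder type; any class works for this demonstration
    pass


tagged = TypeDict(Point, {'x': 10, 'y': 20})

# The tag is an attribute, so it is not part of the serialised data.
text = json.dumps(tagged)
restored = json.loads(text)

print(text)                                        # {"x": 10, "y": 20}
print(type(restored).__name__, hasattr(restored, 'type'))  # dict False
```

If the data has to leave the process, the tag needs to be stored inside the dict itself, as some of the other answers do.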
Answer by Konrad Hałas
I'm the author of dacite, a tool that simplifies the creation of data classes from dictionaries.
This library has only one function, from_dict. Here is a quick usage example:
from dataclasses import dataclass
from dacite import from_dict

@dataclass
class User:
    name: str
    age: int
    is_active: bool

data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)

assert user == User(name='john', age=30, is_active=True)
Moreover, dacite supports the following features:
- nested structures
- (basic) type checking
- optional fields (i.e. typing.Optional)
- unions
- collections
- value casting and transformation
- remapping of field names
... and it's well tested - 100% code coverage!
To install dacite, simply use pip (or pipenv):
$ pip install dacite
Answer by gatopeich
All it takes is a five-liner:
import dataclasses

def dataclass_from_dict(klass, d):
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except Exception:
        return d  # Not a dataclass field
Sample usage:
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

line = Line(Point(1, 2), Point(3, 4))
assert line == dataclass_from_dict(Line, asdict(line))
Full code, including to/from JSON, is in this gist: https://gist.github.com/gatopeich/1efd3e1e4269e1e98fae9983bb914f22
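Note that the five-liner only recurses into fields whose annotation is itself a dataclass, so a field such as List[Point] from the question comes back as a list of plain dicts. A possible extension is sketched below; from_dict_typed is a hypothetical helper name, and it assumes Python 3.8+ for typing.get_origin and typing.get_args:

```python
import dataclasses
from dataclasses import dataclass
from typing import List, get_args, get_origin, get_type_hints


def from_dict_typed(tp, value):
    """Hypothetical helper: rebuild `value` according to the type hint `tp`."""
    if dataclasses.is_dataclass(tp) and isinstance(value, dict):
        hints = get_type_hints(tp)  # also resolves string annotations
        # An unknown key raises KeyError here instead of being ignored.
        return tp(**{k: from_dict_typed(hints[k], v) for k, v in value.items()})
    if get_origin(tp) is list and isinstance(value, list) and get_args(tp):
        item_type = get_args(tp)[0]
        return [from_dict_typed(item_type, v) for v in value]
    return value  # anything else is passed through unchanged


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
c = from_dict_typed(C, tmp)
print(c)  # C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
```

This sketch only covers dataclasses and List[...]; Optional, Dict, Tuple and unions would each need their own branch, which is roughly the work that the libraries in the other answers do for you.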
Answer by tikhonov_a
You can use mashumaro for creating a dataclass object from a dict according to the scheme. The mixin from this library adds convenient from_dict and to_dict methods to dataclasses:
from dataclasses import dataclass
from typing import List
from mashumaro import DataClassDictMixin

@dataclass
class Point(DataClassDictMixin):
    x: int
    y: int

@dataclass
class C(DataClassDictMixin):
    mylist: List[Point]

p = Point(10, 20)
tmp = {'x': 10, 'y': 20}
assert p.to_dict() == tmp
assert Point.from_dict(tmp) == p

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.to_dict() == tmp
assert C.from_dict(tmp) == c
Answer by Martijn Pieters
If your goal is to produce JSON from and to existing, predefined dataclasses, then just write custom encoder and decoder hooks. Do not use dataclasses.asdict() here; instead, record in JSON a (safe) reference to the original dataclass.
jsonpickle is not safe because it stores references to arbitrary Python objects and passes in data to their constructors. With such references I can get jsonpickle to reference internal Python data structures and create and execute functions, classes and modules at will. But that doesn't mean you can't handle such references safely. Just verify that you only import (not call) and then verify that the object is an actual dataclass type before you use it.
The framework can be made generic enough but still limited to JSON-serialisable types plus dataclass-based instances:
import dataclasses
import importlib
import json
import sys

def dataclass_object_dump(ob):
    datacls = type(ob)
    if not dataclasses.is_dataclass(datacls):
        raise TypeError(f"Expected dataclass instance, got '{datacls!r}' object")
    mod = sys.modules.get(datacls.__module__)
    if mod is None or not hasattr(mod, datacls.__qualname__):
        raise ValueError(f"Can't resolve '{datacls!r}' reference")
    ref = f"{datacls.__module__}.{datacls.__qualname__}"
    fields = (f.name for f in dataclasses.fields(ob))
    return {**{f: getattr(ob, f) for f in fields}, '__dataclass__': ref}

def dataclass_object_load(d):
    ref = d.pop('__dataclass__', None)
    if ref is None:
        return d
    try:
        modname, hasdot, qualname = ref.rpartition('.')
        module = importlib.import_module(modname)
        datacls = getattr(module, qualname)
        if not dataclasses.is_dataclass(datacls) or not isinstance(datacls, type):
            raise ValueError
        return datacls(**d)
    except (ModuleNotFoundError, ValueError, AttributeError, TypeError):
        raise ValueError(f"Invalid dataclass reference {ref!r}") from None
This uses JSON-RPC-style class hints to name the dataclass, and on loading this is verified to still be a data class with the same fields. No type checking is done on the values of the fields (as that's a whole different kettle of fish).
Use these as the default and object_hook arguments to json.dump[s]() and json.load[s]():
>>> print(json.dumps(c, default=dataclass_object_dump, indent=4))
{
    "mylist": [
        {
            "x": 0,
            "y": 0,
            "__dataclass__": "__main__.Point"
        },
        {
            "x": 10,
            "y": 4,
            "__dataclass__": "__main__.Point"
        }
    ],
    "__dataclass__": "__main__.C"
}
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load)
C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
>>> json.loads(json.dumps(c, default=dataclass_object_dump), object_hook=dataclass_object_load) == c
True
or create instances of the JSONEncoder and JSONDecoder classes with those same hooks.
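For instance, reusable encoder and decoder objects could be set up roughly like this. To keep the sketch self-contained, dump_hook and load_hook are simplified stand-ins that only know about one Point class; in practice you would pass dataclass_object_dump and dataclass_object_load instead:

```python
import json
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


def dump_hook(ob):
    # Simplified stand-in for dataclass_object_dump: handles Point only.
    if isinstance(ob, Point):
        return {'x': ob.x, 'y': ob.y, '__dataclass__': 'Point'}
    raise TypeError(f"Cannot serialise {type(ob).__name__}")


def load_hook(d):
    # Simplified stand-in for dataclass_object_load.
    if d.get('__dataclass__') == 'Point':
        return Point(x=d['x'], y=d['y'])
    return d


# Configure the hooks once, then reuse the objects wherever needed.
encoder = json.JSONEncoder(default=dump_hook)
decoder = json.JSONDecoder(object_hook=load_hook)

text = encoder.encode([Point(0, 0), Point(10, 4)])
points = decoder.decode(text)
print(points)  # [Point(x=0, y=0), Point(x=10, y=4)]
```

Holding on to configured instances avoids repeating the default= and object_hook= arguments at every call site.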
Instead of using fully qualified module and class names, you could also use a separate registry to map permissible type names; check against the registry on encoding, and again on decoding, to ensure you don't forget to register dataclasses as you develop.
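A rough sketch of the registry idea follows. The names REGISTRY, register, registry_dump and registry_load are all made up for this illustration; the point is only that decoding can never reach a class that was not explicitly whitelisted:

```python
import dataclasses
import json
from dataclasses import dataclass

# Hypothetical registry: maps a public type name to a permitted dataclass.
REGISTRY = {}


def register(cls):
    """Class decorator that whitelists a dataclass for (de)serialisation."""
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    REGISTRY[cls.__name__] = cls
    return cls


def registry_dump(ob):
    # Refuse to encode anything that was not registered.
    name = type(ob).__name__
    if REGISTRY.get(name) is not type(ob):
        raise TypeError(f"{name!r} is not a registered dataclass")
    data = {f.name: getattr(ob, f.name) for f in dataclasses.fields(ob)}
    return {**data, '__dataclass__': name}


def registry_load(d):
    name = d.pop('__dataclass__', None)
    if name is None:
        return d
    try:
        cls = REGISTRY[name]  # no imports, only a dictionary lookup
    except KeyError:
        raise ValueError(f"Unknown dataclass name {name!r}") from None
    return cls(**d)


@register
@dataclass
class Point:
    x: int
    y: int


text = json.dumps([Point(0, 0), Point(10, 4)], default=registry_dump)
points = json.loads(text, object_hook=registry_load)
print(points)  # [Point(x=0, y=0), Point(x=10, y=4)]
```

Since nothing is imported at load time, a hostile document can at worst name a registered class with the wrong fields, which surfaces as an ordinary TypeError.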
Answer by killjoy
Using no additional modules, you can make use of the __post_init__ function to automatically convert the dict values to the correct type. This function is called after __init__.
from dataclasses import dataclass, asdict

@dataclass
class Bar:
    fee: str
    far: str

@dataclass
class Foo:
    bar: Bar

    def __post_init__(self):
        if isinstance(self.bar, dict):
            self.bar = Bar(**self.bar)

foo = Foo(bar=Bar(fee="La", far="So"))

d = asdict(foo)
print(d)  # {'bar': {'fee': 'La', 'far': 'So'}}
o = Foo(**d)
print(o)  # Foo(bar=Bar(fee='La', far='So'))
This solution has the added benefit of being able to use non-dataclass objects. As long as its str() output can be converted back, it's fair game. For example, it can be used to keep str fields as IP4Address internally.
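As a sketch of that last point, the example below assumes the standard library's ipaddress.IPv4Address as the "rich" type; the Host class and its to_dict method are hypothetical and only serve the illustration:

```python
from dataclasses import asdict, dataclass
from ipaddress import IPv4Address
from typing import Union


@dataclass
class Host:  # hypothetical class, for illustration only
    name: str
    address: Union[IPv4Address, str]

    def __post_init__(self):
        # Accept the string form and store the rich object internally.
        if isinstance(self.address, str):
            self.address = IPv4Address(self.address)

    def to_dict(self):
        # asdict() only copies the IPv4Address, so stringify it explicitly.
        d = asdict(self)
        d['address'] = str(self.address)
        return d


host = Host(name='gateway', address='192.168.0.1')
restored = Host(**host.to_dict())
print(restored == host)  # True
```

The conversion back to a string has to happen somewhere on the way out, since asdict() does not know how to serialise arbitrary objects.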
Answer by Evg
from validated_dc import ValidatedDC
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Foo(ValidatedDC):
    foo: int

@dataclass
class Bar(ValidatedDC):
    bar: Union[Foo, List[Foo]]

foo = {'foo': 1}
instance = Bar(bar=foo)
print(instance.get_errors())  # None
print(instance)               # Bar(bar=Foo(foo=1))

list_foo = [{'foo': 1}, {'foo': 2}]
instance = Bar(bar=list_foo)
print(instance.get_errors())  # None
print(instance)               # Bar(bar=[Foo(foo=1), Foo(foo=2)])
validated_dc: https://github.com/EvgeniyBurdin/validated_dc
For a more detailed example, see: https://github.com/EvgeniyBurdin/validated_dc/blob/master/examples/detailed.py
Answer by Tobias Hermann
undictify is a library which could be of help. Here is a minimal usage example:
import json
from dataclasses import dataclass
from typing import List, NamedTuple, Optional, Any

from undictify import type_checked_constructor

@type_checked_constructor(skip=True)
@dataclass
class Heart:
    weight_in_kg: float
    pulse_at_rest: int

@type_checked_constructor(skip=True)
@dataclass
class Human:
    id: int
    name: str
    nick: Optional[str]
    heart: Heart
    friend_ids: List[int]

tobias_dict = json.loads('''
    {
        "id": 1,
        "name": "Tobias",
        "heart": {
            "weight_in_kg": 0.31,
            "pulse_at_rest": 52
        },
        "friend_ids": [2, 3, 4, 5]
    }''')

tobias = Human(**tobias_dict)
Answer by Zah
Validobj does just that. Compared to other libraries, it provides a simpler interface (just one function at the moment) and emphasizes informative error messages. For example, given a schema like
import dataclasses
from typing import Optional, List

@dataclasses.dataclass
class User:
    name: str
    phone: Optional[str] = None
    tasks: List[str] = dataclasses.field(default_factory=list)
one gets an error like
>>> import validobj
>>> validobj.parse_input({
...      'phone': '555-1337-000', 'address': 'Somewhereville', 'nme': 'Zahari'}, User
... )
Traceback (most recent call last):
...
WrongKeysError: Cannot process value into 'User' because fields do not match.
The following required keys are missing: {'name'}. The following keys are unknown: {'nme', 'address'}.
Alternatives to invalid value 'nme' include:
  - name
All valid options are:
  - name
  - phone
  - tasks
for a typo in a given field.
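To give a feel for what such a key check involves, here is a standard-library sketch. It is not how validobj is implemented: check_keys is a hypothetical helper, and using difflib for the suggestions is an assumption made for this example.

```python
import dataclasses
import difflib
from typing import List, Optional


@dataclasses.dataclass
class User:
    name: str
    phone: Optional[str] = None
    tasks: List[str] = dataclasses.field(default_factory=list)


def check_keys(cls, data):
    """Hypothetical helper: report missing, unknown and near-miss keys."""
    flds = dataclasses.fields(cls)
    valid = [f.name for f in flds]
    # A field is required when it has neither a default nor a default factory.
    required = {
        f.name for f in flds
        if f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    }
    missing = required - set(data)
    unknown = set(data) - set(valid)
    # difflib is an assumption here; validobj may rank suggestions differently.
    hints = {k: difflib.get_close_matches(k, valid) for k in unknown}
    return missing, unknown, hints


missing, unknown, hints = check_keys(
    User, {'phone': '555-1337-000', 'address': 'Somewhereville', 'nme': 'Zahari'}
)
print(missing)        # {'name'}
print(hints['nme'])   # ['name']
```

Running a check like this before calling the constructor turns a terse TypeError about unexpected keyword arguments into a message that points at the likely typo.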
Answer by NOOBAF
I would like to suggest using the Composite pattern to solve this. The main advantage is that you can keep adding classes to this pattern and have them behave the same way.
from dataclasses import dataclass
from typing import List

@dataclass
class CompositeDict:
    def as_dict(self):
        retval = dict()
        for key, value in self.__dict__.items():
            if key in self.__dataclass_fields__.keys():
                if type(value) is list:
                    retval[key] = [item.as_dict() for item in value]
                else:
                    retval[key] = value
        return retval

@dataclass
class Point(CompositeDict):
    x: int
    y: int

@dataclass
class C(CompositeDict):
    mylist: List[Point]

c = C([Point(0, 0), Point(10, 4)])
tmp = {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
assert c.as_dict() == tmp
As a side note, you could employ a factory pattern within the CompositeDict class to handle other cases such as nested dicts, tuples and so on, which would save a lot of boilerplate.
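One way to read that side note is a single dispatch helper that as_dict delegates to. The sketch below is an assumption about what such an extension could look like; _convert and the Shape class are hypothetical names:

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


def _convert(value):
    """Hypothetical helper: recurse into composites and common containers."""
    if isinstance(value, CompositeDict):
        return value.as_dict()
    if isinstance(value, (list, tuple)):
        # Plain lists and tuples only; namedtuples would need extra handling.
        return type(value)(_convert(v) for v in value)
    if isinstance(value, dict):
        return {k: _convert(v) for k, v in value.items()}
    return value


@dataclass
class CompositeDict:
    def as_dict(self):
        return {f.name: _convert(getattr(self, f.name)) for f in fields(self)}


@dataclass
class Point(CompositeDict):
    x: int
    y: int


@dataclass
class Shape(CompositeDict):  # hypothetical class, for illustration only
    origin: Point
    corners: Tuple[Point, ...]
    labels: Dict[str, Point]
    tags: List[str]


shape = Shape(Point(0, 0), (Point(1, 1),), {'a': Point(2, 2)}, ['demo'])
print(shape.as_dict())
```

Compared with the version above, this also handles a directly nested composite and lists of plain values such as strings, which would otherwise fail on item.as_dict().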