Python 扁平化多级 JSON
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/51359783/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python flatten multilevel JSON
提问by Bostjan
I am trying to convert JSON to CSV file, that I can use for further analysis. Issue with my structure is that I have quite some nested dict/lists when I convert my JSON file.
我正在尝试将 JSON 转换为 CSV 文件,以便进一步分析。我的结构的问题是我在转换 JSON 文件时有很多嵌套的字典/列表。
I tried to use pandas json_normalize()
, but it only flattens first level.
我尝试使用 pandas json_normalize()
,但它只会压平第一级。
import json
import pandas as pd
from pandas.io.json import json_normalize
from cs import CloudStack
api_key = xxxx
secret = xxxx
endpoint = xxxx
cs = CloudStack(endpoint=endpoint,
key=api_key,
secret=secret)
virtual_machines = cs.virtMach()
test = json_normalize(virtual_machines["virtualmachine"])
test.to_csv("test.csv", sep="|", index=False)
Any idea how to flatter whole JSON file, so I can create single line input to CSV file for single (in this case virtual machine) entry? I have tried couple of solutions posted here, but my result was always only first level was flattened.
知道如何修饰整个 JSON 文件,以便我可以为单个(在这种情况下为虚拟机)条目创建 CSV 文件的单行输入?我试过这里发布的几个解决方案,但我的结果总是只有第一级被压平。
This is sample JSON (in this case, I still get "securitygroup" and "nic" output as JSON format:
这是示例 JSON(在这种情况下,我仍然以 JSON 格式获得“securitygroup”和“nic”输出:
{
"count": 13,
"virtualmachine": [
{
"id": "1082e2ed-ff66-40b1-a41b-26061afd4a0b",
"name": "test-2",
"displayname": "test-2",
"securitygroup": [
{
"id": "9e649fbc-3e64-4395-9629-5e1215b34e58",
"name": "test",
"tags": []
}
],
"nic": [
{
"id": "79568b14-b377-4d4f-b024-87dc22492b8e",
"networkid": "05c0e278-7ab4-4a6d-aa9c-3158620b6471"
},
{
"id": "3d7f2818-1f19-46e7-aa98-956526c5b1ad",
"networkid": "b4648cfd-0795-43fc-9e50-6ee9ddefc5bd"
"traffictype": "Guest"
}
],
"hypervisor": "KVM",
"affinitygroup": [],
"isdynamicallyscalable": false
}
]
}
Thank you and best regards, Bostjan
谢谢你,最好的问候,Bostjan
回答by Bostjan
Thanks to gyx-hh, this has been resolved:
感谢 gyx-hh,这已经解决了:
I used following function (details can be found here):
我使用了以下功能(详细信息可以在这里找到):
def flatten_json(y):
out = {}
def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '_')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out
This unfortunately completely flattens whole JSON, meaning that if you have multi-level JSON (many nested dictionaries), it might flatten everything into single line with tons of columns.
不幸的是,这完全扁平化了整个 JSON,这意味着如果您有多级 JSON(许多嵌套字典),它可能会将所有内容扁平化为包含大量列的单行。
What I used in the end was json_normalize()
and specified structure that I required. Nice example of how to do it that way can be found here.
我最终使用的是json_normalize()
并指定了我需要的结构。可以在此处找到有关如何以这种方式执行此操作的好示例。
Hopefully this hepls someone and again thank to gyx-hh for solution.
希望这对某人有所帮助,并再次感谢 gyx-hh 的解决方案。
Best regards
此致
回答by Azat Ibrakov
IMO accepted answerdoesn't properly handle JSON array.
IMO接受的答案不能正确处理 JSON 数组。
If JSON object has array as value then it should be flattened to array of objects like
如果 JSON 对象具有数组作为值,那么它应该被展平为对象数组,例如
{'a': [1, 2]} -> [{'a': 1}, {'a': 2}]
instead of adding index to key.
而不是向键添加索引。
And nested objects should be flattened by concatenating keys (e.g. with dot as separator) like
嵌套对象应该通过连接键(例如用点作为分隔符)来展平,例如
{'a': {'b': 1}} -> {'a.b': 1}
(and this is done correctly in accepted one).
(这在接受的一个中正确完成)。
With all these requirements I've ended up with following (developed and used in CPython3.5.3):
有了所有这些要求,我最终得到了以下内容(在CPython3.5.3 中开发和使用):
from functools import (partial,
singledispatch)
from itertools import chain
from typing import (Dict,
List,
TypeVar)
Serializable = TypeVar('Serializable', None, int, bool, float, str,
dict, list, tuple)
Array = List[Serializable]
Object = Dict[str, Serializable]
def flatten(object_: Object,
*,
path_separator: str = '.') -> Array[Object]:
"""
Flattens given JSON object into list of objects with non-nested values.
>>> flatten({'a': 1})
[{'a': 1}]
>>> flatten({'a': [1, 2]})
[{'a': 1}, {'a': 2}]
>>> flatten({'a': {'b': None}})
[{'a.b': None}]
"""
keys = set(object_)
result = [dict(object_)]
while keys:
key = keys.pop()
new_result = []
for index, record in enumerate(result):
try:
value = record[key]
except KeyError:
new_result.append(record)
else:
if isinstance(value, dict):
del record[key]
new_value = flatten_nested_objects(
value,
prefix=key + path_separator,
path_separator=path_separator)
keys.update(new_value.keys())
new_result.append({**new_value, **record})
elif isinstance(value, list):
del record[key]
new_records = [
flatten_nested_objects(sub_value,
prefix=key + path_separator,
path_separator=path_separator)
for sub_value in value]
keys.update(chain.from_iterable(map(dict.keys,
new_records)))
new_result.extend({**new_record, **record}
for new_record in new_records)
else:
new_result.append(record)
result = new_result
return result
@singledispatch
def flatten_nested_objects(object_: Serializable,
*,
prefix: str = '',
path_separator: str) -> Object:
return {prefix[:-len(path_separator)]: object_}
@flatten_nested_objects.register(dict)
def _(object_: Object,
*,
prefix: str = '',
path_separator: str) -> Object:
result = dict(object_)
for key in list(result):
result.update(flatten_nested_objects(result.pop(key),
prefix=(prefix + key
+ path_separator),
path_separator=path_separator))
return result
@flatten_nested_objects.register(list)
def _(object_: Array,
*,
prefix: str = '',
path_separator: str) -> Object:
return {prefix[:-len(path_separator)]: list(map(partial(
flatten_nested_objects,
path_separator=path_separator),
object_))}
回答by ak4zh
I use this simple function to normalize and flatten data to json. It accepts list, dict, tuple and flattens it to a json.
我使用这个简单的函数将数据规范化和扁平化为 json。它接受列表、字典、元组并将其展平为 json。
def normalize_data_to_json(raw_data: [list, dict, tuple], parent=""):
from datetime import datetime
from decimal import Decimal
result = {}
# key name normalise to snake case (single underscore)
parent = parent.lower().replace(" ", "_") if isinstance(parent, str) else parent
if isinstance(parent, str) and parent.startswith("__"):
# if parent has no parent remove double underscore and treat as int if digit else as str
# treating as int is better if passed data is a list so you output is index based dict
parent = int(parent.lstrip("_")) if parent.lstrip("_").isdigit() else parent.lstrip("_")
# handle str, int, float, and decimal.
# you can easily add more data types as er your data
if type(raw_data) in [str, int, float, Decimal]:
result[parent] = float(raw_data) if isinstance(raw_data, Decimal) else raw_data
# normalise datetime object
elif isinstance(raw_data, datetime):
result[parent] = raw_data.strftime("%Y-%m-%d %H:%M:%S")
# normalise dict and all nested dicts.
# all nests are joined with double underscore to identify parent key name with it's children
elif isinstance(raw_data, dict):
for k, v in raw_data.items():
k = f'{parent}__{k}' if parent else k
result.update(normalize_data_to_json(v, parent=k))
# normalise list and tuple
elif type(raw_data) in [list, tuple]:
for i, sub_item in enumerate(raw_data, start=1):
result.update(normalize_data_to_json(sub_item, f"{parent}__{i}"))
# any data which did not matched above data types, normalise them using it's __str__
else:
result[parent] = str(raw_data)
return result
回答by Paul Whipp
In case anyone else finds themselves here and is looking for a solution better suited to subsequent programmatic treatment:
如果其他人发现自己在这里并正在寻找更适合后续程序化处理的解决方案:
Flattening the lists creates the need to process the headings for list lengths etc. I wanted a solution where if there are 2 lists of e.g. 2 elements then there would be four rows generated yielding each valid potential data row (see below for actual examples):
展平列表需要处理列表长度等的标题。我想要一个解决方案,如果有 2 个列表,例如 2 个元素,那么将生成四行,产生每个有效的潜在数据行(有关实际示例,请参见下文):
class MapFlattener:
def __init__(self):
self.headings = []
self.rows = []
def add_rows(self, headings, rows):
self.headings = [*self.headings, *headings]
if self.rows:
new_rows = []
for base_row in self.rows:
for row in rows:
new_rows.append([*base_row, *row])
self.rows = new_rows
else:
self.rows = rows
def __call__(self, mapping):
for heading, value in mapping.items():
if isinstance(value, Mapping):
sub_headings, sub_rows = MapFlattener()(value)
sub_headings = [f'{heading}:{sub_heading}' for sub_heading in sub_headings]
self.add_rows(sub_headings, sub_rows)
continue
if isinstance(value, list):
self.add_rows([heading], [[e] for e in value])
continue
self.add_rows([heading], [[value]])
return self.headings, self.rows
def map_flatten(mapping):
return MapFlattener()(mapping)
This creates output more in line with relational data:
这将创建更符合关系数据的输出:
In [22]: map_flatten({'l': [1,2]})
Out[22]: (['l'], [[1], [2]])
In [23]: map_flatten({'l': [1,2], 'n': 7})
Out[23]: (['l', 'n'], [[1, 7], [2, 7]])
In [24]: map_flatten({'l': [1,2], 'n': 7, 'o': {'a': 1, 'b': 2}})
Out[24]: (['l', 'n', 'o:a', 'o:b'], [[1, 7, 1, 2], [2, 7, 1, 2]])
This is particularly useful if you are using the csv in spreadsheets etc. and need to process the flattened data.
如果您在电子表格等中使用 csv 并且需要处理扁平化的数据,这将特别有用。
回答by TrickiDicki
Outputting in jsonpath format:
以jsonpath格式输出:
def convert(f):
out = {}
def flatten(x, name=None):
if type(x) is dict:
for a in x:
val = '.'.join((name, a)) if name else a
flatten(x[a], val)
elif type(x) is list:
for (i, a) in enumerate(x):
flatten(a, name + f'[{str(i)}]')
else:
out[name] = x if x else ""
flatten(f)
return out
回答by Utkarsh Shukla
Just pass your dictionary here:
只需在此处传递您的字典:
def getKeyValuePair(dic,master_dic = {},master_key = None):
keys = list(dic.keys())
for key in keys:
if type(dic[key]) == dict:
getKeyValuePair(dic[key],master_dic = master_dic,master_key = key)
else:
if master_key == None:
master_dic[key] = dic[key]
else:
master_dic[str(master_key)+'_'+str(key)] = dic[key]
return master_dic