pandas groupby to nested json
Source: http://stackoverflow.com/questions/24374062/
Note: this page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Don
I often use pandas groupby to generate stacked tables. But then I often want to output the resulting nested relations to json. Is there any way to extract nested json from the stacked table it produces?
Let's say I have a df like:
year  office  candidate  amount
2010  mayor   joe smith  100.00
2010  mayor   jay gould   12.00
2010  govnr   pati mara  500.00
2010  govnr   jess rapp   50.00
2010  govnr   jess rapp   30.00
I can do:
grouped = df.groupby(['year', 'office', 'candidate']).sum()
print(grouped)
                       amount
year office candidate
2010 mayor  joe smith     100
            jay gould      12
     govnr  pati mara     500
            jess rapp      80
Beautiful! Of course, what I'd really like to do is get nested json via a command along the lines of grouped.to_json. But that feature isn't available. Any workarounds?
So, what I really want is something like:
{"2010": {
    "mayor": [
        {"joe smith": 100},
        {"jay gould": 12}
    ],
    "govnr": [
        {"pati mara": 500},
        {"jess rapp": 80}
    ]
}}
Accepted answer by chrisb
I don't think there is anything built into pandas to create a nested dictionary of the data. Below is some code that should work in general for a series with a MultiIndex, using a defaultdict.
The nesting code iterates through each level of the MultiIndex, adding layers to the dictionary until the deepest layer is assigned to the Series value.
In [99]: from collections import defaultdict

In [100]: results = defaultdict(lambda: defaultdict(dict))

In [101]: for index, value in grouped.itertuples():
     ...:     for i, key in enumerate(index):
     ...:         if i == 0:
     ...:             nested = results[key]
     ...:         elif i == len(index) - 1:
     ...:             nested[key] = value
     ...:         else:
     ...:             nested = nested[key]

In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})
In [105]: import json

In [106]: print(json.dumps(results, indent=4))
{
    "2010": {
        "govnr": {
            "pati mara": 500.0,
            "jess rapp": 80.0
        },
        "mayor": {
            "joe smith": 100.0,
            "jay gould": 12.0
        }
    }
}
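For readers who want to run the approach above end to end, here is a minimal self-contained sketch. It assumes the sample data from the question, rebuilt by hand as a DataFrame, and it unpacks the three index levels directly instead of looping over them. The int() and float() conversions are an addition, intended to keep NumPy scalar types out of json.dumps.

```python
import json
from collections import defaultdict

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})
grouped = df.groupby(["year", "office", "candidate"]).sum()

# Three dict layers: year -> office -> candidate.
results = defaultdict(lambda: defaultdict(dict))

# Iterating the "amount" column yields (index tuple, value) pairs.
for (year, office, candidate), amount in grouped["amount"].items():
    # Convert NumPy scalars to plain Python types before serializing.
    results[int(year)][office][candidate] = float(amount)

nested_json = json.dumps(results, indent=4, sort_keys=True)
print(nested_json)
```

Like the answer it follows, this sketch is tied to exactly three index levels.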
Answer by Shivam K. Thakkar
I had a look at the solution above and figured out that it only works for 3 levels of nesting. This solution will work for any number of levels.
import json

levels = len(grouped.index.levels)
dicts = [{} for i in range(levels)]
last_index = None

for index, value in grouped.itertuples():

    if not last_index:
        last_index = index

    for (ii, (i, j)) in enumerate(zip(index, last_index)):
        if not i == j:
            ii = levels - ii - 1
            dicts[:ii] = [{} for _ in dicts[:ii]]
            break

    for i, key in enumerate(reversed(index)):
        dicts[i][key] = value
        value = dicts[i]

    last_index = index

result = json.dumps(dicts[-1])
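As a point of comparison, nesting to an arbitrary depth can also be sketched with dict.setdefault. This is a different technique from the level-reset bookkeeping above, not a rewrite of it. The helper name series_to_nested is hypothetical, and the sketch assumes a Series with a MultiIndex and numeric values; keys are converted with str() so the result serializes cleanly.

```python
import json

import pandas as pd


def series_to_nested(series):
    """Nest a MultiIndex Series into dicts, one dict layer per index level.

    Assumes `series` has a MultiIndex (each index entry is a tuple)
    and numeric values.
    """
    root = {}
    for index, value in series.items():
        node = root
        # Walk down, creating one dict per outer level as needed.
        for key in index[:-1]:
            node = node.setdefault(str(key), {})
        # The innermost level maps directly to the value.
        node[str(index[-1])] = float(value)
    return root


# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})

three_levels = series_to_nested(
    df.groupby(["year", "office", "candidate"])["amount"].sum()
)
two_levels = series_to_nested(df.groupby(["office", "candidate"])["amount"].sum())

print(json.dumps(three_levels, indent=4))
```

Because each row walks down from the root, this version does not depend on the rows arriving in sorted order.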
Answer by iNecas
Here is a generic recursive solution for this problem:
def df_to_dict(df):
    if df.ndim == 1:
        return df.to_dict()
    ret = {}
    for key in df.index.get_level_values(0):
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret
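A short usage sketch may help show what the recursive function returns. It assumes the sample data from the question, rebuilt by hand, and passes the grouped DataFrame (not a Series) to the function. In that case each leaf is a small dict keyed by column name, here "amount", rather than a bare number.

```python
import pandas as pd


def df_to_dict(df):
    # Base case: a single row (a Series) becomes {column: value}.
    if df.ndim == 1:
        return df.to_dict()
    ret = {}
    # Recurse on each key of the outermost index level.
    for key in df.index.get_level_values(0):
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret


# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})
grouped = df.groupby(["year", "office", "candidate"]).sum()

result = df_to_dict(grouped)
print(result[2010]["govnr"]["jess rapp"])
```

With several value columns, each leaf dict would carry one entry per column.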
Answer by Tom Dugovic
I'm aware this is an old question, but I came across the same issue recently. Here's my solution. I borrowed a lot of stuff from chrisb's example (Thank you!).
This has the advantage that you can pass a lambda (or a key name) to get the final value from each item of whatever enumerable you want, and likewise for each group.
from collections import defaultdict

def dict_from_enumerable(enumerable, final_value, *groups):
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d
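For the data in the question, you'd call this function like the following. The items must support .get, so the grouped frame is first turned into a list of row dicts:
dict_from_enumerable(grouped.reset_index().to_dict('records'), 'amount', 'year', 'office', 'candidate')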
The first argument can be an array of data as well, not even requiring pandas.
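To illustrate the pandas-free case, here is a small sketch that feeds the function a plain list of dicts. The rows are the already-summed totals from the question, typed in by hand (the function assigns values rather than adding them up), and the lambda for final_value is just an illustrative choice.

```python
from collections import defaultdict


def dict_from_enumerable(enumerable, final_value, *groups):
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        # final_value is either a callable or a key to look up on the item.
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d


# Pre-aggregated rows matching the grouped table in the question.
rows = [
    {"year": 2010, "office": "mayor", "candidate": "joe smith", "amount": 100.0},
    {"year": 2010, "office": "mayor", "candidate": "jay gould", "amount": 12.0},
    {"year": 2010, "office": "govnr", "candidate": "pati mara", "amount": 500.0},
    {"year": 2010, "office": "govnr", "candidate": "jess rapp", "amount": 80.0},
]

# Key names for the groups, a callable for the final value.
nested = dict_from_enumerable(
    rows, lambda row: int(row["amount"]), "year", "office", "candidate"
)
print(nested["2010"]["govnr"])
```

Note that group values are converted with str(), so the year key is "2010". Because the defaultdict is built with three layers, the function as written handles at most three groups.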

