Note: this page reproduces a popular Stack Overflow question and its answers, licensed under CC BY-SA 4.0. If you reuse it, you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/24374062/

Time: 2020-09-13 22:11:23  Source: igfitidea

pandas groupby to nested json

Tags: python, json, pandas

Asked by Don

I often use pandas groupby to generate stacked tables, and I then often want to output the resulting nested relations as JSON. Is there any way to extract nested JSON from the stacked table it produces?

Let's say I have a df like:

year office candidate  amount
2010 mayor  joe smith  100.00
2010 mayor  jay gould   12.00
2010 govnr  pati mara  500.00
2010 govnr  jess rapp   50.00
2010 govnr  jess rapp   30.00

I can do:

grouped = df.groupby(['year', 'office', 'candidate']).sum()

print(grouped)
                       amount
year office candidate 
2010 mayor  joe smith   100
            jay gould    12
     govnr  pati mara   500
            jess rapp    80

Beautiful! Of course, what I'd really like to do is get nested JSON via a command along the lines of grouped.to_json. But that feature isn't available. Are there any workarounds?

So, what I really want is something like:

{"2010": {"mayor": [
                    {"joe smith": 100},
                    {"jay gould": 12}
                   ],
          "govnr": [
                    {"pati mara": 500},
                    {"jess rapp": 80}
                   ]
         }
}

Accepted answer by chrisb

I don't think there is anything built into pandas to create a nested dictionary of the data. Below is some code that should work in general for a series with a MultiIndex, using a defaultdict.

The nesting code iterates through each level of the MultiIndex, adding layers to the dictionary until the deepest layer is assigned the Series value.

In  [99]: from collections import defaultdict

In [100]: results = defaultdict(lambda: defaultdict(dict))

In [101]: for index, value in grouped.itertuples():
     ...:     for i, key in enumerate(index):
     ...:         if i == 0:
     ...:             nested = results[key]
     ...:         elif i == len(index) - 1:
     ...:             nested[key] = value
     ...:         else:
     ...:             nested = nested[key]

In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})

In [106]: print(json.dumps(results, indent=4))  # requires: import json
{
    "2010": {
        "govnr": {
            "pati mara": 500.0, 
            "jess rapp": 80.0
        }, 
        "mayor": {
            "joe smith": 100.0, 
            "jay gould": 12.0
        }
    }
}
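
For reference, here is a self-contained Python 3 sketch of the same idea, starting from the question's sample data. The DataFrame construction is an assumption added for illustration, and it selects the amount column so that grouped is a Series rather than a one-column DataFrame. Keys are passed through str and values through float so that json.dumps does not depend on how pandas represents scalars.

```python
import json
from collections import defaultdict

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})

# Selecting the column yields a Series with a three-level MultiIndex.
grouped = df.groupby(["year", "office", "candidate"])["amount"].sum()

# Two defaultdict layers plus a plain dict: this handles exactly three levels.
results = defaultdict(lambda: defaultdict(dict))
for (year, office, candidate), amount in grouped.items():
    results[str(year)][office][candidate] = float(amount)

print(json.dumps(results, indent=4, sort_keys=True))
```

Because the depth is fixed by the shape of the defaultdict, a different number of grouping columns would need a different initial structure.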

Answer by Shivam K. Thakkar

I had a look at the solution above and figured out that it only works for 3 levels of nesting. This solution will work for any number of levels.

import json
levels = len(grouped.index.levels)
dicts = [{} for i in range(levels)]
last_index = None

for index,value in grouped.itertuples():

    if not last_index:
        last_index = index

    for (ii,(i,j)) in enumerate(zip(index, last_index)):
        if not i == j:
            ii = levels - ii -1
            dicts[:ii] =  [{} for _ in dicts[:ii]]
            break

    for i, key in enumerate(reversed(index)):
        dicts[i][key] = value
        value = dicts[i]

    last_index = index


result = json.dumps(dicts[-1])
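
Another way to support any number of levels is to walk each index tuple and create the intermediate dictionaries on demand with dict.setdefault. The sketch below is an alternative to the code above, not a rewrite of it; the helper name nest_series and the sample Series are hypothetical and only serve the illustration.

```python
import json

import pandas as pd


def nest_series(series):
    """Turn a Series with a MultiIndex into nested dicts (hypothetical helper)."""
    root = {}
    for index, value in series.items():
        # A single-level index yields a scalar label rather than a tuple.
        keys = index if isinstance(index, tuple) else (index,)
        node = root
        for key in keys[:-1]:
            # Reuse the child dict if it exists, otherwise create it.
            node = node.setdefault(str(key), {})
        # Convert NumPy scalars to plain Python values for json.dumps.
        node[str(keys[-1])] = value.item() if hasattr(value, "item") else value
    return root


index = pd.MultiIndex.from_tuples(
    [
        (2010, "mayor", "joe smith"),
        (2010, "mayor", "jay gould"),
        (2010, "govnr", "pati mara"),
        (2010, "govnr", "jess rapp"),
    ],
    names=["year", "office", "candidate"],
)
series = pd.Series([100.0, 12.0, 500.0, 80.0], index=index, name="amount")

print(json.dumps(nest_series(series), indent=4))
```

This version does not rely on rows being sorted by index, since every row looks up its own path from the root.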

Answer by iNecas

Here is a generic recursive solution for this problem:

def df_to_dict(df):
    if df.ndim == 1:
        return df.to_dict()

    ret = {}
    for key in df.index.get_level_values(0).unique():
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret
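
When df_to_dict is given the grouped DataFrame from the question, each leaf is the row converted to a dict, so it carries the column name (for example {'amount': 80.0}). If plain numbers are wanted at the leaves, the same recursion can be applied to a Series instead. The following is a sketch under that assumption; the name series_to_dict and the DataFrame construction are added here for illustration.

```python
import json

import pandas as pd


def series_to_dict(s):
    """Recursively nest a Series by its index levels (hypothetical helper)."""
    if s.index.nlevels == 1:
        # Innermost level: map each label directly to its value.
        return s.to_dict()
    return {
        # xs selects one label of the outer level and drops that level.
        str(key): series_to_dict(s.xs(key, level=0))
        for key in s.index.get_level_values(0).unique()
    }


df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})
grouped = df.groupby(["year", "office", "candidate"])["amount"].sum()

print(json.dumps(series_to_dict(grouped), indent=4))
```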

Answer by Tom Dugovic

I'm aware this is an old question, but I came across the same issue recently. Here's my solution. I borrowed a lot of stuff from chrisb's example (Thank you!).

This has the advantage that you can pass a lambda to get the final value from whatever enumerable you want, as well as for each group.

from collections import defaultdict

def dict_from_enumerable(enumerable, final_value, *groups):
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d

In the question, you'd call this function like:

dict_from_enumerable(grouped.reset_index().to_dict('records'), 'amount', 'year', 'office', 'candidate')

The first argument can also be a plain list of records (dicts), so pandas is not even required.
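
To illustrate the pandas-free case, here is a sketch of a variant with the same calling convention that uses dict.setdefault, so it is not limited to three groups. The name nest_records and the sample records are hypothetical; each selector may be either a key name or a callable, as in the function above, and at least one group is assumed.

```python
import json


def nest_records(records, final_value, *groups):
    """Nest an iterable of dicts by the given groups (hypothetical variant)."""

    def pick(item, selector):
        # A selector is either a callable or a key to look up in the record.
        return selector(item) if callable(selector) else item.get(selector)

    root = {}
    for item in records:
        node = root
        for group in groups[:-1]:
            node = node.setdefault(str(pick(item, group)), {})
        # The last group keys the final value.
        node[str(pick(item, groups[-1]))] = pick(item, final_value)
    return root


records = [
    {"year": 2010, "office": "mayor", "candidate": "joe smith", "amount": 100.0},
    {"year": 2010, "office": "mayor", "candidate": "jay gould", "amount": 12.0},
    {"year": 2010, "office": "govnr", "candidate": "pati mara", "amount": 500.0},
    {"year": 2010, "office": "govnr", "candidate": "jess rapp", "amount": 80.0},
]

# The last group is a lambda that title-cases the candidate name.
nested = nest_records(
    records, "amount", "year", "office", lambda r: r["candidate"].title()
)
print(json.dumps(nested, indent=4))
```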