pandas groupby to nested json

Question

Asked by Don

I often use pandas groupby to generate stacked tables. But then I often want to output the resulting nested relations to json. Is there any way to extract nested json from the stacked table it produces?

Let's say I have a df like:

year office candidate  amount
2010 mayor  joe smith  100.00
2010 mayor  jay gould   12.00
2010 govnr  pati mara  500.00
2010 govnr  jess rapp   50.00
2010 govnr  jess rapp   30.00

I can do:

grouped = df.groupby(['year', 'office', 'candidate']).sum()

print grouped
                       amount
year office candidate 
2010 mayor  joe smith   100
            jay gould    12
     govnr  pati mara   500
            jess rapp    80

Beautiful! Of course, what I'd really like to do is get nested json via a command along the lines of grouped.to_json. But that feature isn't available. Any workarounds?

So, what I really want is something like:

{"2010": {"mayor": [
                    {"joe smith": 100},
                    {"jay gould": 12}
                   ],
          "govnr": [
                    {"pati mara": 500},
                    {"jess rapp": 80}
                   ]
         }
}
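
A minimal sketch of one way to build exactly this shape (a list of single-key objects under each office), assuming the df from the question. The names `totals` and `nested` are illustrative only. Keys are converted with str() and amounts with float() so that json.dumps accepts them.

```python
import json

import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})

# Sum per (year, office, candidate); the result is a Series with a MultiIndex.
totals = df.groupby(["year", "office", "candidate"])["amount"].sum()

nested = {}
for (year, office, candidate), amount in totals.items():
    offices = nested.setdefault(str(year), {})
    # Each office maps to a list of {candidate: amount} objects.
    offices.setdefault(office, []).append({candidate: float(amount)})

print(json.dumps(nested, indent=4))
```

Note that groupby sorts the keys, so offices and candidates come out in alphabetical order rather than in the order written above.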

Don


Answer 1

Accepted answer by chrisb

I don't think there is anything built into pandas to create a nested dictionary of the data. Below is some code that should work in general for a series with a MultiIndex, using a defaultdict.

The nesting code iterates through each level of the MultiIndex, adding layers to the dictionary until the deepest layer is assigned the Series value.

In  [99]: from collections import defaultdict

In [100]: results = defaultdict(lambda: defaultdict(dict))

In [101]: for index, value in grouped.itertuples():
     ...:     for i, key in enumerate(index):
     ...:         if i == 0:
     ...:             nested = results[key]
     ...:         elif i == len(index) - 1:
     ...:             nested[key] = value
     ...:         else:
     ...:             nested = nested[key]

In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})

In [106]: print json.dumps(results, indent=4)
{
    "2010": {
        "govnr": {
            "pati mara": 500.0, 
            "jess rapp": 80.0
        }, 
        "mayor": {
            "joe smith": 100.0, 
            "jay gould": 12.0
        }
    }
}
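
The same loop can be written without a fixed-depth defaultdict. The following is a sketch for Python 3, not chrisb's original code: it assumes the grouped frame from the question and walks each index tuple with dict.setdefault, so it is not tied to three index levels. The helper name `nest_series` is made up for this example, and keys are passed through str() so that json.dumps accepts them.

```python
import json

import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})
grouped = df.groupby(["year", "office", "candidate"]).sum()


def nest_series(series):
    """Turn a Series with a MultiIndex into nested plain dicts (hypothetical helper)."""
    result = {}
    for index, value in series.items():
        node = result
        # Descend through every level except the last, creating dicts as needed.
        for key in index[:-1]:
            node = node.setdefault(str(key), {})
        # The deepest key holds the value itself.
        node[str(index[-1])] = float(value)
    return result


nested = nest_series(grouped["amount"])
print(json.dumps(nested, indent=4))
```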

Answer 2

Answer by Shivam K. Thakkar

I had a look at the solution above and figured out that it only works for 3 levels of nesting. This solution will work for any number of levels.

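import json

# One dict per index level: dicts[0] holds the innermost level,
# dicts[-1] the outermost.
levels = len(grouped.index.levels)
dicts = [{} for _ in range(levels)]
last_index = None

# Relies on the index being sorted (as it is after groupby) and on
# grouped having a single value column.
for index, value in grouped.itertuples():

    if not last_index:
        last_index = index

    # Find the outermost level whose key changed and start fresh dicts
    # for every level nested below it.
    for (ii, (i, j)) in enumerate(zip(index, last_index)):
        if not i == j:
            ii = levels - ii - 1
            dicts[:ii] = [{} for _ in dicts[:ii]]
            break

    # Walk from the innermost key outwards, wrapping the value one level at a time.
    for i, key in enumerate(reversed(index)):
        dicts[i][key] = value
        value = dicts[i]

    last_index = index


result = json.dumps(dicts[-1])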

Answer 3

Answer by iNecas

Here is a generic recursive solution for this problem:

def df_to_dict(df):
    if df.ndim == 1:
        return df.to_dict()

    ret = {}
    for key in df.index.get_level_values(0):
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret
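
To see what this returns, here is a hedged usage sketch on the question's data. The function is repeated so the snippet runs on its own, and the df construction is an assumption based on the table in the question. Because the recursion bottoms out on a single row of the DataFrame, the innermost values are dicts keyed by column name rather than bare numbers.

```python
import pandas as pd


def df_to_dict(df):
    # A 1-dimensional object (a single row, returned as a Series) ends the recursion.
    if df.ndim == 1:
        return df.to_dict()

    ret = {}
    for key in df.index.get_level_values(0):
        # xs() selects the rows for this key and drops the outermost level.
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret


# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2010],
    "office": ["mayor", "mayor", "govnr", "govnr", "govnr"],
    "candidate": ["joe smith", "jay gould", "pati mara", "jess rapp", "jess rapp"],
    "amount": [100.0, 12.0, 500.0, 50.0, 30.0],
})
grouped = df.groupby(["year", "office", "candidate"]).sum()

out = df_to_dict(grouped)
print(out[2010]["govnr"]["jess rapp"])
```

Pass the grouped DataFrame rather than a Series: a Series already has ndim == 1, so the function would return to_dict() straight away, with tuple keys and no nesting.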

Answer 4

Answer by Tom Dugovic

I'm aware this is an old question, but I came across the same issue recently. Here's my solution. I borrowed a lot of stuff from chrisb's example (Thank you!).

This has the advantage that you can pass a lambda to compute the final value from each item of whatever enumerable you want, and likewise a lambda for each group key.

from collections import defaultdict

def dict_from_enumerable(enumerable, final_value, *groups):
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d

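For the data in the question, you'd call this function like this (the items must support .get, so the grouped frame is first turned into a list of dicts):

dict_from_enumerable(grouped.reset_index().to_dict('records'), 'amount', 'year', 'office', 'candidate')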

The first argument can just as well be a plain list of dicts, not even requiring pandas.
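
As a sketch of the pandas-free case, the snippet below repeats the function and feeds it a plain list of dicts. The `rows` data and the title-casing lambda are made up for illustration. The rows are assumed to be aggregated already, since the function does not sum anything: a later item with the same keys simply overwrites the earlier one.

```python
import json
from collections import defaultdict


def dict_from_enumerable(enumerable, final_value, *groups):
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                nested[group_val] = item_result
            else:
                nested = nested[group_val]
    return d


# Hypothetical, already-summed rows matching the question's totals.
rows = [
    {"year": 2010, "office": "mayor", "candidate": "joe smith", "amount": 100.0},
    {"year": 2010, "office": "mayor", "candidate": "jay gould", "amount": 12.0},
    {"year": 2010, "office": "govnr", "candidate": "pati mara", "amount": 500.0},
    {"year": 2010, "office": "govnr", "candidate": "jess rapp", "amount": 80.0},
]

# Group keys can be field names or callables; here the last one title-cases the name.
result = dict_from_enumerable(
    rows, "amount", "year", "office", lambda row: row["candidate"].title()
)
print(json.dumps(result, indent=4))
```

Every group key goes through str(), so the year comes out as the string "2010". With the defaultdict as written, more than three groups would hit a plain dict at the third level and raise a KeyError.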
