Extracting nested JSON in a Pandas DataFrame

Question

Question by Nickil Maveli

I am trying to unpack nested JSON in the following pandas dataframe:

           id                                                              info
0           0  [{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
1           1  [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
2           2  [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]

My expected outcome is:

           id        type1    type2
0           0        good     bad
1           1        bad      bad
2           2        good     good

I've been looking at other solutions, including json_normalize, but unfortunately it does not work for me. Should I treat the JSON as a string to get what I want, or is there a more straightforward way to do this?
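As one possible "more straightforward way", the sketch below skips json_normalize entirely and builds one plain dict per row with a comprehension. This is an alternative technique, not the approach from the answer. It assumes every dict has exactly the keys a and b, and that each type appears at most once per row; if a type repeats, the later dict silently wins.

```python
import pandas as pd

# Sample frame from the question (column name "id" as in the question).
df = pd.DataFrame({
    "id": [0, 1, 2],
    "info": [
        [{"a": "good", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "bad", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "good", "b": "type1"}, {"a": "good", "b": "type2"}],
    ],
})

# For each row, map every dict's "b" value (the type) to its "a" value.
# pd.DataFrame lines up the resulting keys as columns type1 and type2.
wide = pd.DataFrame(
    [{d["b"]: d["a"] for d in row} for row in df["info"]],
    index=df.index,
)

# Put the id column back in front of the new type columns.
result = pd.concat([df[["id"]], wide], axis=1)
print(result)
```

Because the comprehension keeps only the last value per type, it is only a fit when duplicates cannot occur; the pivot_table route in the answer handles them explicitly.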

Answer 1

Answer by Nickil Maveli

Use json_normalize to handle a list of dictionaries, setting the common record path (info here) so that the individual dicts are broken out into separate cells. Then unstack + apply(pd.Series): unstack stacks those cells downwards into a single level, and apply(pd.Series) expands each dict into columns a and b.

import pandas as pd

# Flatten the lists in 'info', stack them downwards, and expand each dict into columns 'a' and 'b'
df_info = pd.json_normalize(df.to_dict('list'), ['info']).unstack().apply(pd.Series)
df_info
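To make the intermediate shapes of this first step visible, the sketch below runs it on the starting DF given at the end of this answer. One assumption: it replaces .apply(pd.Series) with pd.DataFrame(stacked.tolist(), index=stacked.index), which should yield the same a and b columns here without constructing one Series per element. The intermediate names nested and stacked are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(dict(ID=[0, 1, 2], info=[
    [{'a': 'good', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'bad', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'good', 'b': 'type1'}, {'a': 'good', 'b': 'type2'}],
]))

# record_path ['info'] pulls out the three lists: one row per original row,
# one column per list position, and every cell holds a dict.
nested = pd.json_normalize(df.to_dict('list'), ['info'])

# unstack() turns that into a Series keyed by (list position, original row).
stacked = nested.unstack()

# Expand each dict into columns 'a' and 'b', keeping the two-level index.
df_info = pd.DataFrame(stacked.tolist(), index=stacked.index)
print(df_info)
```

Level 1 of the resulting index is the original row number, which is why the next step pivots on df_info.index.get_level_values(1).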

Pivot the DF, with an optional aggfunc to handle duplicate entries along the index axis:

DF = df_info.pivot_table(index=df_info.index.get_level_values(1), columns=['b'], 
                         values=['a'], aggfunc=' '.join)

DF
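Why the aggfunc matters can be shown with a small hypothetical long-format frame in which original row 0 carries two type1 entries (this data is made up for illustration). A plain pivot rejects the duplicate (row, type) pair, while pivot_table collapses it with ' '.join.

```python
import pandas as pd

# Hypothetical long-format data: row 0 has two "type1" entries.
long = pd.DataFrame({
    "row": [0, 0, 0, 1],
    "b": ["type1", "type1", "type2", "type1"],
    "a": ["good", "bad", "bad", "bad"],
})

# pivot() cannot place two values in the same (row, type) cell.
try:
    long.pivot(index="row", columns="b", values="a")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates duplicates; ' '.join concatenates the strings.
# Combinations that never occur (row 1, type2) come out as NaN.
wide = long.pivot_table(index="row", columns="b", values="a", aggfunc=" ".join)
print(wide)
```

Passing values as a scalar ('a') gives flat type columns directly; passing it as a list (['a'], as in the answer) adds an outer 'a' column level, which is what the later xs('a', axis=1) removes.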

Finally, concatenate sideways:

pd.concat([df[['ID']], DF.xs('a', axis=1).rename_axis(None, axis=1)], axis=1)
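The last step is easier to follow on a hand-built stand-in for the pivoted DF. The sketch below assumes DF has a two-level column index whose outer level is 'a' and whose inner level is named 'b', which is what pivot_table(values=['a'], columns=['b']) produces; the frame itself is typed in by hand rather than computed.

```python
import pandas as pd

df = pd.DataFrame({"ID": [0, 1, 2]})

# Stand-in for the pivot_table result: outer column level 'a', inner level named 'b'.
DF = pd.DataFrame(
    [["good", "bad"], ["bad", "bad"], ["good", "good"]],
    columns=pd.MultiIndex.from_product([["a"], ["type1", "type2"]], names=[None, "b"]),
)

# xs('a', axis=1) selects the 'a' block and drops that level,
# leaving columns type1/type2 whose axis name is still 'b'.
flat = DF.xs("a", axis=1)

# Clear the leftover columns name so 'b' no longer shows up in the header.
flat = flat.rename_axis(None, axis=1)

# Both frames share the index 0..2, so axis=1 concat lines the rows up.
result = pd.concat([df[["ID"]], flat], axis=1)
print(result)
```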

Starting DF used:

df = pd.DataFrame(dict(ID=[0,1,2], info=[[{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}], 
                                        [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}],
                                        [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]]))
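On pandas 0.25 or newer, the same flatten-then-pivot idea can also be written with DataFrame.explode instead of unstack. This is a swapped-in variant, not the code from this answer; the helper column name row is a hypothetical choice, and the starting DF is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame(dict(ID=[0, 1, 2], info=[
    [{'a': 'good', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'bad', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'good', 'b': 'type1'}, {'a': 'good', 'b': 'type2'}],
]))

# One row per dict; explode repeats the original row label for each list element.
exploded = df.explode('info')

# Flatten the dicts into columns 'a' and 'b', remembering the original row.
parts = pd.json_normalize(exploded['info'].tolist())
parts['row'] = exploded.index.to_numpy()

# Types become columns; ' '.join merges any duplicate (row, type) pairs.
wide = parts.pivot_table(index='row', columns='b', values='a', aggfunc=' '.join)
wide = wide.rename_axis(index=None, columns=None)

# join aligns on the shared row labels 0..2.
result = df[['ID']].join(wide)
print(result)
```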