Attribution: this page reproduces a popular StackOverflow question and its answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/49671693/

Time: 2020-09-14 05:25:25  Source: igfitidea

pandas DataFrame: normalize one JSON column and merge with other columns

Tags: python, json, pandas, dataframe

Asked by stack_lech

I have a pandas DataFrame in which one column holds multiple JSON data items as a list of dicts. I want to normalize the JSON column and repeat the non-JSON columns for each item:

# creating dataframe
import json
import pandas as pd

# each row: an id plus a list of dicts parsed from a JSON string
rows = [[12, json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]')],
        [15, json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]')]]
df_actions = pd.DataFrame(rows, columns=['id', 'actions'])

>>> df_actions
   id                                            actions
0  12  [{'type': 'a', 'value': '17'}, {'type': 'b', '...
1  15  [{'type': 'a', 'value': '1'}, {'type': 'b', 'v...

I want:

>>> df_actions_parsed
   id      type    value
   12      a        17
   12      b        19
   15      a        1
   15      b        3
   15      c        5

I can normalize the JSON data using:

# pd.json_normalize needs pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
pd.concat([pd.json_normalize(x) for x in df_actions['actions']], ignore_index=True)

but I don't know how to join the result back to the id column of the original DataFrame.

Answered by jezrael

You can use concat with a dict comprehension, with pop to extract the column, then remove the second index level and join the result back to the original DataFrame:

df1 = (pd.concat({i: pd.DataFrame(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))
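
To make the chain above easier to follow, here is a minimal step-by-step sketch of the same idea, using the sample data from the question. The intermediate names (actions, stacked, flat) are illustrative only and do not appear in the original answer.

```python
import json
import pandas as pd

# Same sample data as in the question.
df_actions = pd.DataFrame({
    'id': [12, 15],
    'actions': [
        json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]'),
        json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]'),
    ],
})

# Step 1: pop removes 'actions' from df_actions in place and returns it as a Series.
actions = df_actions.pop('actions')

# Step 2: build one small DataFrame per original row, keyed by that row's index label.
# Passing a dict to concat yields a two-level index: (original row label, position in the list).
stacked = pd.concat({i: pd.DataFrame(x) for i, x in actions.items()})

# Step 3: drop the inner level so the index holds only the original row label.
flat = stacked.reset_index(level=1, drop=True)

# Step 4: join aligns on the index, so 'id' is repeated for every action of a row.
df1 = flat.join(df_actions).reset_index(drop=True)
print(df1)
```

Because pop modifies df_actions in place, the snippet has to start from a DataFrame that still contains the actions column each time it is run.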

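This is equivalent to the following, which uses json_normalize instead of pd.DataFrame (run it on a fresh df_actions, because pop has already removed the actions column in place):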

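# pd.json_normalize needs pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
df1 = (pd.concat({i: pd.json_normalize(x) for i, x in df_actions.pop('actions').items()})
         .reset_index(level=1, drop=True)
         .join(df_actions)
         .reset_index(drop=True))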


print(df1)
  type value  id
0    a    17  12
1    b    19  12
2    a     1  15
3    b     3  15
4    c     5  15
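
As a possible alternative that is not part of the original answer, the same result can be sketched with the record_path and meta parameters of pd.json_normalize. This assumes pandas 1.0 or later; the name records is illustrative, and df_actions_parsed is borrowed from the question.

```python
import json
import pandas as pd

# Same sample data as in the question.
df_actions = pd.DataFrame({
    'id': [12, 15],
    'actions': [
        json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]'),
        json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]'),
    ],
})

# Turn each row into a plain dict: {'id': ..., 'actions': [...]}.
records = df_actions.to_dict('records')

# Flatten the list under 'actions' and carry 'id' along as metadata for every item.
df_actions_parsed = pd.json_normalize(records, record_path='actions', meta=['id'])

# Reorder the columns to match the layout asked for in the question.
df_actions_parsed = df_actions_parsed[['id', 'type', 'value']]
print(df_actions_parsed)
```

Unlike the pop-based version, this sketch leaves df_actions untouched, so it can be rerun without rebuilding the DataFrame.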