pandas 展平双重嵌套的 JSON

Question

提问by spaine

I am trying to flatten a JSON file that looks like this:

我试图展平一个看起来像这样的 JSON 文件：

{
"teams": [
  {
    "teamname": "1",
    "members": [
      {
        "firstname": "John", 
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-1234",
        "mobile": "",
        "email": "[email protected]"
      },
      {
        "firstname": "Jane",
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-4321",
        "mobile": "916-555-7890",
        "email": "[email protected]"
      }
    ]
  },
  {
    "teamname": "2",
    "members": [
      {
        "firstname": "Mickey",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-0000",
        "mobile": "916-555-1111",
        "email": "[email protected]"
      },
      {
        "firstname": "Minny",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-2222",
        "mobile": "",
        "email": "[email protected]"
      }
    ]
  }       
]

}

I wish to export this to an excel table. My current code is this:

我希望将其导出到 excel 表。我目前的代码是这样的：

from pandas.io.json import json_normalize
import json
import pandas as pd

inputFile = 'E:\teams.json'
outputFile = 'E:\teams.xlsx'

f = open(inputFile)
data = json.load(f)
f.close()

df = pd.DataFrame(data)

result1 = json_normalize(data, 'teams' )
print result1

results in this output:

导致此输出：

members                                              teamname
0  [{u'firstname': u'John', u'phone': u'916-555-...        1
1  [{u'firstname': u'Mickey', u'phone': u'916-555-...      2

There are 2 members's data nested within each row. I would like to have an output table that displays all 4 members' data plus their associated teamname.

每行嵌套有 2 个成员的数据。我想要一个输出表，显示所有 4 个成员的数据以及他们关联的团队名称。

Answer 1

采纳答案by piRSquared

This is one way to do it. Should give you some ideas.

这是一种方法。应该给你一些想法。

df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
    ], keys=[t['teamname'] for t in data['teams']]
)

                                     0                         1
1 email          [email protected]     [email protected]
  firstname                       John                      Jane
  lastname                         Doe                       Doe
  mobile                                            916-555-7890
  orgname                         Anon                      Anon
  phone                   916-555-1234              916-555-4321
2 email      [email protected]  [email protected]
  firstname                     Mickey                     Minny
  lastname                       Moose                     Moose
  mobile                  916-555-1111                          
  orgname                      Moosers                   Moosers
  phone                   916-555-0000              916-555-2222

To get a nice table with team name and members as rows, all attributes in columns:

要获得一个以团队名称和成员为行的漂亮表格，列中的所有属性：

df.index.levels[0].name = 'teamname'
df.columns.name = 'member'

df.T.stack(0).swaplevel(0, 1).sort_index()

To get team name and member as actual columns, just reset the index.

要将团队名称和成员作为实际列，只需重置索引。

df.index.levels[0].name = 'teamname'
df.columns.name = 'member'

df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

The whole thing

整个东西

import json
import pandas as pd

json_text = """{
"teams": [
  {
    "teamname": "1",
    "members": [
      {
        "firstname": "John", 
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-1234",
        "mobile": "",
        "email": "[email protected]"
      },
      {
        "firstname": "Jane",
        "lastname": "Doe",
        "orgname": "Anon",
        "phone": "916-555-4321",
        "mobile": "916-555-7890",
        "email": "[email protected]"
      }
    ]
  },
  {
    "teamname": "2",
    "members": [
      {
        "firstname": "Mickey",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-0000",
        "mobile": "916-555-1111",
        "email": "[email protected]"
      },
      {
        "firstname": "Minny",
        "lastname": "Moose",
        "orgname": "Moosers",
        "phone": "916-555-2222",
        "mobile": "",
        "email": "[email protected]"
      }
    ]
  }       
]
}"""


data = json.loads(json_text)

df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
    ], keys=[t['teamname'] for t in data['teams']]
)

df.index.levels[0].name = 'teamname'
df.columns.name = 'member'

df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

Answer 2

回答by Jason Chiu

Use `pandas.io.json.json_normalize`

用 `pandas.io.json.json_normalize`

json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])

output:
         email                firstname lastname mobile      orgname    phone       teams.teamname
0   [email protected]       John    Doe                   Anon      916-555-1234    1
1   [email protected]       Jane    Doe     916-555-7890  Anon      916-555-4321    1
2   [email protected]   Mickey  Moose   916-555-1111  Moosers   916-555-0000    2
3   [email protected]    Minny   Moose                 Moosers   916-555-2222    2

Explanation

解释

from pandas.io.json import json_normalize
import pandas as pd

I've only learned how to use the json_normalize function recently so my explanation might not be right.

我最近才学会了如何使用 json_normalize 函数，所以我的解释可能不对。

Start with what I'm calling 'Layer 0'

从我所说的“第 0 层”开始

json_normalize(data)

output:
     teams
0   [{'teamname': '1', 'members': [{'firstname': '...

There is 1 Column and 1 Row. Everything is inside the 'team' column.

有 1 列和 1 行。一切都在“团队”栏中。

Look into what I'm calling 'Layer 1' by using record_path=

使用 record_path= 查看我所说的“第 1 层”

json_normalize(data,record_path='teams')

output:
     members                                          teamname
0   [{'firstname': 'John', 'lastname': 'Doe', 'org...    1
1   [{'firstname': 'Mickey', 'lastname': 'Moose', ...    2

In Layer 1 we have have flattened 'teamname' but there is more inside 'members'.

在第 1 层中，我们已经扁平化了“teamname”，但在“members”内部还有更多。

Look into Layer 2 with record_path=. The notation is unintuitive at first. I now remember it by ['layer','deeperlayer'] where the result is layer.deeperlayer.

使用 record_path= 查看第 2 层。该符号起初是不直观的。我现在通过 ['layer','deeperlayer'] 记住它，结果是 layer.deeperlayer。

json_normalize(data,record_path=['teams','members'])

output:
           email              firstname lastname   mobile     orgname   phone
0   [email protected]      John        Doe                  Anon    916-555-1234
1   [email protected]       Jane        Doe   916-555-7890  Anon    916-555-4321
2   [email protected]   Mickey     Moose   916-555-1111 Moosers 916-555-0000
3   [email protected]    Minny       Moose               Moosers 916-555-2222

Excuse my output, I don't know how to make tables in a response.

请原谅我的输出，我不知道如何在响应中制作表格。

Finally we add in Layer 1 columns using meta=

最后，我们使用 meta= 添加第 1 层列

json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])

output:
         email                firstname lastname mobile      orgname    phone       teams.teamname
0   [email protected]       John    Doe                   Anon      916-555-1234    1
1   [email protected]       Jane    Doe     916-555-7890  Anon      916-555-4321    1
2   [email protected]   Mickey  Moose   916-555-1111  Moosers   916-555-0000    2
3   [email protected]    Minny   Moose                 Moosers   916-555-2222    2

Notice how we needed a list of lists for meta=[[]] to reference Layer 1. If there was a column we want from Layer 0 and Layer 1 we could do this:

请注意我们如何需要一个用于 meta=[[]] 的列表列表来引用第 1 层。如果我们想要来自第 0 层和第 1 层的列，我们可以这样做：

json_normalize(data,record_path=['layer1','layer2'],meta=['layer0',['layer0','layer1']])

The result of the json_normalize is a pandas dataframe.

json_normalize 的结果是一个Pandas数据框。

pandas 展平双重嵌套的 JSON

提问by spaine

采纳答案by piRSquared

The whole thing

整个东西

回答by Jason Chiu

Use `pandas.io.json.json_normalize`

用 `pandas.io.json.json_normalize`

相关推荐

最近更新

标签

pandas 展平双重嵌套的 JSON

提问by spaine

采纳答案by piRSquared

The whole thing

整个东西

回答by Jason Chiu

Use pandas.io.json.json_normalize

用 pandas.io.json.json_normalize

相关推荐

pandas 来自熊猫数据框python的barh图中行的不同颜色

Pandas 从列表创建数据框列

pandas 如何在python中使用matplotlib创建曼哈顿图？

为 Pandas DataFrame 图设置 xlim

相关推荐

最近更新

标签

Use `pandas.io.json.json_normalize`

用 `pandas.io.json.json_normalize`