pandas 展平双重嵌套的 JSON
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/37668291/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Flatten double nested JSON
提问by spaine
I am trying to flatten a JSON file that looks like this:
我试图展平一个看起来像这样的 JSON 文件:
{
"teams": [
{
"teamname": "1",
"members": [
{
"firstname": "John",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-1234",
"mobile": "",
"email": "[email protected]"
},
{
"firstname": "Jane",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-4321",
"mobile": "916-555-7890",
"email": "[email protected]"
}
]
},
{
"teamname": "2",
"members": [
{
"firstname": "Mickey",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-0000",
"mobile": "916-555-1111",
"email": "[email protected]"
},
{
"firstname": "Minny",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-2222",
"mobile": "",
"email": "[email protected]"
}
]
}
]
}
}
I wish to export this to an excel table. My current code is this:
我希望将其导出到 excel 表。我目前的代码是这样的:
from pandas.io.json import json_normalize
import json
import pandas as pd
inputFile = 'E:\teams.json'
outputFile = 'E:\teams.xlsx'
f = open(inputFile)
data = json.load(f)
f.close()
df = pd.DataFrame(data)
result1 = json_normalize(data, 'teams' )
print result1
results in this output:
导致此输出:
members teamname
0 [{u'firstname': u'John', u'phone': u'916-555-... 1
1 [{u'firstname': u'Mickey', u'phone': u'916-555-... 2
There are 2 members's data nested within each row. I would like to have an output table that displays all 4 members' data plus their associated teamname.
每行嵌套有 2 个成员的数据。我想要一个输出表,显示所有 4 个成员的数据以及他们关联的团队名称。
采纳答案by piRSquared
This is one way to do it. Should give you some ideas.
这是一种方法。应该给你一些想法。
df = pd.concat(
[
pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
], keys=[t['teamname'] for t in data['teams']]
)
0 1
1 email [email protected] [email protected]
firstname John Jane
lastname Doe Doe
mobile 916-555-7890
orgname Anon Anon
phone 916-555-1234 916-555-4321
2 email [email protected] [email protected]
firstname Mickey Minny
lastname Moose Moose
mobile 916-555-1111
orgname Moosers Moosers
phone 916-555-0000 916-555-2222
To get a nice table with team name and members as rows, all attributes in columns:
要获得一个以团队名称和成员为行的漂亮表格,列中的所有属性:
df.index.levels[0].name = 'teamname'
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index()
To get team name and member as actual columns, just reset the index.
要将团队名称和成员作为实际列,只需重置索引。
df.index.levels[0].name = 'teamname'
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
The whole thing
整个东西
import json
import pandas as pd
json_text = """{
"teams": [
{
"teamname": "1",
"members": [
{
"firstname": "John",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-1234",
"mobile": "",
"email": "[email protected]"
},
{
"firstname": "Jane",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-4321",
"mobile": "916-555-7890",
"email": "[email protected]"
}
]
},
{
"teamname": "2",
"members": [
{
"firstname": "Mickey",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-0000",
"mobile": "916-555-1111",
"email": "[email protected]"
},
{
"firstname": "Minny",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-2222",
"mobile": "",
"email": "[email protected]"
}
]
}
]
}"""
data = json.loads(json_text)
df = pd.concat(
[
pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
], keys=[t['teamname'] for t in data['teams']]
)
df.index.levels[0].name = 'teamname'
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
回答by Jason Chiu
Use pandas.io.json.json_normalize
用 pandas.io.json.json_normalize
json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])
output:
email firstname lastname mobile orgname phone teams.teamname
0 [email protected] John Doe Anon 916-555-1234 1
1 [email protected] Jane Doe 916-555-7890 Anon 916-555-4321 1
2 [email protected] Mickey Moose 916-555-1111 Moosers 916-555-0000 2
3 [email protected] Minny Moose Moosers 916-555-2222 2
Explanation
解释
from pandas.io.json import json_normalize
import pandas as pd
I've only learned how to use the json_normalize function recently so my explanation might not be right.
我最近才学会了如何使用 json_normalize 函数,所以我的解释可能不对。
Start with what I'm calling 'Layer 0'
从我所说的“第 0 层”开始
json_normalize(data)
output:
teams
0 [{'teamname': '1', 'members': [{'firstname': '...
There is 1 Column and 1 Row. Everything is inside the 'team' column.
有 1 列和 1 行。一切都在“团队”栏中。
Look into what I'm calling 'Layer 1' by using record_path=
使用 record_path= 查看我所说的“第 1 层”
json_normalize(data,record_path='teams')
output:
members teamname
0 [{'firstname': 'John', 'lastname': 'Doe', 'org... 1
1 [{'firstname': 'Mickey', 'lastname': 'Moose', ... 2
In Layer 1 we have have flattened 'teamname' but there is more inside 'members'.
在第 1 层中,我们已经扁平化了“teamname”,但在“members”内部还有更多。
Look into Layer 2 with record_path=. The notation is unintuitive at first. I now remember it by ['layer','deeperlayer'] where the result is layer.deeperlayer.
使用 record_path= 查看第 2 层。该符号起初是不直观的。我现在通过 ['layer','deeperlayer'] 记住它,结果是 layer.deeperlayer。
json_normalize(data,record_path=['teams','members'])
output:
email firstname lastname mobile orgname phone
0 [email protected] John Doe Anon 916-555-1234
1 [email protected] Jane Doe 916-555-7890 Anon 916-555-4321
2 [email protected] Mickey Moose 916-555-1111 Moosers 916-555-0000
3 [email protected] Minny Moose Moosers 916-555-2222
Excuse my output, I don't know how to make tables in a response.
请原谅我的输出,我不知道如何在响应中制作表格。
Finally we add in Layer 1 columns using meta=
最后,我们使用 meta= 添加第 1 层列
json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])
output:
email firstname lastname mobile orgname phone teams.teamname
0 [email protected] John Doe Anon 916-555-1234 1
1 [email protected] Jane Doe 916-555-7890 Anon 916-555-4321 1
2 [email protected] Mickey Moose 916-555-1111 Moosers 916-555-0000 2
3 [email protected] Minny Moose Moosers 916-555-2222 2
Notice how we needed a list of lists for meta=[[]] to reference Layer 1. If there was a column we want from Layer 0 and Layer 1 we could do this:
请注意我们如何需要一个用于 meta=[[]] 的列表列表来引用第 1 层。如果我们想要来自第 0 层和第 1 层的列,我们可以这样做:
json_normalize(data,record_path=['layer1','layer2'],meta=['layer0',['layer0','layer1']])
The result of the json_normalize is a pandas dataframe.
json_normalize 的结果是一个Pandas数据框。