Python pandas: read nested JSON

Question

Question by Georg Heiler

How can I use pandas to read nested JSON with the following structure?

{
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {
            "depTimeDiffMin": "0",
            "name": "Spital am Pyhrn Bahnhof",
            "arrTime": "",
            "depTime": "06:32",
            "platform": "2",
            "stationIdx": "0",
            "arrTimeDiffMin": "",
            "track": "R 3932"
        },
        {
            "depTimeDiffMin": "0",
            "name": "Windischgarsten Bahnhof",
            "arrTime": "06:37",
            "depTime": "06:40",
            "platform": "2",
            "stationIdx": "1",
            "arrTimeDiffMin": "1",
            "track": ""
        },
        {
            "depTimeDiffMin": "",
            "name": "Linz/Donau Hbf",
            "arrTime": "08:24",
            "depTime": "",
            "platform": "1A-B",
            "stationIdx": "22",
            "arrTimeDiffMin": "1",
            "track": ""
        }
    ]
}

The following call keeps the array as raw JSON objects; I would prefer it to be expanded into columns:

pd.read_json("/myJson.json", orient='records')

Edit:

Thanks for the first answers. Let me refine my question: flattening the nested attributes in the array is not mandatory. It would be fine to just concatenate the df.locations['name'] values, e.g. into [A, B, C].

My file contains multiple JSON objects (one per line). I would like to keep the number, date, name, and locations columns, but with the location names joined into a single value.

allLocations = ""
isFirst = True
for location in result.locations:
    if isFirst:
        isFirst = False
        allLocations = location['name']
    else:
        allLocations += "; " + location['name']
allLocations

This approach does not seem efficient or idiomatic pandas.
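For a file with one JSON object per line, a more pandas-style replacement for the loop could be sketched like this: read the file with read_json(..., lines=True), which gives one row per object with each locations cell holding a list of dicts, then join the station names per row with Series.map. The inline records are a hypothetical stand-in for the file (the second one is made up), and convert_dates=False / dtype=False are added assumptions so the date and other strings stay exactly as written.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: two JSON objects, one per line (JSON Lines),
# trimmed to the fields used here. The second record is made up for illustration.
jsonl = io.StringIO(
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": '
    '[{"name": "Spital am Pyhrn Bahnhof"}, {"name": "Windischgarsten Bahnhof"}, '
    '{"name": "Linz/Donau Hbf"}]}\n'
    '{"number": "", "date": "02.10.2016", "name": "R 3933", "locations": '
    '[{"name": "Linz/Donau Hbf"}, {"name": "Wels Hbf"}]}\n'
)

# lines=True: one row per JSON object; "locations" cells stay Python lists of dicts.
# convert_dates=False / dtype=False keep "01.10.2016" and the other values as plain strings.
df = pd.read_json(jsonl, lines=True, convert_dates=False, dtype=False)

# Replace each list of location dicts with one "; "-joined string of station names
df["locations"] = df["locations"].map(lambda locs: "; ".join(loc["name"] for loc in locs))

print(df[["number", "date", "name", "locations"]])
```

With a real file, pass its path instead of the StringIO object, e.g. pd.read_json("myJson.json", lines=True, convert_dates=False, dtype=False).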

Answer 1

Answer by jezrael

You can use json_normalize:

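import json
import pandas as pd

# Load the single JSON object from the file
with open('myJson.json') as data_file:
    data = json.load(data_file)

# pd.json_normalize is top-level since pandas 1.0 (older versions:
# from pandas.io.json import json_normalize).
# One row per element of "locations"; date, number and name are repeated on every row.
df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'],
                       record_prefix='locations_')
print(df)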
  locations_arrTime locations_arrTimeDiffMin locations_depTime  \
0                                                        06:32   
1             06:37                        1             06:40   
2             08:24                        1                     

  locations_depTimeDiffMin           locations_name locations_platform  \
0                        0  Spital am Pyhrn Bahnhof                  2   
1                        0  Windischgarsten Bahnhof                  2   
2                                    Linz/Donau Hbf               1A-B   

  locations_stationIdx locations_track number    name        date  
0                    0          R 3932         R 3932  01.10.2016  
1                    1                         R 3932  01.10.2016  
2                   22                         R 3932  01.10.2016
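The call above reads a single object with json.load, which fails with an "Extra data" error on a file holding several objects, one per line. A sketch under that assumption: parse each line with json.loads and pass the resulting list to pd.json_normalize, which also accepts a list of records. record_prefix matters here because both the records and the parent object have a name key; without a prefix, json_normalize raises a ValueError about a conflicting metadata name. The inline data is hypothetical.

```python
import io
import json

import pandas as pd

# Hypothetical JSON Lines input: one train object per line, fields trimmed
jsonl = io.StringIO(
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": '
    '[{"name": "Spital am Pyhrn Bahnhof", "platform": "2"}, '
    '{"name": "Linz/Donau Hbf", "platform": "1A-B"}]}\n'
    '{"number": "", "date": "02.10.2016", "name": "R 3933", "locations": '
    '[{"name": "Linz/Donau Hbf", "platform": "1"}]}\n'
)

# json.load cannot parse several top-level objects, so decode each non-empty line
records = [json.loads(line) for line in jsonl if line.strip()]

# One output row per location; date/number/name are repeated from the parent object
df = pd.json_normalize(records, record_path="locations",
                       meta=["date", "number", "name"], record_prefix="locations_")
print(df)
```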

EDIT:

You can use read_json, extract name with the DataFrame constructor, and finally use groupby with apply(join):

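import pandas as pd

# The scalar fields are repeated for each element of "locations": one row per location
df = pd.read_json("myJson.json")
# Expand the location dicts into a DataFrame and keep only the station names
df['locations'] = pd.DataFrame(df['locations'].tolist())['name']
# Collapse back to one row per train, joining the names with commas
df = df.groupby(['date','name','number'])['locations'].apply(','.join).reset_index()
print(df)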
        date    name number                                          locations
0 2016-01-10  R 3932         Spital am Pyhrn Bahnhof,Windischgarsten Bahnho...
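In the output above, 01.10.2016 has become 2016-01-10: by default read_json converts a column named date and reads this string month-first. A sketch of the same groupby approach with convert_dates=False (plus dtype=False, an added assumption) keeps the original string. Pulling out name with .map instead of building an intermediate DataFrame is my substitution, and the inline data is the question's object trimmed to the name fields.

```python
import io

import pandas as pd

# The question's single object, trimmed to the fields used here
raw = io.StringIO(
    '{"number": "", "date": "01.10.2016", "name": "R 3932", "locations": '
    '[{"name": "Spital am Pyhrn Bahnhof"}, {"name": "Windischgarsten Bahnhof"}, '
    '{"name": "Linz/Donau Hbf"}]}'
)

# The scalar fields are broadcast over the three locations, giving three rows.
# convert_dates=False stops pandas from turning "01.10.2016" into 2016-01-10.
df = pd.read_json(raw, convert_dates=False, dtype=False)

# Each cell now holds one location dict; keep only its station name
df["locations"] = df["locations"].map(lambda loc: loc["name"])

# Collapse to one row per train with the names joined
out = df.groupby(["date", "name", "number"])["locations"].apply(", ".join).reset_index()
print(out)
```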