How to correctly normalize JSON with Python Pandas

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/46091362/

How to normalize json correctly by Python Pandas

Tags: python, json, python-2.7, pandas

Asked by chris198725

I am a beginner in Python. I want to load a JSON file of forex historical price data with Pandas and compute statistics on the data. I have gone through many topics on Pandas and on parsing JSON files. I want to load a JSON file that contains extra values and nested data into a Pandas data frame, and I am stuck on the following problem.

I have a JSON file, 'EUR_JPY_H8.json'.

First I import the required libraries:

import pandas as pd
import json
from pandas.io.json import json_normalize

Then I load the JSON file:

with open('EUR_JPY_H8.json') as data_file:
    data = json.load(data_file)

This gives me the following list:

[{u'complete': True,
  u'mid': {u'c': u'119.743',
           u'h': u'119.891',
           u'l': u'119.249',
           u'o': u'119.341'},
  u'time': u'1488319200.000000000',
  u'volume': 14651},
 {u'complete': True,
  u'mid': {u'c': u'119.893',
           u'h': u'119.954',
           u'l': u'119.552',
           u'o': u'119.738'},
  u'time': u'1488348000.000000000',
  u'volume': 10738},
 {u'complete': True,
  u'mid': {u'c': u'119.946',
           u'h': u'120.221',
           u'l': u'119.840',
           u'o': u'119.888'},
  u'time': u'1488376800.000000000',
  u'volume': 10041}]

Then I pass the list to json_normalize, trying to get the prices that are nested under 'mid':

result = json_normalize(data,'time',['time','volume','complete',['mid','h'],['mid','l'],['mid','c'],['mid','o']])

But this is the result I got: [image: json_normalize output]

The 'time' data got broken down into individual digits, one per row. I have checked the related documentation: I have to pass a string or list object as the 2nd parameter of json_normalize. How can I pass the timestamp there without it being broken down?

My expected output is:

column = 
  index  |  time  |  volume  |  complete  |  mid.h  |  mid.l  |  mid.c  |  mid.o 

Answered by cs95

You can just pass data without any extra parameters.

df = pd.io.json.json_normalize(data)
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041
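
For readers on a newer setup: in pandas 1.0 and later the same function is exposed at the top level as pd.json_normalize, and the pandas.io.json import path is deprecated. The sketch below repeats the call on the three sample records from the question, written as Python 3 literals. It also shows what the second parameter (record_path) appears to be meant for: a key whose value is a list of records. The `wrapped` dictionary and its 'instrument', 'granularity' and 'candles' keys are hypothetical, made up for illustration, and are not taken from the asker's file.

```python
import pandas as pd

# The three sample records from the question.
data = [
    {"complete": True,
     "mid": {"c": "119.743", "h": "119.891", "l": "119.249", "o": "119.341"},
     "time": "1488319200.000000000", "volume": 14651},
    {"complete": True,
     "mid": {"c": "119.893", "h": "119.954", "l": "119.552", "o": "119.738"},
     "time": "1488348000.000000000", "volume": 10738},
    {"complete": True,
     "mid": {"c": "119.946", "h": "120.221", "l": "119.840", "o": "119.888"},
     "time": "1488376800.000000000", "volume": 10041},
]

# Nested dicts such as 'mid' are flattened into dotted column names
# ('mid.c', 'mid.h', ...) without any extra arguments.
df = pd.json_normalize(data)
print(df.columns.tolist())

# Hypothetical layout: the records sit in a list under the 'candles' key.
# Here record_path names that list, and meta names the top-level fields
# to repeat on every row.
wrapped = {"instrument": "EUR_JPY", "granularity": "H8", "candles": data}
df_wrapped = pd.json_normalize(
    wrapped, record_path="candles", meta=["instrument", "granularity"]
)
print(df_wrapped.head())
```

In the question, 'time' holds a plain string rather than a list of records, so it is not a suitable record_path.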


If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888
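
One thing to watch before doing statistics: the prices and timestamps in the sample data are strings, not numbers. The following is a minimal sketch of one possible conversion step, assuming the 'time' values are Unix timestamps in seconds (the sample values are consistent with that, but the question does not state it). The frame is rebuilt by hand here so the example runs on its own.

```python
import pandas as pd

# Rebuild the reordered frame from the answer, with values as strings.
df = pd.DataFrame({
    "time": ["1488319200.000000000", "1488348000.000000000", "1488376800.000000000"],
    "volume": [14651, 10738, 10041],
    "complete": [True, True, True],
    "mid.h": ["119.891", "119.954", "120.221"],
    "mid.l": ["119.249", "119.552", "119.840"],
    "mid.c": ["119.743", "119.893", "119.946"],
    "mid.o": ["119.341", "119.738", "119.888"],
})

price_cols = ["mid.h", "mid.l", "mid.c", "mid.o"]

# Convert the price strings to floats, column by column.
df[price_cols] = df[price_cols].apply(pd.to_numeric)

# Assumption: 'time' is seconds since the Unix epoch.
df["time"] = pd.to_datetime(df["time"].astype(float), unit="s")

print(df.dtypes)
print(df[price_cols].mean())
```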