How to correctly normalize JSON with Python Pandas

Question

Asked by chris198725

I am a beginner in Python. I want to load a JSON file of forex historical price data with Pandas and compute statistics on the data. I have gone through many topics on Pandas and on parsing JSON files. I want to load a JSON file that has extra values and a nested structure into a Pandas DataFrame, and this is where I am stuck.

I have a JSON file named 'EUR_JPY_H8.json'.

First, I import the required libraries:

import pandas as pd
import json
from pandas.io.json import json_normalize

Then I load the JSON file:

with open('EUR_JPY_H8.json') as data_file:
    data = json.load(data_file)

This gives me the following list:

[{u'complete': True,
u'mid': {u'c': u'119.743',
  u'h': u'119.891',
  u'l': u'119.249',
  u'o': u'119.341'},
u'time': u'1488319200.000000000',
u'volume': 14651},
{u'complete': True,
u'mid': {u'c': u'119.893',
  u'h': u'119.954',
  u'l': u'119.552',
  u'o': u'119.738'},
u'time': u'1488348000.000000000',
u'volume': 10738},
{u'complete': True,
u'mid': {u'c': u'119.946',
  u'h': u'120.221',
  u'l': u'119.840',
  u'o': u'119.888'},
u'time': u'1488376800.000000000',
u'volume': 10041}]

Then I pass the list to json_normalize, trying to get the prices, which are nested under 'mid':

result = json_normalize(data,'time',['time','volume','complete',['mid','h'],['mid','l'],['mid','c'],['mid','o']])

But I got this result: [image: json_normalize output]

The 'time' data was broken down digit by digit, one per row. I have checked the related documentation: I have to pass a string or list object as the 2nd parameter of json_normalize. How can I pass the timestamp there without it being broken down?

My expected output is:

column = 
  index  |  time  |  volume  |  complete  |  mid.h  |  mid.l  |  mid.c  |  mid.o

Answer 1

Answered by cs95

You could just pass data without any extra params.

df = pd.io.json.json_normalize(data)
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041
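
For readers on newer pandas, here is a minimal, self-contained sketch of the same idea. It assumes pandas 1.0 or later, where the function is available at the top level as pd.json_normalize, and it uses the first two records from the question inline rather than reading the file. The column order may differ between pandas versions, so the sketch prints the sorted column names instead of relying on a particular order.

```python
import pandas as pd

# First two records copied from the list shown in the question.
data = [
    {"complete": True,
     "mid": {"c": "119.743", "h": "119.891", "l": "119.249", "o": "119.341"},
     "time": "1488319200.000000000",
     "volume": 14651},
    {"complete": True,
     "mid": {"c": "119.893", "h": "119.954", "l": "119.552", "o": "119.738"},
     "time": "1488348000.000000000",
     "volume": 10738},
]

# With no record_path, each dict becomes one row, and the nested
# "mid" dict is flattened into dotted columns such as "mid.h".
df = pd.json_normalize(data)

print(sorted(df.columns))
print(df)
```

Because every record is already one row, there is nothing to pass as the second (record_path) argument; that argument is meant for keys that hold a list of sub-records.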

If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888
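
Since the goal in the question is to run statistics, note that the prices and timestamps in the output above are still strings. The following is a rough sketch of one way to convert them, not part of the original answer. It assumes the 'time' values are Unix epoch seconds (they look like it, but the question does not say so), and it builds a small hypothetical frame by hand so that it runs on its own.

```python
import pandas as pd

# Hypothetical frame shaped like the reindexed result above (two rows only).
df = pd.DataFrame({
    "time": ["1488319200.000000000", "1488348000.000000000"],
    "volume": [14651, 10738],
    "complete": [True, True],
    "mid.h": ["119.891", "119.954"],
    "mid.l": ["119.249", "119.552"],
    "mid.c": ["119.743", "119.893"],
    "mid.o": ["119.341", "119.738"],
})

# Prices arrive as strings; cast them to floats before doing arithmetic.
price_cols = ["mid.h", "mid.l", "mid.c", "mid.o"]
df[price_cols] = df[price_cols].astype(float)

# Assumption: "time" holds Unix epoch seconds.
df["time"] = pd.to_datetime(df["time"].astype(float), unit="s")

print(df.dtypes)
print(df[price_cols].mean())
```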
