Pandas.read_json(JSON_URL)

Disclaimer: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/46578128/

Time: 2020-09-14 04:35:32  Source: igfitidea

Tags: python, json, pandas

Question by Sarfraz

I am using pandas to get data from an API that returns JSON. However, the JSON contains some values that I don't want in the DataFrame, and because of them I cannot assign an index to the DataFrame. The format is as follows:

{
  "Success": true,
  "message": "",
  "result": [
    {"id": 12312312, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
    {"id": 2342344, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306}
  ]
}

I am only interested in the "result" part. One way to do this is to fetch the JSON with requests.get(request_URL), extract the "result" part, and convert it into a DataFrame. A second way is to load the data with pandas.read_json(JSON_URL), convert the returned DataFrame back to JSON, extract the "result" part, and then convert that into a DataFrame.

Is there any other way to do this? What is the best approach and why? Thanks.

Answer by jezrael

Use json_normalize:

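import pandas as pd

# d: the parsed API response from the question (a Python dict)
d = {"Success": True, "message": "", "result": [
    {"id": 12312312, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
    {"id": 2342344, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306}]}
df = pd.json_normalize(d['result'])
print(df)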

   Quantity               TimeStamp        id
0  3.030463  2017-10-04T17:39:53.92  12312312
1  3.030463  2017-10-04T17:39:53.92   2342344
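
If you would rather not index into the dictionary yourself, json_normalize can take the whole response and pull out the list through its record_path parameter, while meta copies top-level fields onto every row. A minimal sketch, assuming the dict literal below (hand-copied from the question, not live API data) stands in for the parsed response:

```python
import pandas as pd

# Sample response copied from the question (not live API data)
d = {
    "Success": True,
    "message": "",
    "result": [
        {"id": 12312312, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
        {"id": 2342344, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
    ],
}

# record_path selects the list of records; meta repeats the top-level "Success" value on each row
df = pd.json_normalize(d, record_path="result", meta=["Success"])
print(df)
```

Drop the meta argument if the top-level fields are not needed; the result is then the same as json_normalize(d['result']).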

A plain DataFrame constructor also works here:

df = pd.DataFrame(d['result'])
print(df)

   Quantity               TimeStamp        id
0  3.030463  2017-10-04T17:39:53.92  12312312
1  3.030463  2017-10-04T17:39:53.92   2342344

For a DatetimeIndex, convert the column with to_datetime and then call set_index:

df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])
df = df.set_index('TimeStamp')
print (df)

                         Quantity        id
TimeStamp                                  
2017-10-04 17:39:53.920  3.030463  12312312
2017-10-04 17:39:53.920  3.030463   2342344
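
The conversion and the set_index call can also be chained so df is built in one expression. This is a sketch on the sample records from the question; using assign with a lambda is a stylistic choice, not something the answer requires:

```python
import pandas as pd

# Sample records from the question's "result" list
records = [
    {"id": 12312312, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
    {"id": 2342344, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
]

# assign() replaces TimeStamp with parsed datetimes, then set_index() moves it to the index
df = (
    pd.DataFrame(records)
    .assign(TimeStamp=lambda frame: pd.to_datetime(frame["TimeStamp"]))
    .set_index("TimeStamp")
)
print(df)
```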

EDIT:

Solution that also loads the data from the API:

from urllib.request import urlopen
import json
import pandas as pd

response = urlopen("https://bittrex.com/api/v1.1/public/getmarkethistory?market=BTC-ETC")
json_data = response.read().decode('utf-8', 'replace')

d = json.loads(json_data)
df = pd.json_normalize(d['result'])
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])
df = df.set_index('TimeStamp')

print (df.head())
                          Quantity     Total  
TimeStamp                                     
2017-10-05 06:05:06.510   3.579201  0.010000  
2017-10-05 06:04:34.060  45.614760  0.127444  
2017-10-05 06:04:34.060   5.649898  0.015785  
2017-10-05 06:04:34.060   1.769847  0.004945  
2017-10-05 06:02:25.063   0.250000  0.000698  
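
The same parse, normalize, and index steps can be wrapped in a function that takes the JSON text, so they can be tried on a saved response without a network call. This is a sketch only: the helper name frame_from_response, and the decision to raise when "result" is missing, null, or empty, are assumptions rather than part of the answer above.

```python
import json

import pandas as pd


def frame_from_response(json_text):
    """Hypothetical helper: parse the API's JSON text and return the
    "result" records as a DataFrame indexed by TimeStamp."""
    payload = json.loads(json_text)
    records = payload.get("result")
    if not records:
        # Assumption: a missing, null, or empty "result" is treated as an error,
        # and "message" is assumed to carry the reason
        raise ValueError("no 'result' records; message=%r" % payload.get("message"))
    df = pd.json_normalize(records)
    df["TimeStamp"] = pd.to_datetime(df["TimeStamp"])
    return df.set_index("TimeStamp")


# Saved response in the shape shown in the question (not live data)
sample = """{"Success": true, "message": "",
 "result": [{"id": 12312312, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
            {"id": 2342344, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306}]}"""

df = frame_from_response(sample)
print(df)
```

With the urlopen code above, the call would be frame_from_response(response.read().decode('utf-8', 'replace')).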

Another solution:

df = pd.read_json('https://bittrex.com/api/v1.1/public/getmarkethistory?market=BTC-ETC')
df = pd.DataFrame(df['result'].values.tolist())
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])
df = df.set_index('TimeStamp')
print (df.head())

                          Quantity     Total  
TimeStamp                                     
2017-10-05 06:11:25.100   5.620957  0.015704  
2017-10-05 06:11:11.427  22.853546  0.063851  
2017-10-05 06:10:30.600   6.999213  0.019555  
2017-10-05 06:10:29.163  20.000000  0.055878  
2017-10-05 06:10:29.163   0.806039  0.002252  
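
To see what read_json does with this response shape without calling the URL, the sketch below feeds it the sample from the question through io.StringIO (recent pandas versions deprecate passing a literal JSON string to read_json). The top-level keys become columns, so df['result'] holds one dict per row, and rebuilding a DataFrame from that list gives the usual columns. The sample text is an assumption standing in for the live response.

```python
from io import StringIO

import pandas as pd

# Sample response from the question, standing in for the live API call
sample = """{"Success": true, "message": "",
 "result": [{"id": 12312312, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306},
            {"id": 2342344, "TimeStamp": "2017-10-04T17:39:53.92", "Quantity": 3.03046306}]}"""

raw = pd.read_json(StringIO(sample))
# Each cell of raw["result"] is a dict; the scalar keys are repeated on every row
print(raw.columns.tolist())

# Rebuild a DataFrame from the list of dicts, then index by the parsed timestamps
df = pd.DataFrame(raw["result"].tolist())
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"])
df = df.set_index("TimeStamp")
print(df)
```

This also shows why the round trip described in the question is unnecessary: the records are already available in the "result" column.
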

Answer by Anton vBR

Another solution, based on jezrael's answer but using requests:

import requests
import pandas as pd

d = requests.get("https://bittrex.com/api/v1.1/public/getmarkethistory?market=BTC-ETC").json()
df = pd.DataFrame.from_dict(d['result'])
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'])
df = df.set_index('TimeStamp')

print(df)