Python JSON 到 Pandas DataFrame
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/21104592/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
JSON to pandas DataFrame
提问by pbreach
What I am trying to do is extract elevation data from a google maps API along a path specified by latitude and longitude coordinates as follows:
我想要做的是沿着由纬度和经度坐标指定的路径从谷歌地图 API 中提取高程数据,如下所示:
from urllib2 import Request, urlopen
import json
path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
This gives me a data that looks like this:
这给了我一个看起来像这样的数据:
elevations.splitlines()
['{',
' "results" : [',
' {',
' "elevation" : 243.3462677001953,',
' "location" : {',
' "lat" : 42.974049,',
' "lng" : -81.205203',
' },',
' "resolution" : 19.08790397644043',
' },',
' {',
' "elevation" : 244.1318664550781,',
' "location" : {',
' "lat" : 42.974298,',
' "lng" : -81.19575500000001',
' },',
' "resolution" : 19.08790397644043',
' }',
' ],',
' "status" : "OK"',
'}']
when putting into as DataFrame here is what I get:
当作为 DataFrame 放入这里时,我得到的是:


pd.read_json(elevations)
and here is what I want:
这是我想要的:


I'm not sure if this is possible, but mainly what I am looking for is a way to be able to put the elevation, latitude and longitude data together in a pandas dataframe (doesn't have to have fancy mutiline headers).
我不确定这是否可行,但主要是我正在寻找一种能够将高程、纬度和经度数据放在 Pandas 数据框中的方法(不必有花哨的多行标题)。
If any one can help or give some advice on working with this data that would be great! If you can't tell I haven't worked much with json data before...
如果有人可以帮助或提供一些关于处理这些数据的建议,那就太好了!如果你不知道我以前没有用过 json 数据...
EDIT:
编辑:
This method isn't all that attractive but seems to work:
这种方法并不是那么吸引人,但似乎有效:
data = json.loads(elevations)
lat,lng,el = [],[],[]
for result in data['results']:
lat.append(result[u'location'][u'lat'])
lng.append(result[u'location'][u'lng'])
el.append(result[u'elevation'])
df = pd.DataFrame([lat,lng,el]).T
ends up dataframe having columns latitude, longitude, elevation
最终数据框具有纬度、经度、海拔等列


采纳答案by pbreach
I found a quick and easy solution to what I wanted using json_normalize()included in pandas 0.13.
我找到了一个快速简便的解决方案来解决我想要使用json_normalize()的pandas 0.13.
from urllib2 import Request, urlopen
import json
from pandas.io.json import json_normalize
path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
json_normalize(data['results'])
This gives a nice flattened dataframe with the json data that I got from the Google Maps API.
这提供了一个很好的扁平数据框,其中包含我从 Google Maps API 获得的 json 数据。
回答by Rapha?l Braud
You could first import your json data in a Python dictionnary :
您可以先在 Python 字典中导入 json 数据:
data = json.loads(elevations)
Then modify data on the fly :
然后动态修改数据:
for result in data['results']:
result[u'lat']=result[u'location'][u'lat']
result[u'lng']=result[u'location'][u'lng']
del result[u'location']
Rebuild json string :
重建 json 字符串:
elevations = json.dumps(data)
Finally :
最后 :
pd.read_json(elevations)
You can, also, probably avoid to dump data back to a string, I assume Panda can directly create a DataFrame from a dictionnary (I haven't used it since a long time :p)
您也可以避免将数据转储回字符串,我假设 Panda 可以直接从字典创建 DataFrame(我很久没用过它了:p)
回答by billmanH
The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful Json is often heavily nested. I have been writing small functions that pull the info I want out into a new column. That way I have it in the format that I want to use.
问题是数据框中有几列包含其中包含较小字典的字典。有用的 Json 通常是大量嵌套的。我一直在编写一些小函数,将我想要的信息提取到一个新列中。这样我就有了我想要使用的格式。
for row in range(len(data)):
#First I load the dict (one at a time)
n = data.loc[row,'dict_column']
#Now I make a new column that pulls out the data that I want.
data.loc[row,'new_column'] = n.get('key')
回答by Rishu
Check this snip out.
检查这个片段。
# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
dict_train = json.load(train_file)
# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
Hope it helps :)
希望能帮助到你 :)
回答by niltoid
billmanH's solution helped me but didn't work until i switched from:
billmanH 的解决方案帮助了我,但直到我从以下位置切换后才起作用:
n = data.loc[row,'json_column']
to:
到:
n = data.iloc[[row]]['json_column']
here's the rest of it, converting to a dictionary is helpful for working with json data.
这是它的其余部分,转换为字典有助于处理 json 数据。
import json
for row in range(len(data)):
n = data.iloc[[row]]['json_column'].item()
jsonDict = json.loads(n)
if ('mykey' in jsonDict):
display(jsonDict['mykey'])
回答by AB Abhi
Just a new version of the accepted answer, as python3.xdoes not support urllib2
只是接受答案的新版本,因为python3.x不支持urllib2
from requests import request
import json
from pandas.io.json import json_normalize
path1 = '42.974049,-81.205203|42.974298,-81.195755'
response=request(url='http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false', method='get')
elevations = response.json()
elevations
data = json.loads(elevations)
json_normalize(data['results'])
回答by MIKHIL NAGARALE
#Use the small trick to make the data json interpret-able
#Since your data is not directly interpreted by json.loads()
>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
... strlist = "["+x+"]"
... datalist=json.loads(strlist)
... for y in datalist:
... print(type(y))
... print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
回答by Siva
Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.
这是将 JSON 转换为 DataFrame 并返回的小型实用程序类:希望您觉得这有帮助。
# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize
class DFConverter:
#Converts the input JSON to a DataFrame
def convertToDF(self,dfJSON):
return(json_normalize(dfJSON))
#Converts the input DataFrame to JSON
def convertToJSON(self, df):
resultJSON = df.to_json(orient='records')
return(resultJSON)
回答by loganbvh
Once you have the flattened DataFrameobtained by the accepted answer, you can make the columns a MultiIndex("fancy multiline header") like this:
一旦您DataFrame通过接受的答案获得扁平化,您可以MultiIndex像这样使列成为(“花式多行标题”):
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])
回答by DisabledWhale
Optimization of the accepted answer:
已接受答案的优化:
The accepted answer has some functioning problems, so I want to share my code that does not rely on urllib2:
接受的答案有一些功能问题,所以我想分享我不依赖 urllib2 的代码:
import requests
from pandas.io.json import json_normalize
url = 'https://api.energidataservice.dk/resource_id=nordpoolmarket&limit=1000'
r = requests.get(url)
dictr = r.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)
Output:
输出:
_id HourUTC HourDK ... ElbasAveragePriceEUR ElbasMaxPriceEUR ElbasMinPriceEUR
0 264028 2019-01-01T00:00:00+00:00 2019-01-01T01:00:00 ... NaN NaN NaN
1 138428 2017-09-03T15:00:00+00:00 2017-09-03T17:00:00 ... 33.28 33.4 32.0
2 138429 2017-09-03T16:00:00+00:00 2017-09-03T18:00:00 ... 35.20 35.7 34.9
3 138430 2017-09-03T17:00:00+00:00 2017-09-03T19:00:00 ... 37.50 37.8 37.3
4 138431 2017-09-03T18:00:00+00:00 2017-09-03T20:00:00 ... 39.65 42.9 35.3
.. ... ... ... ... ... ... ...
995 139290 2017-10-09T13:00:00+00:00 2017-10-09T15:00:00 ... 38.40 38.4 38.4
996 139291 2017-10-09T14:00:00+00:00 2017-10-09T16:00:00 ... 41.90 44.3 33.9
997 139292 2017-10-09T15:00:00+00:00 2017-10-09T17:00:00 ... 46.26 49.5 41.4
998 139293 2017-10-09T16:00:00+00:00 2017-10-09T18:00:00 ... 56.22 58.5 49.1
999 139294 2017-10-09T17:00:00+00:00 2017-10-09T19:00:00 ... 56.71 65.4 42.2
PS: API is for Danish electricity prices
PS:API 是针对丹麦电价的

