Python JSON 到 Pandas DataFrame

Question

提问by pbreach

What I am trying to do is extract elevation data from a google maps API along a path specified by latitude and longitude coordinates as follows:

我想要做的是沿着由纬度和经度坐标指定的路径从谷歌地图 API 中提取高程数据，如下所示：

from urllib2 import Request, urlopen
import json

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()

This gives me a data that looks like this:

这给了我一个看起来像这样的数据：

elevations.splitlines()

['{',
 '   "results" : [',
 '      {',
 '         "elevation" : 243.3462677001953,',
 '         "location" : {',
 '            "lat" : 42.974049,',
 '            "lng" : -81.205203',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      },',
 '      {',
 '         "elevation" : 244.1318664550781,',
 '         "location" : {',
 '            "lat" : 42.974298,',
 '            "lng" : -81.19575500000001',
 '         },',
 '         "resolution" : 19.08790397644043',
 '      }',
 '   ],',
 '   "status" : "OK"',
 '}']

when putting into as DataFrame here is what I get:

当作为 DataFrame 放入这里时，我得到的是：

enter image description here

在此处输入图片说明

pd.read_json(elevations)

and here is what I want:

这是我想要的：

enter image description here

在此处输入图片说明

I'm not sure if this is possible, but mainly what I am looking for is a way to be able to put the elevation, latitude and longitude data together in a pandas dataframe (doesn't have to have fancy mutiline headers).

我不确定这是否可行，但主要是我正在寻找一种能够将高程、纬度和经度数据放在 Pandas 数据框中的方法（不必有花哨的多行标题）。

If any one can help or give some advice on working with this data that would be great! If you can't tell I haven't worked much with json data before...

如果有人可以帮助或提供一些关于处理这些数据的建议，那就太好了！如果你不知道我以前没有用过 json 数据...

EDIT:

编辑：

This method isn't all that attractive but seems to work:

这种方法并不是那么吸引人，但似乎有效：

data = json.loads(elevations)
lat,lng,el = [],[],[]
for result in data['results']:
    lat.append(result[u'location'][u'lat'])
    lng.append(result[u'location'][u'lng'])
    el.append(result[u'elevation'])
df = pd.DataFrame([lat,lng,el]).T

ends up dataframe having columns latitude, longitude, elevation

最终数据框具有纬度、经度、海拔等列

enter image description here

在此处输入图片说明

Answer 1

采纳答案by pbreach

I found a quick and easy solution to what I wanted using json_normalize()included in pandas 0.13.

我找到了一个快速简便的解决方案来解决我想要使用json_normalize()的pandas 0.13.

from urllib2 import Request, urlopen
import json
from pandas.io.json import json_normalize

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
json_normalize(data['results'])

This gives a nice flattened dataframe with the json data that I got from the Google Maps API.

这提供了一个很好的扁平数据框，其中包含我从 Google Maps API 获得的 json 数据。

Answer 2

回答by Rapha?l Braud

You could first import your json data in a Python dictionnary :

您可以先在 Python 字典中导入 json 数据：

data = json.loads(elevations)

Then modify data on the fly :

然后动态修改数据：

for result in data['results']:
    result[u'lat']=result[u'location'][u'lat']
    result[u'lng']=result[u'location'][u'lng']
    del result[u'location']

Rebuild json string :

重建 json 字符串：

elevations = json.dumps(data)

Finally :

最后：

pd.read_json(elevations)

You can, also, probably avoid to dump data back to a string, I assume Panda can directly create a DataFrame from a dictionnary (I haven't used it since a long time :p)

您也可以避免将数据转储回字符串，我假设 Panda 可以直接从字典创建 DataFrame（我很久没用过它了：p）

Answer 3

回答by billmanH

The problem is that you have several columns in the data frame that contain dicts with smaller dicts inside them. Useful Json is often heavily nested. I have been writing small functions that pull the info I want out into a new column. That way I have it in the format that I want to use.

问题是数据框中有几列包含其中包含较小字典的字典。有用的 Json 通常是大量嵌套的。我一直在编写一些小函数，将我想要的信息提取到一个新列中。这样我就有了我想要使用的格式。

for row in range(len(data)):
    #First I load the dict (one at a time)
    n = data.loc[row,'dict_column']
    #Now I make a new column that pulls out the data that I want.
    data.loc[row,'new_column'] = n.get('key')

Answer 4

回答by Rishu

Check this snip out.

检查这个片段。

# reading the JSON data using json.load()
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# converting json dataset from dictionary to dataframe
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)

Hope it helps :)

希望能帮助到你：）

Answer 5

回答by niltoid

billmanH's solution helped me but didn't work until i switched from:

billmanH 的解决方案帮助了我，但直到我从以下位置切换后才起作用：

n = data.loc[row,'json_column']

to:

到：

n = data.iloc[[row]]['json_column']

here's the rest of it, converting to a dictionary is helpful for working with json data.

这是它的其余部分，转换为字典有助于处理 json 数据。

import json

for row in range(len(data)):
    n = data.iloc[[row]]['json_column'].item()
    jsonDict = json.loads(n)
    if ('mykey' in jsonDict):
        display(jsonDict['mykey'])

Answer 6

回答by AB Abhi

Just a new version of the accepted answer, as python3.xdoes not support urllib2

只是接受答案的新版本，因为python3.x不支持urllib2

from requests import request
import json
from pandas.io.json import json_normalize

path1 = '42.974049,-81.205203|42.974298,-81.195755'
response=request(url='http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false', method='get')
elevations = response.json()
elevations
data = json.loads(elevations)
json_normalize(data['results'])

Answer 7

回答by MIKHIL NAGARALE

#Use the small trick to make the data json interpret-able
#Since your data is not directly interpreted by json.loads()

>>> import json
>>> f=open("sampledata.txt","r+")
>>> data = f.read()
>>> for x in data.split("\n"):
...     strlist = "["+x+"]"
...     datalist=json.loads(strlist)
...     for y in datalist:
...             print(type(y))
...             print(y)
...
...
<type 'dict'>
{u'0': [[10.8, 36.0], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'1': [[10.8, 36.1], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'2': [[10.8, 36.2], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'3': [[10.8, 36.300000000000004], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'4': [[10.8, 36.4], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'5': [[10.8, 36.5], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'6': [[10.8, 36.6], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'7': [[10.8, 36.7], {u'10': 0, u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'8': [[10.8, 36.800000000000004], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}
<type 'dict'>
{u'9': [[10.8, 36.9], {u'1': 0, u'0': 0, u'3': 0, u'2': 0, u'5': 0, u'4': 0, u'7': 0, u'6': 0, u'9': 0, u'8': 0}]}

Answer 8

回答by Siva

Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.

这是将 JSON 转换为 DataFrame 并返回的小型实用程序类：希望您觉得这有帮助。

# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize

class DFConverter:

    #Converts the input JSON to a DataFrame
    def convertToDF(self,dfJSON):
        return(json_normalize(dfJSON))

    #Converts the input DataFrame to JSON 
    def convertToJSON(self, df):
        resultJSON = df.to_json(orient='records')
        return(resultJSON)

Answer 9

回答by loganbvh

Once you have the flattened DataFrameobtained by the accepted answer, you can make the columns a MultiIndex("fancy multiline header") like this:

一旦您DataFrame通过接受的答案获得扁平化，您可以MultiIndex像这样使列成为（“花式多行标题”）：

df.columns = pd.MultiIndex.from_tuples([tuple(c.split('.')) for c in df.columns])

Answer 10

回答by DisabledWhale

Optimization of the accepted answer:

已接受答案的优化：

The accepted answer has some functioning problems, so I want to share my code that does not rely on urllib2:

接受的答案有一些功能问题，所以我想分享我不依赖 urllib2 的代码：

import requests
from pandas.io.json import json_normalize
url = 'https://api.energidataservice.dk/resource_id=nordpoolmarket&limit=1000'

r = requests.get(url)
dictr = r.json()
recs = dictr['result']['records']
df = json_normalize(recs)
print(df)

Output:

输出：

        _id                    HourUTC               HourDK  ... ElbasAveragePriceEUR  ElbasMaxPriceEUR  ElbasMinPriceEUR
0    264028  2019-01-01T00:00:00+00:00  2019-01-01T01:00:00  ...                  NaN               NaN               NaN
1    138428  2017-09-03T15:00:00+00:00  2017-09-03T17:00:00  ...                33.28              33.4              32.0
2    138429  2017-09-03T16:00:00+00:00  2017-09-03T18:00:00  ...                35.20              35.7              34.9
3    138430  2017-09-03T17:00:00+00:00  2017-09-03T19:00:00  ...                37.50              37.8              37.3
4    138431  2017-09-03T18:00:00+00:00  2017-09-03T20:00:00  ...                39.65              42.9              35.3
..      ...                        ...                  ...  ...                  ...               ...               ...
995  139290  2017-10-09T13:00:00+00:00  2017-10-09T15:00:00  ...                38.40              38.4              38.4
996  139291  2017-10-09T14:00:00+00:00  2017-10-09T16:00:00  ...                41.90              44.3              33.9
997  139292  2017-10-09T15:00:00+00:00  2017-10-09T17:00:00  ...                46.26              49.5              41.4
998  139293  2017-10-09T16:00:00+00:00  2017-10-09T18:00:00  ...                56.22              58.5              49.1
999  139294  2017-10-09T17:00:00+00:00  2017-10-09T19:00:00  ...                56.71              65.4              42.2

PS: API is for Danish electricity prices

PS：API 是针对丹麦电价的

Python JSON 到 Pandas DataFrame

提问by pbreach

采纳答案by pbreach

回答by Rapha?l Braud

回答by billmanH

回答by Rishu

回答by niltoid

回答by AB Abhi

回答by MIKHIL NAGARALE

回答by Siva

回答by loganbvh

回答by DisabledWhale

相关推荐

最近更新

标签

Python JSON 到 Pandas DataFrame

提问by pbreach

采纳答案by pbreach

回答by Rapha?l Braud

回答by billmanH

回答by Rishu

回答by niltoid

回答by AB Abhi

回答by MIKHIL NAGARALE

回答by Siva

回答by loganbvh

回答by DisabledWhale

相关推荐

被python文件模式“w+”弄糊涂了

Python 使用 apache 和 mod_wsgi 的 Flask hello world 仅显示 webroot 中的文件

Python 全局变量和类功能

Python sys.stdin.readline() 在没有提示的情况下读取，返回“中间没有任何内容”

相关推荐

最近更新

标签