pandas 从熊猫数据帧格式化json
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/26180523/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Formatting json from pandas dataframe
提问by JonnyD
I'm trying to build out a JSON file from my dataframe that looks similar to this:
我正在尝试从我的数据框中构建一个类似于以下内容的 JSON 文件:
{'249' : [
{'candidateId': 751,
'votes':7528,
'vote_pct':0.132
},
{'candidateId': 803,
'votes':7771,
'vote_pct':0.138
}...
],
'274': [
{'candidateId': 891,
....
My dataframe looks like this:
我的数据框如下所示:
officeId candidateId votes vote_pct
0 249 751 7528 0.132198
1 249 803 7771 0.136465
2 249 818 7569 0.132918
3 249 827 9089 0.159610
4 249 856 2271 0.039881
5 249 877 7491 0.131548
6 249 878 8758 0.153798
7 249 895 6267 0.110054
8 249 1161 201 0.003530
9 274 736 4664 0.073833
10 274 737 6270 0.099256
11 274 757 4953 0.078407
12 274 769 5239 0.082935
13 274 770 7134 0.112933
14 274 783 7673 0.121466
15 274 862 6361 0.100697
16 274 901 7671 0.121434
Using a function I can flip the dataframe's index and return it as a JSON string for each office ID, like this:
使用一个函数,我可以翻转数据帧的索引并将其作为每个办公室 ID 的 JSON 字符串返回,如下所示:
def clean_results(votes):
#trying to get a well structured json file
return votes.reset_index().to_json(orient='index', double_precision=2)
res_json = results.groupby(['officeId']).apply(clean_results)
But when I do that I end up with a new dataframe, with a JSON object for each officeID, and the JSON uses the numbered index as the top level, like so:
但是当我这样做时,我最终会得到一个新的数据框,每个 officeID 都有一个 JSON 对象,JSON 使用编号索引作为顶层,如下所示:
{"0":{"index":0.0,"officeId":249.0,"candidateId":751.0,"total_votes":7528.0,"vote_pct":0.13},"1":{"index":1.0,"officeId":249.0,"candidateId":803.0,"total_votes":7771.0,"vote_pct":0.14},"2":...
采纳答案by chrisb
This is one approach, there may be something cleaner.
这是一种方法,可能有更清洁的方法。
results = {}
for key, df_gb in df.groupby('officeId'):
results[str(key)] = df_gb.to_dict('records')
import json
print json.dumps(results, indent=4)
####
{
"274": [
{
"votes": 4664.0,
"candidateId": 736.0,
"vote_pct": 0.07383300000000001,
"officeId": 274.0
},
{
"votes": 6270.0,
"candidateId": 737.0,
"vote_pct": 0.099255999999999997,
"officeId": 274.0
......

