pandas: Formatting JSON from a pandas DataFrame

Question

Asked by JonnyD

I'm trying to build out a JSON file from my dataframe that looks similar to this:

{'249' : [
          {'candidateId': 751,
           'votes':7528,
           'vote_pct':0.132
          },
          {'candidateId': 803,
           'votes':7771,
           'vote_pct':0.138
          }...
          ],
'274': [
         {'candidateId': 891,
         ....

My dataframe looks like this:

         officeId  candidateId    votes  vote_pct
0        249          751         7528  0.132198
1        249          803         7771  0.136465
2        249          818         7569  0.132918
3        249          827         9089  0.159610
4        249          856         2271  0.039881
5        249          877         7491  0.131548
6        249          878         8758  0.153798 
7        249          895         6267  0.110054
8        249         1161          201  0.003530
9        274          736         4664  0.073833
10       274          737         6270  0.099256
11       274          757         4953  0.078407
12       274          769         5239  0.082935
13       274          770         7134  0.112933
14       274          783         7673  0.121466
15       274          862         6361  0.100697
16       274          901         7671  0.121434

Using a function I can flip the dataframe's index and return it as a JSON string for each office ID, like this:

def clean_results(votes):
    #trying to get a well structured json file
    return votes.reset_index().to_json(orient='index', double_precision=2)

res_json = results.groupby(['officeId']).apply(clean_results)

But when I do that I end up with a new dataframe, with a JSON object for each officeID, and the JSON uses the numbered index as the top level, like so:

{"0":{"index":0.0,"officeId":249.0,"candidateId":751.0,"total_votes":7528.0,"vote_pct":0.13},"1":{"index":1.0,"officeId":249.0,"candidateId":803.0,"total_votes":7771.0,"vote_pct":0.14},"2":...

Answer 1

Accepted answer by chrisb

This is one approach; there may be something cleaner.

import json

# Group the rows by office and store each group's rows as a list of dicts
results = {}
for key, df_gb in df.groupby('officeId'):
    results[str(key)] = df_gb.to_dict('records')

print(json.dumps(results, indent=4))
####
{
    "274": [
        {
            "votes": 4664.0, 
            "candidateId": 736.0, 
            "vote_pct": 0.07383300000000001, 
            "officeId": 274.0
        }, 
        {
            "votes": 6270.0, 
            "candidateId": 737.0, 
            "vote_pct": 0.099255999999999997, 
            "officeId": 274.0
 ......
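
The output above still carries officeId inside every record and leaves vote_pct unrounded, whereas the structure asked for in the question has neither. A minimal sketch of one way to close that gap, assuming Python 3 and a reasonably recent pandas; the four-row sample is copied from the question's dataframe, and rounding to three decimals is an assumption based on the target shown there:

```python
import json

import pandas as pd

# Sample rows copied from the question's dataframe (two per office).
df = pd.DataFrame({
    "officeId": [249, 249, 274, 274],
    "candidateId": [751, 803, 736, 737],
    "votes": [7528, 7771, 4664, 6270],
    "vote_pct": [0.132198, 0.136465, 0.073833, 0.099256],
})

results = {}
for key, df_gb in df.groupby("officeId"):
    # The office ID is already the dictionary key, so drop it from the records.
    # Rounding vote_pct to 3 places is an assumption, not part of the answer.
    records = (
        df_gb.drop(columns="officeId")
        .round({"vote_pct": 3})
        .to_dict("records")
    )
    results[str(key)] = records

print(json.dumps(results, indent=4))
```

Rounding before the conversion keeps the whole thing in pandas; the double_precision argument from the question only applies to to_json, not to to_dict.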
