Group By & Aggregate List of Dictionaries in Python (pandas)

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24195825/

Date: 2020-09-13 22:09:41  Source: igfitidea

Tags: python, list, dictionary, pandas

Asked by user40721

I have a list of dictionaries which I need to aggregate in Python:

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, 
{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, 
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

and I'm looking to aggregate (sum) budgetImpressions per campaign.

So the final result should be:

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 60}, 
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

Note that every entry with a given campaignName will always have the same corresponding campaignCfid, startDate and endDate.

Can this be done in Python? I've tried using itertools without much success. Would it be a better approach to use Pandas?

Accepted answer by bananafish

Yes, use pandas. It's great. You can use the groupby functionality to aggregate by sum, then convert the output to a list of dicts if that is exactly what you want.

import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": 'def',
         "campaignCfid": 123, "budgetImpressions": 80}]

df = pd.DataFrame(data)

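# Group on every identifying column and sum the remaining numeric column
grouped = df.groupby(['startDate', 'endDate', 'campaignCfid',
                      'campaignName']).sum()

# Turn the group keys back into columns, then emit one dict per row
print(grouped.reset_index().to_dict('records'))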

This prints (this is Python 2 output, hence the L suffix on the integers):

[{'startDate': 123L, 'campaignCfid': 789L, 'endDate': 456L, 'budgetImpressions': 60L, 'campaignName': 'abc'}, {'startDate': 456L, 'campaignCfid': 123L, 'endDate': 789L, 'budgetImpressions': 80L, 'campaignName': 'def'}]
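
The question guarantees that campaignName determines campaignCfid, startDate and endDate. That means a variation is possible: group on campaignName alone, sum budgetImpressions, and take the first value of each of the other columns. What follows is a minimal sketch that assumes a reasonably recent pandas. "first" and "sum" are standard groupby aggregation names, and sort=False keeps the campaigns in their original order.

```python
import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc",
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": "abc",
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": "def",
         "campaignCfid": 123, "budgetImpressions": 80}]

df = pd.DataFrame(data)

# Group only on campaignName. The other identifying columns are assumed to be
# constant within a campaign (as the question states), so "first" is safe.
result = (df.groupby("campaignName", as_index=False, sort=False)
            .agg({"startDate": "first",
                  "endDate": "first",
                  "campaignCfid": "first",
                  "budgetImpressions": "sum"})
            .to_dict("records"))

print(result)
```

If that guarantee ever fails to hold, "first" will silently hide the mismatch. Grouping on all identifying columns, as in the code above, does not have that problem.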

Answer by Andy Hayden

Just to demonstrate that plain Python is sometimes perfectly fine for this kind of thing:

In [11]: from collections import Counter
         from itertools import groupby

In [12]: data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

In [13]: g = groupby(data, lambda x: x.pop('campaignName'))

In [14]: d = {}
         for campaign, campaign_data in g:
             c = Counter()
             for row in campaign_data: c.update(row)
             d[campaign] = c  # if you want a dict rather than a Counter, store dict(c) here

In [15]: d
Out[15]:
{'abc': Counter({'campaignCfid': 1578, 'endDate': 912, 'startDate': 246, 'budgetImpressions': 60}),
 'def': Counter({'endDate': 789, 'startDate': 456, 'campaignCfid': 123, 'budgetImpressions': 80})}
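
The output above has a few quirks worth noting:

* Counter.update adds every numeric field, which is why campaignCfid, startDate and endDate are summed for 'abc'.
* x.pop('campaignName') removes that key from the original dicts.
* itertools.groupby only groups consecutive items, so it relies on the input already being ordered by campaign.

A sketch of the same itertools approach that sorts first, sums only budgetImpressions, and leaves the input untouched:

```python
from itertools import groupby
from operator import itemgetter

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

key = itemgetter("campaignName")
result = []
# groupby only merges adjacent items with equal keys, so sort by the key first
for _, group in groupby(sorted(data, key=key), key=key):
    rows = list(group)
    merged = dict(rows[0])  # copy the first row so the input dicts are not mutated
    merged["budgetImpressions"] = sum(r["budgetImpressions"] for r in rows)
    result.append(merged)

print(result)
```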

If you already have this collection of lists/dicts, it doesn't really make sense to promote it to a DataFrame; it's often cheaper to stay in pure Python.
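
To make the "stay in pure Python" point concrete, a single pass with a plain dict keyed by campaignName needs neither a DataFrame nor the pre-sort that itertools.groupby requires. This minimal sketch relies on two assumptions: the question's guarantee that the other fields are identical within a campaign, and dicts preserving insertion order (Python 3.7+).

```python
data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

totals = {}  # campaignName -> merged row
for row in data:
    name = row["campaignName"]
    if name not in totals:
        # First row for this campaign: copy it as the template for the output
        totals[name] = dict(row)
    else:
        # Later rows only contribute their impressions
        totals[name]["budgetImpressions"] += row["budgetImpressions"]

result = list(totals.values())  # campaigns in order of first appearance
print(result)
```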