pandas: Group by and aggregate a list of dictionaries in Python

Question

Question by user40721

I have a list of dictionaries which I need to aggregate in Python:

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, 
{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, 
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

and I'm looking to aggregate them by summing budgetImpressions.

So the final result should be:

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 60}, 
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

Note that every entry with a given campaignName always has the same campaignCfid, startDate and endDate.

Can this be done in Python? I've tried using itertools without much success. Would it be a better approach to use Pandas?

Answer 1

Accepted answer by bananafish

Yes, use pandas. It's great. You can use the groupby functionality to aggregate by sum, then convert the output back to a list of dicts if that is exactly what you want.

import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": 'def',
         "campaignCfid": 123, "budgetImpressions": 80}]

df = pd.DataFrame(data)

# Group on every column except budgetImpressions, then sum what is left
grouped = df.groupby(['startDate', 'endDate', 'campaignCfid',
                      'campaignName']).sum()

# reset_index() turns the group keys back into columns;
# to_dict('records') returns one dict per row
print(grouped.reset_index().to_dict('records'))

This prints:

[{'startDate': 123, 'endDate': 456, 'campaignCfid': 789, 'campaignName': 'abc', 'budgetImpressions': 60}, {'startDate': 456, 'endDate': 789, 'campaignCfid': 123, 'campaignName': 'def', 'budgetImpressions': 80}]
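Since the question guarantees that campaignCfid, startDate and endDate are fixed for a given campaignName, one possible variant (a sketch under that assumption, not part of the original answer) groups on campaignName alone, sums budgetImpressions, and carries the other fields through with "first". It uses pandas named aggregation, `agg(new_col=(column, func))`, which requires pandas 0.25 or later; the variable name `result` is only illustrative.

```python
import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc",
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": "abc",
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": "def",
         "campaignCfid": 123, "budgetImpressions": 80}]

df = pd.DataFrame(data)

# Group on campaignName only. The other descriptive fields are assumed to be
# constant within a campaign, so "first" just carries them through unchanged.
result = (
    df.groupby("campaignName")
      .agg(startDate=("startDate", "first"),
           endDate=("endDate", "first"),
           campaignCfid=("campaignCfid", "first"),
           budgetImpressions=("budgetImpressions", "sum"))
      .reset_index()          # campaignName back to a regular column
      .to_dict("records")     # one dict per campaign
)

print(result)
```

If the "constant per campaign" assumption ever fails, "first" silently hides the mismatch, whereas grouping on all four key columns (as above) would keep the rows apart.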

Answer 2

Answer by Andy Hayden

Just to demonstrate that plain Python is sometimes perfectly fine for this kind of task:

In [11]: from collections import Counter
         from itertools import groupby

In [12]: data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

In [13]: # groupby only merges consecutive items, so sort by campaignName first;
         # pop() drops the string field so Counter can add up the rest (this mutates data)
         g = groupby(sorted(data, key=lambda x: x['campaignName']),
                     lambda x: x.pop('campaignName'))

In [14]: d = {}
         for campaign, campaign_data in g:
             c = Counter()
             for row in campaign_data:
                 c.update(row)  # adds every numeric field, not only budgetImpressions
             d[campaign] = c  # use dict(c) here if you want a plain dict rather than a Counter

In [15]: d
Out[15]:
{'abc': Counter({'campaignCfid': 1578, 'endDate': 912, 'startDate': 246, 'budgetImpressions': 60}),
 'def': Counter({'endDate': 789, 'startDate': 456, 'campaignCfid': 123, 'budgetImpressions': 80})}

If you already have the data as a list of dicts, it doesn't really make sense to convert it to a DataFrame; it's often cheaper to stay in pure Python.
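Note that the Counter approach also adds up startDate, endDate and campaignCfid (e.g. 1578 for abc's campaignCfid). A minimal pure-Python sketch that instead returns exactly the list-of-dicts shape from the question could sum only budgetImpressions and copy the other fields from each campaign's first row, relying on the question's guarantee that they are identical within a campaign. A plain dict needs no sorting (unlike `itertools.groupby`), and copying each row leaves `data` unmodified. The function name `sum_impressions` is hypothetical.

```python
def sum_impressions(rows):
    """Merge rows sharing a campaignName, summing only budgetImpressions."""
    merged = {}  # campaignName -> aggregated row; dicts keep insertion order
    for row in rows:
        name = row["campaignName"]
        if name not in merged:
            # First row for this campaign: copy it so the input is not modified.
            merged[name] = dict(row)
        else:
            merged[name]["budgetImpressions"] += row["budgetImpressions"]
    return list(merged.values())


data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

result = sum_impressions(data)
print(result)
# [{'startDate': 123, 'endDate': 456, 'campaignName': 'abc', 'campaignCfid': 789, 'budgetImpressions': 60},
#  {'startDate': 456, 'endDate': 789, 'campaignName': 'def', 'campaignCfid': 123, 'budgetImpressions': 80}]
```

This is a single pass over the data, and campaigns come out in the order they first appear in the input.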
