原文地址: http://stackoverflow.com/questions/18066269/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Group by and aggregate the values of a list of dictionaries in Python
Asked by Kyle Getrost
I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.
Example:
import datetime

my_dataset = [
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id': 99,
        'value1': 10,
        'value2': 10
    }
]
group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""
I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:
def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset
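One likely reason the attempt above misses values: each group yielded by itertools.groupby is a one-shot iterator, so summing the first key in sum_value_keys exhausts it and every later key sums an empty list. Below is a minimal sketch of a fix, assuming it is acceptable to copy each group into a list. The names rows and result are illustrative, and sorted() is swapped in for the in-place dataset.sort() so the caller's list is left untouched.

```python
import datetime
import itertools
import operator


def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    new_dataset = []
    # groupby only merges adjacent items, so the input must be sorted by the key.
    for key, group in itertools.groupby(sorted(dataset, key=keyfunc), keyfunc):
        rows = list(group)  # materialize: the group iterator can be read only once
        d = {group_by_key: key}
        d.update({k: sum(row[k] for row in rows) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset


my_dataset = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
result = group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])
```

This keeps the O(N log N) sort that groupby requires; it only removes the iterator-exhaustion problem.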
Accepted answer by Ashwini Chaudhary
You can use collections.Counter and collections.defaultdict.
Using a dict, this can be done in O(N), while sorting requires O(N log N) time.
from collections import defaultdict, Counter

def solve(dataset, group_by_key, sum_value_keys):
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k:item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic
...
>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})
The advantage of Counter is that it automatically sums the values of matching keys:
Example:
>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
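To make the contrast explicit, here is a small sketch comparing Counter.update with dict.update on the same data. The variable names totals and plain are illustrative only.

```python
from collections import Counter

totals = Counter(value1=10, value2=5)
totals.update({'value1': 7, 'value2': 3})  # adds to the existing counts

plain = dict(value1=10, value2=5)
plain.update({'value1': 7, 'value2': 3})   # overwrites the existing values
```

After this runs, totals holds the sums, while plain holds only the most recent values, which is why a defaultdict(Counter) can accumulate per-group totals with a single update call per row.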
Answered by Kyle Getrost
Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:
from collections import defaultdict, Counter

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    container = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        values = {k:item[k] for k in sum_value_keys}
        container[key].update(values)
    new_dataset = [
        # list(...) is needed on Python 3, where items() returns a view, not a list
        dict([(group_by_key, item[0])] + list(item[1].items()))
        for item in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])
    return new_dataset
Answered by pylang
Here's an approach using more_itertools where you simply focus on how to construct output.
Given
import datetime
import collections as ct
import more_itertools as mit
dataset = [
{"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
{"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
{"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10}
]
Code
# Step 1: Build helper functions
kfunc = lambda d: d["date"]
vfunc = lambda d: {k:v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((ct.Counter(d) for d in lst), ct.Counter())
# Step 2: Build a dict
reduced = mit.map_reduce(dataset, keyfunc=kfunc, valuefunc=vfunc, reducefunc=rfunc)
reduced
Output
defaultdict(None,
{datetime.date(2013, 1, 1): Counter({'value1': 20, 'value2': 20}),
datetime.date(2013, 1, 2): Counter({'value1': 10, 'value2': 10})})
The items are grouped by date and pertinent values are reduced as Counters.
Details
Steps
- build helper functions to customize construction of keys, values and reduced values in the final defaultdict. Here we want to:
  - group by date (kfunc)
  - build dicts keeping the "value*" parameters (vfunc)
  - aggregate the dicts (rfunc) by converting to collections.Counters and summing them. See an equivalent rfunc below+.
- pass in the helper functions to more_itertools.map_reduce.
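The steps above can also be sketched without the library. This is a rough stdlib-only stand-in, not the actual implementation in more_itertools: map_reduce_sketch is a hypothetical name, and it returns a plain dict rather than the defaultdict shown in the output above.

```python
import datetime
from collections import Counter, defaultdict


def map_reduce_sketch(iterable, keyfunc, valuefunc, reducefunc):
    """Hypothetical stand-in for map_reduce, for illustration only."""
    groups = defaultdict(list)
    for item in iterable:
        # Map each item to a value and file it under its group key.
        groups[keyfunc(item)].append(valuefunc(item))
    # Reduce each group's list of values to a single result.
    return {key: reducefunc(values) for key, values in groups.items()}


dataset = [
    {"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10},
]

kfunc = lambda d: d["date"]
vfunc = lambda d: {k: v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((Counter(d) for d in lst), Counter())

reduced = map_reduce_sketch(dataset, kfunc, vfunc, rfunc)
```

Separating the key, value, and reduce steps this way means that grouping by a different key only requires swapping kfunc.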
Simple Groupby
... say in that example you wanted to group by id and date?
No problem.
>>> kfunc2 = lambda d: (d["date"], d["id"])
>>> mit.map_reduce(dataset, keyfunc=kfunc2, valuefunc=vfunc, reducefunc=rfunc)
defaultdict(None,
{(datetime.date(2013, 1, 1),
99): Counter({'value1': 10, 'value2': 10}),
(datetime.date(2013, 1, 1),
98): Counter({'value1': 10, 'value2': 10}),
(datetime.date(2013, 1, 2),
99): Counter({'value1': 10, 'value2': 10})})
Customized Output
While the resulting data structure clearly and concisely presents the outcome, the OP's expected output can be rebuilt as a simple list of dicts:
>>> [{**dict(date=k), **v} for k, v in reduced.items()]
[{'date': datetime.date(2013, 1, 1), 'value1': 20, 'value2': 20},
{'date': datetime.date(2013, 1, 2), 'value1': 10, 'value2': 10}]
For more on map_reduce, see the docs. Install via > pip install more_itertools.
+An equivalent reducing function:
import typing

def rfunc(lst: typing.List[dict]) -> ct.Counter:
    """Return reduced mappings from map-reduce values."""
    c = ct.Counter()
    for d in lst:
        c += ct.Counter(d)
    return c
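One caveat worth checking if the summed values can be zero or negative: Counter addition (both + and +=) keeps only positive counts, whereas Counter.update keeps every key. The sketch below uses made-up rows (not from the question's dataset) to show the difference. The names added and updated are illustrative.

```python
from collections import Counter

# Hypothetical rows containing negative values, for illustration only.
rows = [{'value1': 10, 'value2': -5}, {'value1': -10, 'value2': 2}]

added = Counter()
for d in rows:
    added += Counter(d)   # non-positive counts are discarded after each step

updated = Counter()
for d in rows:
    updated.update(d)     # plain running totals; zero and negative counts are kept
```

Here updated ends with the arithmetic totals (value1 is 0, value2 is -3), while added ends with only value2 set to 2, because the intermediate negative count was dropped. For strictly positive data such as the question's example, both approaches agree.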