Group by and aggregate the values of a list of dictionaries in Python

Question

Asked by Kyle Getrost

I'm trying to write a function, in an elegant way, that will group a list of dictionaries and aggregate (sum) the values of like-keys.

Example:

import datetime

my_dataset = [
    {
        'date': datetime.date(2013, 1, 1),
        'id': 99,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 1),
        'id': 98,
        'value1': 10,
        'value2': 10
    },
    {
        'date': datetime.date(2013, 1, 2),
        'id': 99,
        'value1': 10,
        'value2': 10
    }
]

group_and_sum_dataset(my_dataset, 'date', ['value1', 'value2'])

"""
Should return:
[
    {
        'date': datetime.date(2013, 1, 1),
        'value1': 20,
        'value2': 20
    },
    {
        'date': datetime.date(2013, 1, 2),
        'value1': 10,
        'value2': 10
    }
]
"""

I've tried doing this using itertools for the groupby and summing each like-key value pair, but am missing something here. Here's what my function currently looks like:

def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    dataset.sort(key=keyfunc)
    new_dataset = []
    for key, index in itertools.groupby(dataset, keyfunc):
        d = {group_by_key: key}
        d.update({k:sum([item[k] for item in index]) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset
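
A likely cause of the unexpected result in the function above: each group yielded by itertools.groupby is a one-shot iterator, so the dict comprehension exhausts it while summing the first key, and every later key then sums an empty sequence (giving 0). Below is a minimal sketch of one possible fix, which copies each group into a list first. The sample rows and the use of sorted() instead of the in-place sort are choices made for this illustration, not part of the original post.

```python
import datetime
import itertools
import operator


def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    keyfunc = operator.itemgetter(group_by_key)
    new_dataset = []
    # groupby only merges adjacent items, so the input must be sorted by the key.
    # sorted() returns a new list and leaves the caller's dataset untouched.
    for key, group in itertools.groupby(sorted(dataset, key=keyfunc), keyfunc):
        rows = list(group)  # the group iterator can be consumed only once
        d = {group_by_key: key}
        d.update({k: sum(row[k] for row in rows) for k in sum_value_keys})
        new_dataset.append(d)
    return new_dataset


# Illustrative sample data (same shape as the question's dataset).
sample = [
    {'date': datetime.date(2013, 1, 1), 'id': 99, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 1), 'id': 98, 'value1': 10, 'value2': 10},
    {'date': datetime.date(2013, 1, 2), 'id': 99, 'value1': 10, 'value2': 10},
]
result = group_and_sum_dataset(sample, 'date', ['value1', 'value2'])
```

This keeps the sort-then-group approach, so it still costs O(N log N) for the sort.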

Answer 1

Accepted answer by Ashwini Chaudhary

You can use collections.Counter and collections.defaultdict.

Using a dict, this can be done in O(N) time, while sorting requires O(N log N) time.

from collections import defaultdict, Counter
def solve(dataset, group_by_key, sum_value_keys):
    dic = defaultdict(Counter)
    for item in dataset:
        key = item[group_by_key]
        vals = {k:item[k] for k in sum_value_keys}
        dic[key].update(vals)
    return dic
... 
>>> d = solve(my_dataset, 'date', ['value1', 'value2'])
>>> d
defaultdict(<class 'collections.Counter'>,
{
 datetime.date(2013, 1, 2): Counter({'value2': 10, 'value1': 10}),
 datetime.date(2013, 1, 1): Counter({'value2': 20, 'value1': 20})
})

The advantage of Counter is that it automatically sums the values of matching keys:

Example:

>>> c = Counter(**{'value1': 10, 'value2': 5})
>>> c.update({'value1': 7, 'value2': 3})
>>> c
Counter({'value1': 17, 'value2': 8})
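
For contrast, here is a small sketch of how the same update call behaves on a plain dict versus a Counter. The variable names are made up for this illustration.

```python
from collections import Counter

# dict.update overwrites the value stored under an existing key.
plain = {'value1': 10, 'value2': 5}
plain.update({'value1': 7, 'value2': 3})

# Counter.update adds the incoming value to the existing count.
counted = Counter({'value1': 10, 'value2': 5})
counted.update({'value1': 7, 'value2': 3})

print(plain)          # the later values replaced the earlier ones
print(dict(counted))  # the values were summed per key
```

This additive update is what lets defaultdict(Counter) accumulate per-group totals in a single pass.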

Answer 2

Answer by Kyle Getrost

Thanks, I forgot about Counter. I still wanted to maintain the output format and sorting of my returned dataset, so here's what my final function looks like:

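from collections import Counter, defaultdict


def group_and_sum_dataset(dataset, group_by_key, sum_value_keys):
    # One Counter of running totals per distinct group key.
    container = defaultdict(Counter)

    for item in dataset:
        key = item[group_by_key]
        values = {k: item[k] for k in sum_value_keys}
        container[key].update(values)

    # Rebuild the original list-of-dicts shape. list() is needed on Python 3,
    # where dict.items() returns a view that cannot be concatenated to a list.
    new_dataset = [
        dict([(group_by_key, key)] + list(totals.items()))
        for key, totals in container.items()
    ]
    new_dataset.sort(key=lambda item: item[group_by_key])

    return new_dataset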

Answer 3

Answer by pylang

Here's an approach using more_itertools, where you simply focus on how to construct the output.

Given

import datetime
import collections as ct

import more_itertools as mit


dataset = [
    {"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10}
]

Code

# Step 1: Build helper functions    
kfunc = lambda d: d["date"]
vfunc = lambda d: {k:v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((ct.Counter(d) for d in lst), ct.Counter())

# Step 2: Build a dict    
reduced = mit.map_reduce(dataset, keyfunc=kfunc, valuefunc=vfunc, reducefunc=rfunc)
reduced

Output

defaultdict(None,
            {datetime.date(2013, 1, 1): Counter({'value1': 20, 'value2': 20}),
             datetime.date(2013, 1, 2): Counter({'value1': 10, 'value2': 10})})

The items are grouped by date and pertinent values are reduced as Counters.

Details

Steps

Build helper functions to customize the construction of keys, values and reduced values in the final defaultdict. Here we want to:
- group by date (kfunc)
- build dicts keeping the "value*" parameters (vfunc)
- aggregate the dicts (rfunc) by converting them to collections.Counter objects and summing them. See an equivalent rfunc below⁺.
Pass the helper functions in to more_itertools.map_reduce.
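
To make the two steps concrete, here is a plain standard-library sketch of the same map-then-reduce idea. It is not the library's implementation; map_reduce_sketch is a hypothetical name introduced only for this illustration, and it returns a regular dict rather than a defaultdict.

```python
import datetime
from collections import Counter, defaultdict


def map_reduce_sketch(iterable, keyfunc, valuefunc, reducefunc):
    """Hypothetical stand-in that mimics the grouping steps described above."""
    groups = defaultdict(list)
    for item in iterable:
        # Map: derive the group key and the value to keep for this item.
        groups[keyfunc(item)].append(valuefunc(item))
    # Reduce: collapse each group's list of values into a single result.
    return {key: reducefunc(values) for key, values in groups.items()}


dataset = [
    {"date": datetime.date(2013, 1, 1), "id": 99, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 1), "id": 98, "value1": 10, "value2": 10},
    {"date": datetime.date(2013, 1, 2), "id": 99, "value1": 10, "value2": 10},
]

kfunc = lambda d: d["date"]
vfunc = lambda d: {k: v for k, v in d.items() if k.startswith("val")}
rfunc = lambda lst: sum((Counter(d) for d in lst), Counter())

reduced = map_reduce_sketch(dataset, kfunc, vfunc, rfunc)
```

Each helper controls one stage: kfunc picks the bucket, vfunc decides what goes into it, and rfunc turns each bucket into a single Counter.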

Simple Groupby

... say in that example you wanted to group by id and date?

No problem.

>>> kfunc2 = lambda d: (d["date"], d["id"])
>>> mit.map_reduce(dataset, keyfunc=kfunc2, valuefunc=vfunc, reducefunc=rfunc)
defaultdict(None,
            {(datetime.date(2013, 1, 1),
              99): Counter({'value1': 10, 'value2': 10}),
             (datetime.date(2013, 1, 1),
              98): Counter({'value1': 10, 'value2': 10}),
             (datetime.date(2013, 1, 2),
              99): Counter({'value1': 10, 'value2': 10})})

Customized Output

While the resulting data structure clearly and concisely presents the outcome, the OP's expected output can be rebuilt as a simple list of dicts:

>>> [{**dict(date=k), **v} for k, v in reduced.items()]
[{'date': datetime.date(2013, 1, 1), 'value1': 20, 'value2': 20},
 {'date': datetime.date(2013, 1, 2), 'value1': 10, 'value2': 10}]
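
The same flattening could be generalized to composite keys such as (date, id). The sketch below assumes a hypothetical helper, to_records, and a hand-written grouped mapping; neither comes from the original answer.

```python
import datetime
from collections import Counter


def to_records(reduced, key_names):
    """Hypothetical helper: flatten {key: Counter} into a list of dicts."""
    records = []
    for key, totals in reduced.items():
        # A single-field key arrives as a bare value; wrap it for zip().
        key_values = key if isinstance(key, tuple) else (key,)
        record = dict(zip(key_names, key_values))
        record.update(totals)
        records.append(record)
    return records


# Illustrative grouped data keyed by (date, id).
by_date_and_id = {
    (datetime.date(2013, 1, 1), 99): Counter({"value1": 10, "value2": 10}),
    (datetime.date(2013, 1, 2), 99): Counter({"value1": 10, "value2": 10}),
}
records = to_records(by_date_and_id, ["date", "id"])
```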

For more on map_reduce, see the docs. Install via > pip install more_itertools.

⁺An equivalent reducing function:

import typing


def rfunc(lst: typing.List[dict]) -> ct.Counter:
    """Return reduced mappings from map-reduce values."""
    c = ct.Counter()
    for d in lst:
        c += ct.Counter(d)
    return c
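
One caveat worth knowing when summing Counters with + or +=: Counter addition keeps only positive counts, whereas Counter.update keeps zero and negative totals. A minimal sketch with made-up rows that include negative values:

```python
from collections import Counter

# Illustrative rows containing negative values.
rows = [{"value1": 10, "value2": -4}, {"value1": -10, "value2": 1}]

# Summing with += discards any count that is zero or negative after each step.
added = Counter()
for d in rows:
    added += Counter(d)

# update() keeps every key and its true arithmetic total.
updated = Counter()
for d in rows:
    updated.update(d)

print(dict(added))
print(dict(updated))
```

If the values being aggregated can be zero or negative, update() may be the safer choice for the reducing step.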
