Python: How can I "merge" rows by the same value in a column in Pandas with aggregation functions?

Question

Asked by Martin Thoma

I would like to group rows in a dataframe, given one column. Then I would like to receive an edited dataframe for which I can decide which aggregation function makes sense. The default should be just the value of the first entry in the group.

(it would be nice if the solution also worked for a combination of two columns)

Example

#!/usr/bin/env python

"""Test data frame grouping."""

# 3rd party modules
import pandas as pd


df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])
print(df)

gives the dataframe:

   amount  id     name  price
0       1   1     anna    123
1       2   1     anna      7
2      30   2      bob     42
3      10   3  charlie      1
4     100   3    david      2

And I would like to get:

amount  id     name  price
     3   1     anna    130
    30   2      bob     42
   110   3  charlie      3

So:

Entries with the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
All values in amount and price which have the same id get summed up.
For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?

Answer 1

Answered by Martin Thoma

You are looking for

aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which gives

    price     name  amount
id                        
1     130     anna       3
2      42      bob      30
3       3  charlie     110
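
The question also asks for 'first' as the default aggregation. A minimal sketch of one way to get that, assuming a small helper (merge_rows and its overrides parameter are hypothetical names, not part of pandas) that builds the dict passed to agg:

```python
import pandas as pd


def merge_rows(df, by, overrides=None):
    """Hypothetical helper: group by `by`, aggregate every other column
    with 'first' unless `overrides` names a different function."""
    keys = [by] if isinstance(by, str) else list(by)
    # Default: 'first' (the first non-null value per group) for all non-key columns.
    agg = {col: "first" for col in df.columns if col not in keys}
    agg.update(overrides or {})
    merged = df.groupby(keys, as_index=False).agg(agg)
    # Restore the column order of the input frame.
    return merged[list(df.columns)]


df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])

merged = merge_rows(df, 'id', {'price': 'sum', 'amount': 'sum'})
print(merged)
```

Only the columns that need something other than 'first' have to be listed. Passing a list such as ['id', 'name'] as `by` groups on a combination of columns.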

Answer 2

Answered by jezrael

To keep the same column order as in the original dataframe, add reindex, because the output columns of an aggregation by dict do not necessarily follow the original order:

d = {'price': 'sum', 'name': 'first', 'amount': 'sum'}
df_new = df.groupby('id', as_index=False).aggregate(d).reindex(columns=df.columns)
print(df_new)
   amount  id     name  price
0       3   1     anna    130
1      30   2      bob     42
2     110   3  charlie      3
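
For the two-column case mentioned in the question, the same pattern should work with a list of keys. A sketch, assuming id and name together identify a group (that choice of keys is an illustration, not something the question specifies):

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])

# Both grouping keys stay regular columns because of as_index=False.
d = {'price': 'sum', 'amount': 'sum'}
df_two = (df.groupby(['id', 'name'], as_index=False)
            .aggregate(d)
            .reindex(columns=df.columns))
print(df_two)
```

Here charlie and david stay separate rows, since they share an id but not a name, so the result has four rows instead of three.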
