How can I "merge" rows by same value in a column in Pandas with aggregation functions?

This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors. Original question: http://stackoverflow.com/questions/46826773/

Tags: python, pandas

Asked by Martin Thoma

I would like to group rows in a dataframe, given one column. Then I would like to receive an edited dataframe for which I can decide which aggregation function makes sense. The default should be just the value of the first entry in the group.

(it would be nice if the solution also worked for a combination of two columns)

Example

#!/usr/bin/env python

"""Test data frame grouping."""

# 3rd party modules
import pandas as pd


df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])
print(df)

gives the dataframe:

   amount  id     name  price
0       1   1     anna    123
1       2   1     anna      7
2      30   2      bob     42
3      10   3  charlie      1
4     100   3    david      2

And I would like to get:

amount  id     name  price
     3   1     anna    130
    30   2      bob     42
   110   3  charlie      3

So:

  • Entries with the same value in the id column belong together. After the operation, there should still be an id column, but it should contain only unique values.
  • All values in amount and price that share the same id are summed.
  • For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?

Answered by Martin Thoma

You are looking for

aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which gives

    price     name  amount
id                        
1     130     anna       3
2      42      bob      30
3       3  charlie     110
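The question also asks that the default be the first entry of each group. A minimal sketch of one way to get that, assuming the same example dataframe: build the dict with 'first' for every non-key column and then override only the columns that need something else. The names `overrides` and `df_default_first` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])

# Illustrative name: only the columns listed here deviate from the default.
overrides = {'price': 'sum', 'amount': 'sum'}

# Start with 'first' for every column except the grouping key,
# then replace the entries that should use a different aggregation.
aggregation_functions = {col: 'first' for col in df.columns if col != 'id'}
aggregation_functions.update(overrides)

df_default_first = df.groupby('id').aggregate(aggregation_functions)
print(df_default_first)
```

With this pattern, a column added to the dataframe later is kept (via 'first') without touching the aggregation dict. Note that the groupby 'first' aggregation returns the first non-null value in each group, which matters if the data contains missing values.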

Answered by jezrael

To keep the same column order as the original dataframe, add reindex, because aggregating with a dict can return the columns in a different order:

d = {'price': 'sum', 'name': 'first', 'amount': 'sum'}
df_new = df.groupby('id', as_index=False).aggregate(d).reindex(columns=df.columns)
print(df_new)

   amount  id     name  price
0       3   1     anna    130
1      30   2      bob     42
2     110   3  charlie      3
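The question also asks about grouping by a combination of two columns. A minimal sketch, assuming the same example dataframe: pass a list of column names to groupby. The names `key_columns` and `df_two_keys` are illustrative. Because name is now part of the key, it no longer needs an aggregation function of its own.

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])

# Rows belong together only if they match on BOTH columns.
key_columns = ['id', 'name']
d = {'price': 'sum', 'amount': 'sum'}

# as_index=False keeps the key columns as regular columns;
# reindex restores the column order of the original dataframe.
df_two_keys = (df.groupby(key_columns, as_index=False)
                 .aggregate(d)
                 .reindex(columns=df.columns))
print(df_two_keys)
```

In this data, the two rows with id 3 have different names (charlie and david), so they stay separate and the result has four rows instead of three.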