Pandas: Group by two columns to get sum of another column

License: this content is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors: StackOverflow, original URL: http://stackoverflow.com/questions/40553002/

Date: 2020-09-14 02:25:16  Source: igfitidea


pandas group-by

Asked by add-semi-colons

I looked through most of the previously asked questions but was not able to find an answer to my question:


I have the following DataFrame:


           id   year month score num_attempts
0      483625  2010    01   50      1
1      967799  2009    03   50      1
2      213473  2005    09  100      1
3      498110  2010    12   60      1
5      187243  2010    01  100      1
6      508311  2005    10   15      1
7      486688  2005    10   50      1
8      212550  2005    10  500      1
10     136701  2005    09   25      1
11     471651  2010    01   50      1

I want to get the following DataFrame:


year month sum_score sum_num_attempts
2009    03   50           1
2005    09  125           2
2010    12   60           1
2010    01  200           3
2005    10  565           3

Here is what I tried:


sum_df = df.groupby(by=['year','month'])['score'].sum()

But this doesn't look efficient or correct. If I have more than one column that needs to be aggregated, this seems like a very expensive call. For example, I have another column, num_attempts, that I want to sum by year and month in the same way as score.
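A minimal sketch of doing this in one call, assuming the sample data above and that `month` is stored as zero-padded strings (the question does not show the dtypes): selecting a list of columns after `groupby` sums all of them in a single pass, rather than one `groupby` call per column.

```python
import pandas as pd

# Sample data from the question (original index labels kept; months stored as
# zero-padded strings, which is an assumption about the source dtype)
df = pd.DataFrame(
    {
        "id": [483625, 967799, 213473, 498110, 187243, 508311, 486688, 212550, 136701, 471651],
        "year": [2010, 2009, 2005, 2010, 2010, 2005, 2005, 2005, 2005, 2010],
        "month": ["01", "03", "09", "12", "01", "10", "10", "10", "09", "01"],
        "score": [50, 50, 100, 60, 100, 15, 50, 500, 25, 50],
        "num_attempts": [1] * 10,
    },
    index=[0, 1, 2, 3, 5, 6, 7, 8, 10, 11],
)

# Selecting a list of columns sums every one of them in a single groupby pass
sum_df = df.groupby(["year", "month"])[["score", "num_attempts"]].sum()
print(sum_df)
```

The result is indexed by a (year, month) MultiIndex sorted by the group keys; for example, the 2010/01 group sums to a score of 200 over 3 attempts.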


Answer by Dennis Golomazov

This should be an efficient way:


sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})
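To get exactly the column names from the desired output (`sum_score`, `sum_num_attempts`) with `year` and `month` as ordinary columns, one option is named aggregation, available in pandas 0.25 and later. This is a sketch of a variation on the dict form above, not part of the original answer; it assumes `month` holds zero-padded strings.

```python
import pandas as pd

# Sample data from the question (index labels omitted; they don't affect the result)
df = pd.DataFrame(
    {
        "id": [483625, 967799, 213473, 498110, 187243, 508311, 486688, 212550, 136701, 471651],
        "year": [2010, 2009, 2005, 2010, 2010, 2005, 2005, 2005, 2005, 2010],
        "month": ["01", "03", "09", "12", "01", "10", "10", "10", "09", "01"],
        "score": [50, 50, 100, 60, 100, 15, 50, 500, 25, 50],
        "num_attempts": [1] * 10,
    }
)

# Named aggregation: each keyword becomes an output column,
# given as (source_column, aggregation_function)
sum_df = (
    df.groupby(["year", "month"])
    .agg(sum_score=("score", "sum"), sum_num_attempts=("num_attempts", "sum"))
    .reset_index()  # turn the year/month group keys back into regular columns
)
print(sum_df)
```

Rows come out sorted by (year, month) rather than in the order shown in the question; pass `sort=False` to `groupby` to keep first-appearance order instead.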