Pandas: Group by two columns to get sum of another column
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/40553002/
Asked by add-semi-colons
I looked through most of the previously asked questions but was not able to find an answer to my question:
I have the following DataFrame:
        id  year month  score  num_attempts
0   483625  2010    01     50             1
1   967799  2009    03     50             1
2   213473  2005    09    100             1
3   498110  2010    12     60             1
5   187243  2010    01    100             1
6   508311  2005    10     15             1
7   486688  2005    10     50             1
8   212550  2005    10    500             1
10  136701  2005    09     25             1
11  471651  2010    01     50             1
I want to get the following DataFrame:
year  month  sum_score  sum_num_attempts
2009     03         50                 1
2005     09        125                 2
2010     12         60                 1
2010     01        200                 2
2005     10        565                 3
Here is what I tried:
sum_df = df.groupby(by=['year','month'])['score'].sum()
But this doesn't look efficient or correct. If more than one column needs to be aggregated, this seems like a very expensive call. For example, I have another column, num_attempts, and I want to sum it by year and month in the same way as score.
Answered by Dennis Golomazov
This should be an efficient way:
sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})
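To get the column names the question asks for (sum_score, sum_num_attempts) and keep year and month as ordinary columns, one option is pandas named aggregation, available in pandas 0.25 and later. The following is a minimal sketch, not part of the original answer: the DataFrame is rebuilt by hand from the rows in the question, and storing month as strings (to keep the leading zeros) is an assumption.

```python
import pandas as pd

# Sample data copied from the question; month is kept as a string
# so that the leading zeros survive (an assumption about the real data).
df = pd.DataFrame({
    "id": [483625, 967799, 213473, 498110, 187243,
           508311, 486688, 212550, 136701, 471651],
    "year": [2010, 2009, 2005, 2010, 2010, 2005, 2005, 2005, 2005, 2010],
    "month": ["01", "03", "09", "12", "01", "10", "10", "10", "09", "01"],
    "score": [50, 50, 100, 60, 100, 15, 50, 500, 25, 50],
    "num_attempts": [1] * 10,
})

# One groupby pass, two aggregated columns.
# as_index=False keeps year and month as regular columns instead of a MultiIndex.
sum_df = df.groupby(["year", "month"], as_index=False).agg(
    sum_score=("score", "sum"),
    sum_num_attempts=("num_attempts", "sum"),
)

print(sum_df)
```

Both sums are computed from a single grouping, so adding more aggregated columns does not repeat the groupby. Note that the ten rows shown in the question contain three entries for 2010/01, so this sample yields sum_num_attempts of 3 for that group.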