pandas Python: keep other columns when using sum() with groupby

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/49783178/

Time: 2020-09-14 05:27:00  Source: igfitidea

Python Keep other columns when using sum() with groupby

python, pandas

Asked by SwagZ

I have the pandas DataFrame below:

df

    name  value1  value2  otherstuff1  otherstuff2
0  Hyman       1       1         1.19         2.39
1  Hyman       1       2         1.19         2.39
2   Luke       0       1         1.08         1.08
3   Mark       0       1         3.45         3.45
4   Luke       1       0         1.08         1.08

Rows with the same "name" always have the same values for otherstuff1 and otherstuff2.

I'm trying to group by column 'name' and sum column 'value1' and column 'value2' separately (not add value1 to value2, but sum each column on its own within each group).

I expect to get the result below:

newdf

    name  value1  value2  otherstuff1  otherstuff2
0  Hyman       2       3         1.19         2.39
1   Luke       1       1         1.08         1.08
2   Mark       0       1         3.45         3.45

I've tried:

newdf = df.groupby(['name'], as_index = False).sum()

which groups by name and sums both the value1 and value2 columns correctly, but ends up dropping the otherstuff1 and otherstuff2 columns.

Please help. Thank you guys so much!


Accepted answer by YOBEN_S

Something like this? (Assuming otherstuff1 and otherstuff2 are the same for every row with the same name.)

df.groupby(['name','otherstuff1','otherstuff2'],as_index=False).sum()
Out[121]: 
    name  otherstuff1  otherstuff2  value1  value2
0  Hyman         1.19         2.39       2       3
1   Luke         1.08         1.08       1       1
2   Mark         3.45         3.45       0       1
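
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same grouping. The way `df` is constructed is an assumption, since the question only shows its printed form.

```python
import pandas as pd

# Sample data reconstructed from the DataFrame printed in the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Group on every column that is constant within a name; the remaining
# columns (value1, value2) are the ones that get summed.
keys = ["name", "otherstuff1", "otherstuff2"]
newdf = df.groupby(keys, as_index=False).sum()
print(newdf)
```

Note that this only gives one row per name if the extra key columns really are identical within each name; any name whose otherstuff values differ is split into several rows. Rows with NaN in a key column are also dropped by default.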

Answer by Guybrush

You should specify what pandas must do with the other columns. In your case, I think you want to keep the values from one row of each group, regardless of that row's position within the group.

This can be done with agg on a group. agg accepts a parameter that specifies which operation should be performed for each column.

df.groupby(['name'], as_index=False).agg({'value1': 'sum', 'value2': 'sum', 'otherstuff1': 'first', 'otherstuff2': 'first'})
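
When there are many extra columns, writing the dictionary by hand gets tedious. A possible sketch, assuming the sample frame from the question, builds the mapping programmatically; the names `sum_cols` and `agg_spec` are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the DataFrame printed in the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

sum_cols = ["value1", "value2"]  # columns to total per group

# Sum the chosen columns; keep the first value of every other non-key column.
agg_spec = {
    col: ("sum" if col in sum_cols else "first")
    for col in df.columns
    if col != "name"
}

newdf = df.groupby("name", as_index=False).agg(agg_spec)
print(newdf)
```

Unlike grouping on the extra columns, this always yields exactly one row per name, and the output keeps the column order of the original frame. Keep in mind that 'first' picks the first non-null value in each group, so it silently hides any disagreement between rows.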

Answer by Graven74

The key in the answer above is actually "as_index=False"; without it, all the columns in the groupby list end up in the index.

p_summ = p.groupby(attributes_list, as_index=False).agg({'AMT': 'sum'})
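
To make the difference concrete, here is a small sketch comparing the two forms. The frame `p`, its column names other than AMT, and the contents of `attributes_list` are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the AMT column name comes from the answer above.
p = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "a", "a", "b"],
    "AMT": [10, 5, 7, 3],
})
attributes_list = ["region", "product"]  # hypothetical grouping columns

# Default (as_index=True): the grouping columns become a MultiIndex.
indexed = p.groupby(attributes_list).agg({"AMT": "sum"})

# as_index=False: the grouping columns stay as ordinary columns.
flat = p.groupby(attributes_list, as_index=False).agg({"AMT": "sum"})

print(indexed)
print(flat)
```

If the grouped result already has the keys in the index, calling reset_index() on it should give the same flat layout.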