pandas Python: keep other columns when using sum() with groupby

Question

Asked by SwagZ

I have a pandas dataframe below:

    df

    name  value1  value2  otherstuff1  otherstuff2
0  Hyman       1       1         1.19         2.39
1  Hyman       1       2         1.19         2.39
2   Luke       0       1         1.08         1.08
3   Mark       0       1         3.45         3.45
4   Luke       1       0         1.08         1.08

Same "name" will have the same value for otherstuff1 and otherstuff2.
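
To make the question reproducible, the sample frame can be rebuilt like this (values copied from the table above). The last step is a small sanity check, added here as an assumption rather than something from the question, that each name really maps to a single otherstuff1/otherstuff2 pair, which is what the answers below rely on.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Count distinct values of each otherstuff column per name; the stated
# assumption holds if every count is exactly 1.
consistent = (
    df.groupby("name")[["otherstuff1", "otherstuff2"]]
    .nunique()
    .eq(1)
    .all()
    .all()
)
print(consistent)
```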

I'm trying to group by column 'name' and sum column 'value1' and column 'value2' (not adding value1 to value2, but summing each column separately).

I expect to get the result below:

    newdf

    name  value1  value2  otherstuff1  otherstuff2
0  Hyman       2       3         1.19         2.39
1   Luke       1       1         1.08         1.08
2   Mark       0       1         3.45         3.45

I've tried

newdf = df.groupby(['name'], as_index = False).sum()

which groups by name and sums the value1 and value2 columns correctly, but ends up dropping the otherstuff1 and otherstuff2 columns.

Please help. Thank you guys so much!

Answer 1

Accepted answer by YOBEN_S

Something like this? (Assuming you have the same otherstuff1 and otherstuff2 under the same name.)

df.groupby(['name','otherstuff1','otherstuff2'],as_index=False).sum()
Out[121]: 
    name  otherstuff1  otherstuff2  value1  value2
0  Hyman         1.19         2.39       2       3
1   Luke         1.08         1.08       1       1
2   Mark         3.45         3.45       0       1
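
A runnable version of this approach, rebuilding the question's data as an assumption. Because otherstuff1 and otherstuff2 are part of the group key, they come back as ordinary columns, and only value1 and value2 are summed. The final reindexing step is optional and just restores the column order shown in the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Group on name plus the per-name constant columns; the remaining
# (numeric) columns value1 and value2 are summed.
newdf = df.groupby(["name", "otherstuff1", "otherstuff2"], as_index=False).sum()

# Optional: put the columns back in the question's order.
newdf = newdf[list(df.columns)]
print(newdf)
```

This only matches the expected output while the assumption holds: if a name ever had two different otherstuff1 values, it would be split into two rows. Also note that `groupby` drops rows whose key contains NaN by default (`dropna=True`), so missing otherstuff values would silently remove rows.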

Answer 2

Answer by Guybrush

You should specify what pandas must do with the other columns. In your case, I think you want to keep one row, regardless of its position within the group.

This can be done with agg on a group. agg accepts a parameter that specifies which operation should be performed on each column.

df.groupby(['name'], as_index=False).agg({'value1': 'sum', 'value2': 'sum', 'otherstuff1': 'first', 'otherstuff2': 'first'})
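
When there are many pass-through columns, writing the dict by hand gets tedious. A minimal sketch, assuming the question's column names and a hypothetical list `sum_cols` naming the columns to add up, builds the same spec programmatically. Every other non-key column gets 'first', which returns the first non-null value of that column within each group.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Hypothetical list of columns to sum; all other non-key columns are
# assumed constant within a name, so taking the first value is enough.
sum_cols = ["value1", "value2"]
agg_spec = {
    col: ("sum" if col in sum_cols else "first")
    for col in df.columns
    if col != "name"
}

newdf = df.groupby("name", as_index=False).agg(agg_spec)
print(newdf)
```

Unlike grouping on the extra columns, this never splits a name into several rows, but if the "constant" columns do vary, 'first' will quietly pick one value.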

Answer 3

Answer by Graven74

The key in the answer above is actually as_index=False; otherwise all the columns in the group-by list are moved into the index instead of staying as regular columns.

p_summ = p.groupby(attributes_list, as_index=False).agg({'AMT': 'sum'})
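
A small sketch of the difference, using the question's sample data (rebuilt here) and the grouping keys from the accepted answer in place of the answerer's own `p` and `attributes_list`. With the default `as_index=True` the keys end up in a MultiIndex; with `as_index=False` they stay as columns, which is the same shape you would get by calling `reset_index()` on the indexed result.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

keys = ["name", "otherstuff1", "otherstuff2"]
spec = {"value1": "sum", "value2": "sum"}

# Default as_index=True: the three key columns become a MultiIndex.
indexed = df.groupby(keys).agg(spec)

# as_index=False: key columns stay as regular columns next to the sums.
flat = df.groupby(keys, as_index=False).agg(spec)

print(indexed)
print(flat)
```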
