Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original StackOverflow page: http://stackoverflow.com/questions/46431243/

Date: 2020-08-19 17:38:34  Source: igfitidea

Pandas - dataframe groupby - how to get sum of multiple columns

Tags: python, pandas, dataframe, pandas-groupby

Asked by Axel

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas dataframe which looks like this:

index col1   col2   col3   col4   col5
0     a      c      1      2      f 
1     a      c      1      2      f
2     a      d      1      2      f
3     b      d      1      2      g
4     b      e      1      2      g
5     b      e      1      2      g

I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped, since its data cannot be aggregated.


Here is what the output should look like. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter whether col1 and col2 are part of the index or not.


index col1   col2   col3   col4   
0     a      c      2      4          
1     a      d      1      2      
2     b      d      1      2      
3     b      e      2      4      

Here is what I tried:

df_new = df.groupby(['col1', 'col2'])["col3", "col4"].sum()

That however only returns the aggregated results of col4.

I am lost here. Every example I found only aggregates one column, where the issue obviously doesn't occur.

Answered by YOBEN_S

By using apply

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x : x.astype(int).sum())
Out[1257]: 
           col3  col4
col1 col2            
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg:


df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
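
For reference, here is a minimal, self-contained sketch of both approaches. The frame is rebuilt by hand from the table in the question, so the exact dtypes are an assumption. Note that recent pandas releases no longer accept the tuple-style selection `["col3", "col4"]` after a groupby, so the sketch selects the columns with a list (double brackets) instead.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Select the value columns with a list (double brackets), then sum.
by_selection = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Alternative: name the aggregation for each column with agg.
by_agg = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

print(by_selection)
print(by_agg)
```

On this data both calls should give the same two-column result, indexed by col1 and col2; col5 is left out because it is never selected.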

Answered by Prateek Sharma

Another generic solution is

df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()

This will give you the required output.
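
As a rough illustration, the sketch below rebuilds the question's sample data (an assumption, since the original frame is not shown as code) and compares `reset_index()` with passing `as_index=False` to `groupby`, which is another way to keep col1 and col2 as ordinary columns.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Aggregate, then move the group keys out of the index into columns.
flat = (
    df.groupby(["col1", "col2"])
      .agg({"col3": "sum", "col4": "sum"})
      .reset_index()
)

# Alternative: ask groupby not to put the keys in the index at all.
flat_alt = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()

print(flat)
```

Either form yields a flat frame with the columns col1, col2, col3 and col4, matching the layout requested in the question.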

Answered by A.Kot

The issue is likely that df.col3.dtype is not an int or another numeric datatype. Try df.col3 = df.col3.astype(int) before doing your groupby.


Additionally, select your columns after the groupby to see if the columns are even being aggregated:

df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
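
A small sketch of that check, under the assumption that col3 arrived as text (a hypothetical variant of the question's data, for example after reading a badly typed CSV): inspect the dtypes first, convert, then group.

```python
import pandas as pd

# Hypothetical variant of the question's data in which col3 holds strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": [2, 2, 2, 2, 2, 2],
})

# Inspect the dtypes before aggregating; col3 is not numeric here.
print(df.dtypes)

# Convert to an integer dtype before grouping.
df["col3"] = df["col3"].astype(int)

df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```

If a column may contain values that cannot be parsed as integers, `pd.to_numeric(df["col3"], errors="coerce")` is a more forgiving conversion than `astype(int)`.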

Answered by Leo James

The above answer didn't work for me.

df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]

I was grouping by a single column and summing a single column.


Here is the one that worked for me.


D1.groupby(['col1'])['col2'].sum()  # the sum goes at the end, not in the middle

Answered by Hanni Ali

I think it would be more efficient to do the following:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'}).sum(axis=1)

or:

df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().sum(axis=1)

This does assume you have appropriate types in the dataframe.
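
One thing worth noting about this variant: the trailing `.sum(axis=1)` adds col3 and col4 together within each group, so the result is a single Series of combined totals rather than the two-column frame shown in the question. A minimal sketch on the question's sample data (rebuilt by hand, so the dtypes are an assumption):

```python
import pandas as pd

# Sample frame rebuilt from the question's table (dtypes are assumed).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
})

# Per-group sums of col3 and col4, then added across the two columns.
totals = df.groupby(["col1", "col2"])[["col3", "col4"]].sum().sum(axis=1)

print(totals)
```

Use this form only when a combined total per group is what you want; drop the final `.sum(axis=1)` to keep col3 and col4 separate.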
