Python Pandas - DataFrame groupby - how to get the sum of multiple columns

Question

Asked by Axel

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas dataframe which looks like this:

index col1   col2   col3   col4   col5
0     a      c      1      2      f 
1     a      c      1      2      f
2     a      d      1      2      f
3     b      d      1      2      g
4     b      e      1      2      g
5     b      e      1      2      g

I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped, since its data cannot be aggregated.

Here is how the output should look. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter whether col1 and col2 are part of the index or not.

index col1   col2   col3   col4   
0     a      c      2      4          
1     a      d      1      2      
2     b      d      1      2      
3     b      e      2      4

Here is what I tried:

df_new = df.groupby(['col1', 'col2'])["col3", "col4"].sum()

That, however, only returns the aggregated results of col4.

I am lost here. Every example I found only aggregates one column, where the issue obviously doesn't occur.
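
For anyone who wants to reproduce the problem, the table above can be rebuilt as a small dataframe. This is only a sketch: the frame is reconstructed from the question's table, and it assumes a recent pandas release, where several columns are selected from a groupby with a list (double brackets) because a bare tuple such as "col3", "col4" is no longer accepted.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# A list inside the brackets selects several columns at once;
# col5 is left out simply by not selecting it.
df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```

Here col1 and col2 end up in the index of df_new, which the question says is acceptable.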

Answer 1

Answered by YOBEN_S

By using apply:

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
Out[1257]: 
           col3  col4
col1 col2            
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
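
As a rough illustration of the agg form, the sketch below runs it on data rebuilt from the question's table and, as an optional variant, uses pandas named aggregation to choose the output column names. The names col3_total and col4_total are made up for this example.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Dict form: one aggregation per listed column; col5 is not listed, so it is dropped.
by_dict = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

# Named aggregation: same sums, with explicitly chosen (hypothetical) output names.
by_name = df.groupby(["col1", "col2"]).agg(
    col3_total=("col3", "sum"),
    col4_total=("col4", "sum"),
)

print(by_dict)
print(by_name)
```

The dict form is handy when different columns need different functions, for example 'sum' for one and 'mean' for another.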

Answer 2

Answered by Prateek Sharma

Another generic solution is:

df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()

This will give you the required output.
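
To make the effect of reset_index() concrete, here is a small sketch on data rebuilt from the question's table. It also shows as_index=False, which is an alternative I am adding for comparison and not something this answer proposes; both should give a flat frame with col1 and col2 as ordinary columns.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# The answer's approach: aggregate, then move the group keys out of the index.
via_reset = (
    df.groupby(["col1", "col2"])
    .agg({"col3": "sum", "col4": "sum"})
    .reset_index()
)

# Alternative: ask groupby not to put the keys in the index in the first place.
via_flag = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()

print(via_reset)
print(via_flag)
```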

Answer 3

Answered by A.Kot

The issue is likely that df.col3.dtype is not an int or another numeric datatype. Try df.col3 = df.col3.astype(int) before doing your groupby.

Additionally, select your columns after the groupby to see if the columns are even being aggregated:

df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
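
A minimal sketch of the dtype check described above. The frame is a hypothetical variant of the question's data in which col3 and col4 were read in as strings, and pd.to_numeric is used here in place of astype(int) as one possible way to convert them.

```python
import pandas as pd

# Hypothetical variant of the question's data: the numbers arrived as strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": ["2", "2", "2", "2", "2", "2"],
})

# Inspect the dtypes first; col3 and col4 are reported as non-numeric here.
print(df.dtypes)

# Convert both columns; to_numeric raises if a value cannot be parsed.
df[["col3", "col4"]] = df[["col3", "col4"]].apply(pd.to_numeric)

df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```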

Answer 4

Answered by Leo James

The answer above didn't work for me:

df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]

I was grouping by a single column and summing a single column.

Here is the one that worked for me:

D1.groupby(['col1'])['col2'].sum()  # the sum goes at the end, not in the middle
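
For the single-column case, a small sketch may help. The frame below is hypothetical and only reuses the names D1, col1 and col2 from this answer; it shows that single brackets return a Series while double brackets keep a DataFrame.

```python
import pandas as pd

# Hypothetical frame; only the names D1, col1 and col2 come from the answer.
D1 = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3]})

# Select the column first, then call sum() at the end of the chain.
as_series = D1.groupby(["col1"])["col2"].sum()    # single brackets: Series
as_frame = D1.groupby(["col1"])[["col2"]].sum()   # double brackets: DataFrame

print(as_series)
print(as_frame)
```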

Answer 5

Answered by Hanni Ali

I think it would be more efficient to do the following:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'}).sum(axis=1)

or:

df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().sum(axis=1)

This does assume you have appropriate types in the dataframe.
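
It may be worth seeing what the trailing .sum(axis=1) does. In the sketch below, on data rebuilt from the question's table, it adds col3 and col4 together and returns a single total per group, which is a different shape from the two-column output shown in the question.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Per-group sums, one column each for col3 and col4.
per_column = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Summing across axis=1 collapses those two columns into one total per group.
combined = per_column.sum(axis=1)

print(per_column)
print(combined)
```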
