Python Pandas - DataFrame groupby - how to get the sum of multiple columns
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse it, you must also follow CC BY-SA, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46431243/
Question by Axel
This should be an easy one, but somehow I couldn't find a solution that works.
I have a pandas dataframe which looks like this:
index col1 col2 col3 col4 col5
0 a c 1 2 f
1 a c 1 2 f
2 a d 1 2 f
3 b d 1 2 g
4 b e 1 2 g
5 b e 1 2 g
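To experiment with the answers below, the sample data can be rebuilt as a DataFrame. A minimal sketch, assuming col3 and col4 hold integers (the question does not show the dtypes) and relying on the default 0..5 index in place of the printed index column:

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: col3 and col4 are integers; the original post does not show dtypes.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b", "b"],
        "col2": ["c", "c", "d", "d", "e", "e"],
        "col3": [1, 1, 1, 1, 1, 1],
        "col4": [2, 2, 2, 2, 2, 2],
        "col5": ["f", "f", "f", "g", "g", "g"],
    }
)
print(df)
```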
I want to group by col1 and col2 and get the sum() of col3 and col4. Column col5 can be dropped, since its data cannot be aggregated.
Here is what the output should look like. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter whether col1 and col2 are part of the index or not.
index col1 col2 col3 col4
0 a c 2 4
1 a d 1 2
2 b d 1 2
3 b e 2 4
Here is what I tried:
df_new = df.groupby(['col1', 'col2'])["col3", "col4"].sum()
That however only returns the aggregated results of col4.
I am lost here. Every example I found only aggregates one column, where the issue obviously doesn't occur.
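In the attempt above, "col3", "col4" inside a single pair of brackets is passed as a tuple of keys, not a list; recent pandas versions reject that form. A minimal sketch, assuming the sample data with integer col3 and col4, of the same call with a list of columns (double brackets), which keeps both summed columns:

```python
import pandas as pd

# Sample data from the question (col3/col4 assumed to be integers).
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b", "b"],
        "col2": ["c", "c", "d", "d", "e", "e"],
        "col3": [1, 1, 1, 1, 1, 1],
        "col4": [2, 2, 2, 2, 2, 2],
        "col5": ["f", "f", "f", "g", "g", "g"],
    }
)

# A list of column names (double brackets) instead of a tuple.
df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```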
Answer by YOBEN_S
By using apply:
df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
Output:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
If you want to use agg:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
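Since pandas 0.25, agg also accepts keyword arguments ("named aggregation"), which name each output column explicitly as output_name=(source_column, function). A minimal sketch on the sample data, assuming integer col3 and col4:

```python
import pandas as pd

# Sample data from the question (col3/col4 assumed to be integers).
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b", "b"],
        "col2": ["c", "c", "d", "d", "e", "e"],
        "col3": [1, 1, 1, 1, 1, 1],
        "col4": [2, 2, 2, 2, 2, 2],
        "col5": ["f", "f", "f", "g", "g", "g"],
    }
)

# Named aggregation: new_column=(source_column, aggregation).
out = df.groupby(["col1", "col2"]).agg(
    col3=("col3", "sum"),
    col4=("col4", "sum"),
)
print(out)
```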
Answer by Prateek Sharma
Another generic solution is:
df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()
This gives the required output: reset_index() moves col1 and col2 from the index back into regular columns.
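Passing as_index=False to groupby yields the same flat layout as the reset_index() call above, without the separate step. A minimal sketch on the sample data (integer col3 and col4 assumed) that builds both results and compares them:

```python
import pandas as pd

# Sample data from the question (col3/col4 assumed to be integers).
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b", "b"],
        "col2": ["c", "c", "d", "d", "e", "e"],
        "col3": [1, 1, 1, 1, 1, 1],
        "col4": [2, 2, 2, 2, 2, 2],
        "col5": ["f", "f", "f", "g", "g", "g"],
    }
)

# Flat result straight from groupby: the group keys stay as ordinary columns.
flat = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()

# The answer's approach: aggregate, then move the group keys back into columns.
via_reset = (
    df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"}).reset_index()
)

print(flat)
```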
Answer by A.Kot
The issue is likely that df.col3.dtype is not int or another numeric dtype. Try df.col3 = df.col3.astype(int) before doing your groupby.
Additionally, select your columns after the groupby sum() to check whether they are being aggregated at all:
df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
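If the columns were read in as text, pd.to_numeric converts them before grouping. Depending on the pandas version, sum() over such object columns either drops them or concatenates the strings, and neither gives a numeric total. A minimal sketch with a hypothetical string-typed variant of the sample data:

```python
import pandas as pd

# Hypothetical variant of the sample data in which col3 and col4 arrived as strings.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b", "b"],
        "col2": ["c", "c", "d", "d", "e", "e"],
        "col3": ["1", "1", "1", "1", "1", "1"],
        "col4": ["2", "2", "2", "2", "2", "2"],
    }
)

# Convert each column to a numeric dtype; invalid values raise an error by default.
for col in ["col3", "col4"]:
    df[col] = pd.to_numeric(df[col])

df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```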
Answer by Leo James
The answer above didn't work for me:
df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
I was grouping by a single column and summing columns.
Here is the one that worked for me (note that sum() comes at the end, not in the middle):
D1.groupby(['col1'])['col2'].sum()
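With one selected column, single brackets return a Series, while a list of columns (double brackets) returns a DataFrame. A minimal sketch; D1 here is a hypothetical frame with illustrative values, since the answer does not show its data:

```python
import pandas as pd

# Hypothetical D1: the answer does not show its data, so these values are illustrative.
D1 = pd.DataFrame(
    {"col1": ["a", "a", "a", "b", "b", "b"], "col2": [1, 2, 3, 4, 5, 6]}
)

# Single brackets select one column, so the result is a Series indexed by col1.
as_series = D1.groupby(["col1"])["col2"].sum()

# Double brackets select a list of columns, so the result is a DataFrame.
as_frame = D1.groupby(["col1"])[["col2"]].sum()

print(as_series)
print(as_frame)
```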
Answer by Hanni Ali
I think it would be more efficient to do the following:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'}).sum(axis=1)
or:
df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().sum(axis=1)
This assumes the columns have appropriate (numeric) dtypes in the dataframe.
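Chaining .sum(axis=1) adds col3 and col4 together, so these snippets return one combined total per group (a Series) rather than separate col3 and col4 columns. A minimal sketch on the sample data (integer col3 and col4 assumed) showing both stages:

```python
import pandas as pd

# Sample data from the question (col3/col4 assumed to be integers).
df = pd.DataFrame(
    {
        "col1": ["a", "a", "a", "b", "b", "b"],
        "col2": ["c", "c", "d", "d", "e", "e"],
        "col3": [1, 1, 1, 1, 1, 1],
        "col4": [2, 2, 2, 2, 2, 2],
        "col5": ["f", "f", "f", "g", "g", "g"],
    }
)

# Stage 1: one summed column per input column, indexed by (col1, col2).
per_column = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Stage 2: summing across axis=1 collapses col3 and col4 into a single total per group.
total = per_column.sum(axis=1)

print(per_column)
print(total)
```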

