Python Pandas - dataframe groupby - how to get the sum of multiple columns
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/46431243/
Pandas - dataframe groupby - how to get sum of multiple columns
Asked by Axel
This should be an easy one, but somehow I couldn't find a solution that works.
I have a pandas dataframe which looks like this:
index  col1  col2  col3  col4  col5
0      a     c     1     2     f
1      a     c     1     2     f
2      a     d     1     2     f
3      b     d     1     2     g
4      b     e     1     2     g
5      b     e     1     2     g
I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped, since its data cannot be aggregated.
Here is what the output should look like. I am interested in having both col3 and col4 in the resulting dataframe. It doesn't really matter whether col1 and col2 are part of the index or not.
index  col1  col2  col3  col4
0      a     c     2     4
1      a     d     1     2
2      b     d     1     2
3      b     e     2     4
Here is what I tried:
df_new = df.groupby(['col1', 'col2'])["col3", "col4"].sum()
That, however, only returns the aggregated results of col4.
I am lost here. Every example I found only aggregates one column, where the issue obviously doesn't occur.
Answered by YOBEN_S
By using apply:
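# Select the columns with a list (double brackets); pandas 2.0+ rejects the tuple form ["col3", "col4"].
df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())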
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
If you want to use agg:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
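To make the two approaches in this answer easier to compare, here is a minimal sketch that rebuilds the sample frame from the question and runs both. The frame construction and the variable names by_selection and by_agg are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumed to match it exactly).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Select the value columns with a list (double brackets), then sum per group.
by_selection = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Same aggregation via agg with an explicit column -> function mapping.
by_agg = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

print(by_selection)
print(by_agg)
```

Both results carry col1 and col2 as a MultiIndex and leave col5 out, since it is never selected.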
Answered by Prateek Sharma
Another generic solution is:
df.groupby(['col1','col2']).agg({'col3':'sum','col4':'sum'}).reset_index()
This will give you the required output.
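As a rough illustration of what reset_index() changes, the sketch below rebuilds the sample frame from the question (an assumption about its exact contents) and shows the group keys coming back as ordinary columns. The as_index=False variant is an alternative spelling added here for comparison; it is not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumed to match it exactly).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# The answer's approach: aggregate, then move the group keys out of the index.
flat = (
    df.groupby(["col1", "col2"])
      .agg({"col3": "sum", "col4": "sum"})
      .reset_index()
)

# Alternative (not from the answer): ask groupby to keep the keys as columns.
flat_alt = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()

print(flat)
```

Either form yields a frame with a plain integer index and the four columns col1, col2, col3, col4, which matches the layout requested in the question.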
Answered by A.Kot
The issue is likely that df.col3.dtype is not an int or another numeric datatype. Try df.col3 = df.col3.astype(int) before doing your groupby.
Additionally, select your columns after the groupby to see if the columns are even being aggregated:
df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
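The dtype check suggested above can be sketched as follows. This example assumes, purely for illustration, that col3 was loaded as strings; the question does not say what the actual dtypes were.

```python
import pandas as pd

# Hypothetical variant of the question's frame in which col3 holds strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],  # assumption: text, not numbers
    "col4": [2, 2, 2, 2, 2, 2],
})

# Inspect the dtypes first; a text column here explains a bad or missing sum.
print(df.dtypes)

# Convert to integers before grouping, as the answer recommends.
df["col3"] = df["col3"].astype(int)

result = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(result)
```

After the conversion, col3 is summed numerically alongside col4.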
Answered by Leo James
The above answer didn't work for me.
df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
I was grouping by a single column and summing a single column.
Here is the one that worked for me.
D1.groupby(['col1'])['col2'].sum()  # the sum goes at the end, not in the middle
Answered by Hanni Ali
I think it would be more efficient to do the following:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'}).sum(axis=1)
or:
df.groupby(['col1', 'col2'])[['col3', 'col4']].sum().sum(axis=1)
This does assume you have appropriate types in the dataframe.
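A small sketch of what the trailing .sum(axis=1) does, using the sample frame rebuilt from the question (an assumption about its exact contents). Note that it adds col3 and col4 together into one total per group, so the result is a single Series rather than the two-column frame the question asks for.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumed to match it exactly).
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Per-group sums, one column each for col3 and col4.
per_column = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# Adding axis=1 collapses those two columns into one total per group.
combined = per_column.sum(axis=1)

print(per_column)
print(combined)
```

Drop the final .sum(axis=1) if col3 and col4 should stay as separate columns.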