Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44724480/
Group by one column and find sum and max value for another in pandas
Asked by jovicbg
I have a dataframe like this:
Name id col1 col2 col3 cl4
PL 252 0 747 3 53
PL2 252 1 24 2 35
PL3 252 4 75 24 13
AD 889 53 24 0 95
AD2 889 23 2 0 13
AD3 889 0 24 3 6
BG 024 12 89 53 66
BG1 024 43 16 13 0
BG2 024 5 32 101 4
Now I need to group by id and, for columns col1 and col4, find the sum for each id and put it in a new column next to the parent column (example: col3(sum)). For col2 and col3, I need the max value instead. Desired output:
Name id col1 col1(sum) col2 col2(max) col3 col3(max) col4 col4(sum)
PL 252 0 5 747 747 3 24 6 18
PL2 252 1 5 24 747 2 24 12 18
PL3 252 4 5 75 747 24 24 0 18
AD 889 53 76 24 24 95 95 23 33
AD2 889 23 76 2 24 13 95 5 33
AD3 889 0 76 24 24 6 95 5 33
BG 024 12 60 89 89 66 66 0 67
BG1 024 43 60 16 89 0 66 63 67
BG2 024 5 60 32 89 4 66 4 67
What is the easiest and fastest way to calculate this?
Answered by Maresh
The most pandas-native way to do this is to use the .agg() method, which lets you specify the aggregation function to apply per column (just like you would in SQL).
Sample from the documentation:
df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
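Applied to the question's data, a minimal sketch might look like the following. It assumes the sample frame is rebuilt by hand with `id` stored as strings, and the output names (`col1_sum` and so on) are illustrative choices, not something the question prescribes. Because `.agg()` returns one row per group, the sketch merges the summary back onto the original rows to get the per-row layout the question asks for.

```python
import pandas as pd

# Sample data from the question (id kept as a string so "024" is preserved).
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# One row per id: sum for col1/cl4, max for col2/col3 (named aggregation).
summary = df.groupby("id").agg(
    col1_sum=("col1", "sum"),
    col2_max=("col2", "max"),
    col3_max=("col3", "max"),
    col4_sum=("cl4", "sum"),
).reset_index()

# Broadcast the per-group values back onto every original row.
result = df.merge(summary, on="id", how="left")
print(result)
```

If only the per-group summary is needed, the merge step can be dropped and `summary` used directly.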
Answered by Vaishali
You can use groupby/transform to create the required columns:
# Select several columns with a list (double brackets); transform broadcasts each group's result to its rows.
df[['col1_sum', 'col4_sum']] = df.groupby('id')[['col1', 'cl4']].transform('sum')
df[['col2_max', 'col3_max']] = df.groupby('id')[['col2', 'col3']].transform('max')
Name id col1 col2 col3 cl4 col1_sum col4_sum col2_max col3_max
0 PL 252 0 747 3 53 5 101 747 24
1 PL2 252 1 24 2 35 5 101 747 24
2 PL3 252 4 75 24 13 5 101 747 24
3 AD 889 53 24 0 95 76 114 24 3
4 AD2 889 23 2 0 13 76 114 24 3
5 AD3 889 0 24 3 6 76 114 24 3
6 BG 24 12 89 53 66 60 70 89 101
7 BG1 24 43 16 13 0 60 70 89 101
8 BG2 24 5 32 101 4 60 70 89 101
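The question also asks for each new column to sit next to its parent column. One way to sketch that is to combine `transform` with `DataFrame.insert`. The frame below is rebuilt by hand from the question's sample, and the `col(func)` naming simply follows the pattern in the desired output; both are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (id kept as a string so "024" is preserved).
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# Map each source column to the aggregation the question asks for.
aggs = {"col1": "sum", "col2": "max", "col3": "max", "cl4": "sum"}

for col, func in aggs.items():
    # transform returns a Series aligned with df's index (one value per row).
    values = df.groupby("id")[col].transform(func)
    # Insert the new column immediately to the right of its parent column.
    df.insert(df.columns.get_loc(col) + 1, f"{col}({func})", values)

print(df)
```

Note that `insert` modifies `df` in place, so the loop looks up each parent column's position again after every insertion.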
Answered by Tbaki
Answered by Stewbaca
I know this is messy, but I like chaining, so you can do something like this:
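# Wrap the chain in parentheses so it can span several lines;
# group_keys=False keeps the original row index on recent pandas versions.
df = (
    df.groupby('id', group_keys=False)
      .apply(lambda g: g.assign(
          col1_sum=g.col1.sum(),
          col2_max=g.col2.max()))
)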
Basically, this applies a group-wise assign to each group and then combines the results into a single DataFrame.
See https://pandas.pydata.org/pandas-docs/stable/api.html for details on each method.
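The chained style can also be kept without `apply` by combining `assign` with `transform`. This is a sketch of an alternative rather than the answer author's method, and it assumes a hand-built subset of the question's sample with only the columns it needs.

```python
import pandas as pd

# Subset of the question's sample data (id kept as a string).
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
})

# assign accepts callables, so each new column is computed from the frame
# in the chain; transform keeps one value per original row.
result = df.assign(
    col1_sum=lambda d: d.groupby("id")["col1"].transform("sum"),
    col2_max=lambda d: d.groupby("id")["col2"].transform("max"),
)
print(result)
```

Since `assign` returns a new DataFrame, `df` itself is left unchanged and the row order and index are preserved.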