Group by one column in pandas and find the sum and max value of another column

Question

Asked by jovicbg

I have a dataframe like this:

Name  id  col1  col2  col3  cl4 
PL    252  0     747   3     53  
PL2   252  1     24    2     35 
PL3   252  4     75    24    13 
AD    889  53    24    0     95 
AD2   889  23    2     0     13  
AD3   889  0     24    3     6  
BG    024  12    89    53    66 
BG1   024  43    16    13    0   
BG2   024  5     32    101   4
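A minimal sketch for reproducing the input table above. It assumes id is stored as a string so that 024 keeps its leading zero (read as an integer it becomes 24), and it keeps the column name cl4 exactly as it appears in the data.

```python
import pandas as pd

# Input table from the question. 'id' is kept as a string (an assumption)
# so that "024" keeps its leading zero; read as int it would become 24.
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})
print(df)
```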

Now I need to group by id and, for columns col1 and col4, find the sum for each id and put it into a new column next to the parent column (for example, col1(sum)). For col2 and col3, however, I need the max value. Desired output:

Name  id  col1 col1(sum) col2 col2(max) col3 col3(max) col4 col4(sum)
PL    252  0       5      747    747     3     24    6    18
PL2   252  1       5      24     747     2     24    12   18
PL3   252  4       5      75     747     24    24    0    18
AD    889  53      76     24     24      95    95    23   33
AD2   889  23      76     2      24      13    95    5    33
AD3   889  0       76     24     24      6     95    5    33
BG    024  12      60     89     89      66    66    0    67   
BG1   024  43      60     16     89      0     66    63   67    
BG2   024  5       60     32     89      4     66    4    67

What is the easiest and fastest way to calculate this?

Answer 1

Answered by Maresh

The most pandas-native way to do this is the .agg() method, which lets you specify the aggregation function to apply to each column (just as you would in SQL).

Sample from the documentation:

df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
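Applied to this question, a minimal sketch might use named aggregation (the keyword form of .agg) to get one row per id, then join those aggregates back onto the original rows, because .agg on its own collapses each group to a single row. The DataFrame is rebuilt from the question's table with id as a string (an assumption), and output names such as col1_sum are illustrative choices.

```python
import pandas as pd

# Input table from the question (id kept as a string: an assumption)
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# Named aggregation: one row per id, with a different function per column
agg = df.groupby("id").agg(
    col1_sum=("col1", "sum"),
    col2_max=("col2", "max"),
    col3_max=("col3", "max"),
    col4_sum=("cl4", "sum"),
)

# .agg collapses each group to one row; join it back on "id" so every
# original row carries its group's values
result = df.join(agg, on="id")
print(result)
```

The join step is what turns the per-group summary into the per-row layout shown in the desired output.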

Answer 2

Answered by Vaishali

You can use groupby/transform to create the required columns:

# Broadcast each group's sum / max back onto every row of that group
df[['col1_sum', 'col4_sum']] = df.groupby('id')[['col1', 'cl4']].transform('sum')
df[['col2_max', 'col3_max']] = df.groupby('id')[['col2', 'col3']].transform('max')

    Name    id  col1    col2    col3    cl4 col1_sum    col4_sum    col2_max    col3_max
0   PL      252 0       747     3       53  5           101         747         24
1   PL2     252 1       24      2       35  5           101         747         24
2   PL3     252 4       75      24      13  5           101         747         24
3   AD      889 53      24      0       95  76          114         24          3
4   AD2     889 23      2       0       13  76          114         24          3
5   AD3     889 0       24      3       6   76          114         24          3
6   BG      24  12      89      53      66  60          70          89          101
7   BG1     24  43      16      13      0   60          70          89          101
8   BG2     24  5       32      101     4   60          70          89          101
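If the new columns should sit next to their parent column, as the question asks, one sketch is to build the output column by column from a small spec dict (the name spec is illustrative, not from the answer). The new column names mimic the question's col1(sum) style but follow the data's cl4 spelling, and id is assumed to be a string.

```python
import pandas as pd

# Input table from the question (id kept as a string: an assumption)
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# Hypothetical spec: which aggregation to broadcast for each column
spec = {"col1": "sum", "col2": "max", "col3": "max", "cl4": "sum"}

g = df.groupby("id")
out = df[["Name", "id"]].copy()
for col, func in spec.items():
    out[col] = df[col]                              # parent column
    out[f"{col}({func})"] = g[col].transform(func)  # aggregate right after it
print(out)
```

Because columns are appended in insertion order, each aggregate lands immediately to the right of its parent without a separate reordering step.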

Answer 3

Answered by Tbaki

You can use merge to join the result of a groupby and sum on id back onto the original frame:

pd.merge(df, df.groupby("id").sum(numeric_only=True).reset_index(), on='id', how='outer')
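Summing every column also gives a sum for col2 and col3, while the question wants the max there. A sketch of the same merge idea with a per-column function dict passed to .agg; the left merge, the _agg suffix for the aggregated copies, and the string id are illustrative assumptions.

```python
import pandas as pd

# Input table from the question (id kept as a string: an assumption)
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# One row per id: sum for col1/cl4, max for col2/col3
agg = (df.groupby("id")
         .agg({"col1": "sum", "col2": "max", "col3": "max", "cl4": "sum"})
         .reset_index())

# Left merge keeps the original row order; suffixes leave the original
# column names untouched and tag the aggregated copies with "_agg"
out = pd.merge(df, agg, on="id", how="left", suffixes=("", "_agg"))
print(out)
```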

Answer 4

Answered by Stewbaca

I know this is messy, but I like chaining, so you can do something like this:

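# group_keys=False keeps the original row index instead of prefixing it with id
df = (df.groupby('id', group_keys=False)
        .apply(lambda g: g.assign(
            col1_sum=g.col1.sum(),
            col2_max=g.col2.max())))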

Basically, this applies a group-based assign to each group and then combines the results into a single DataFrame.

See https://pandas.pydata.org/pandas-docs/stable/api.html for details on each method.
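In the same chaining style, a sketch that uses groupby().transform inside assign instead of groupby().apply. Passing a string function name such as "sum" to transform avoids calling a Python lambda once per group, so it is typically faster on large frames, though that is a general expectation rather than a benchmark on this data. The output column names and the string id are illustrative assumptions.

```python
import pandas as pd

# Input table from the question (id kept as a string: an assumption)
df = pd.DataFrame({
    "Name": ["PL", "PL2", "PL3", "AD", "AD2", "AD3", "BG", "BG1", "BG2"],
    "id":   ["252", "252", "252", "889", "889", "889", "024", "024", "024"],
    "col1": [0, 1, 4, 53, 23, 0, 12, 43, 5],
    "col2": [747, 24, 75, 24, 2, 24, 89, 16, 32],
    "col3": [3, 2, 24, 0, 0, 3, 53, 13, 101],
    "cl4":  [53, 35, 13, 95, 13, 6, 66, 0, 4],
})

# Each lambda receives the frame being assigned to and returns a Series
# aligned to its index, so row order is preserved
out = df.assign(
    col1_sum=lambda d: d.groupby("id")["col1"].transform("sum"),
    col2_max=lambda d: d.groupby("id")["col2"].transform("max"),
    col3_max=lambda d: d.groupby("id")["col3"].transform("max"),
    col4_sum=lambda d: d.groupby("id")["cl4"].transform("sum"),
)
print(out)
```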
