Multiple aggregation in group by in Pandas Dataframe

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/35901959/

Tags: python, pandas, group-by, dataframe, aggregate-functions

Asked by Ivan KR

SQL: SELECT MAX(A), MIN(B), C FROM Table GROUP BY C

I want to do the same operation in pandas on a DataFrame. The closest I got was:

DF2 = DF1.groupby(by=['C']).max()

This gives me the max of both columns. How do I apply more than one operation while grouping?

Accepted answer by MaxU

Try the agg() function:

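import numpy as np
import pandas as pd

# 20 rows of random integers in [0, 5) across columns A, B, C
df = pd.DataFrame(np.random.randint(0, 5, size=(20, 3)), columns=list('ABC'))
print(df)

# Per group of C: maximum of A and minimum of B
print(df.groupby('C').agg({'A': 'max', 'B': 'min'}))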

Output (the values vary between runs because the data is random):

    A  B  C
0   2  3  0
1   2  2  1
2   4  0  1
3   0  1  4
4   3  3  2
5   0  4  3
6   2  4  2
7   3  4  0
8   4  2  2
9   3  2  1
10  2  3  1
11  4  1  0
12  4  3  2
13  0  0  1
14  3  1  1
15  4  1  1
16  0  0  0
17  4  0  1
18  3  4  0
19  0  2  4
   A  B
C
0  4  0
1  4  0
2  4  2
3  0  4
4  0  1
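
To mirror the SQL result more closely, where C comes back as an ordinary column rather than as the index, the grouped result can be flattened with reset_index(). A minimal sketch, assuming a small hand-made frame (the data below is illustrative and not taken from the answer):

```python
import pandas as pd

# Illustrative data, chosen by hand so the result is easy to check.
df = pd.DataFrame({
    "A": [2, 4, 3, 0, 1],
    "B": [3, 0, 2, 1, 5],
    "C": [0, 1, 1, 4, 0],
})

# Max of A and min of B per value of C; reset_index() turns the group key
# from the index back into a regular column, as in the SQL result set.
result = df.groupby("C").agg({"A": "max", "B": "min"}).reset_index()
print(result)
```

The string names 'max' and 'min' select pandas' own groupby implementations of those reductions.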

Alternatively, you may want to look at the pandas.read_sql_query() function.
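
If the data already lives in a SQL database, the query from the question can be run more or less as written. The sketch below is only an illustration: it assumes an in-memory SQLite database, a made-up table name "t", and hand-made data, none of which come from the answer.

```python
import sqlite3

import pandas as pd

# Illustrative data; not from the original answer.
df = pd.DataFrame({
    "A": [2, 4, 3, 0, 1],
    "B": [3, 0, 2, 1, 5],
    "C": [0, 1, 1, 4, 0],
})

# In-memory SQLite database; the table name "t" is an arbitrary choice.
conn = sqlite3.connect(":memory:")
df.to_sql("t", conn, index=False)

# The aliases max_A and min_B are added here to give the columns readable names.
query = "SELECT MAX(A) AS max_A, MIN(B) AS min_B, C FROM t GROUP BY C"
sql_result = pd.read_sql_query(query, conn)
conn.close()
print(sql_result)
```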

Answer by jezrael

You can use the agg function:

DF2 = DF1.groupby('C').agg({'A': 'max', 'B': 'min'})

Sample:

print(DF1)
   A   B  C  D
0  1   5  a  a
1  7   9  a  b
2  2  10  c  d
3  3   2  c  c

DF2 = DF1.groupby('C').agg({'A': 'max', 'B': 'min'})

print(DF2)
   A  B
C      
a  7  5
c  3  2
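
The same dictionary form also accepts a list of functions per column, which covers the case of wanting more than one operation on a single column. A small sketch that rebuilds the sample DF1 shown above (the name DF3 is introduced here only for illustration):

```python
import pandas as pd

# The sample frame from this answer, rebuilt by hand.
DF1 = pd.DataFrame({
    "A": [1, 7, 2, 3],
    "B": [5, 9, 10, 2],
    "C": ["a", "a", "c", "c"],
    "D": ["a", "b", "d", "c"],
})

# A list applies several functions to one column; the result then has
# two-level column labels such as ("A", "max") and ("A", "min").
DF3 = DF1.groupby("C").agg({"A": ["max", "min"], "B": ["min"]})
print(DF3)
```

Column D is left out of the result because it is not named in the dictionary.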

See also "GroupBy-fu: improvements in grouping and aggregating data in pandas", which has nice explanations.

Answer by dmb

You can use the agg function:

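import pandas as pd

# 'something', 'column1', and 'column2' are placeholders for your own column names;
# df is assumed to be an existing DataFrame.
df.groupby('something').agg({'column1': 'max', 'column2': 'min'})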
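
For a runnable variant, pandas 0.25 and later also support named aggregation, which lets each output column be named explicitly. The frame and the names "key", "value1", "value2", and "summary" below are hypothetical, chosen only to make the sketch self-contained:

```python
import pandas as pd

# Hypothetical data; "key", "value1", and "value2" are made-up column names.
df = pd.DataFrame({
    "key": ["x", "x", "y"],
    "value1": [1, 5, 3],
    "value2": [4, 2, 6],
})

# Each keyword names an output column and maps to (input column, function).
summary = df.groupby("key").agg(
    value1_max=("value1", "max"),
    value2_min=("value2", "min"),
)
print(summary)
```

This keeps the result columns flat and self-describing, which can be convenient when the same input column is aggregated more than once.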