pandas: Group by a column and find the min and max of each group

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/46501703/

Date: 2020-09-14 04:33:03  Source: igfitidea

Tags: python, pandas, dataframe, group-by, pandas-groupby

Asked by The Cat

I have the following dataset:

        Day    Element  Data_Value
6786    01-01   TMAX    112
9333    01-01   TMAX    101
9330    01-01   TMIN    60
11049   01-01   TMIN    0
6834    01-01   TMIN    25
11862   01-01   TMAX    113
1781    01-01   TMAX    115
11042   01-01   TMAX    105
1110    01-01   TMAX    111
651     01-01   TMIN    44
11350   01-01   TMIN    83
1798    01-02   TMAX    70
4975    01-02   TMAX    79
12774   01-02   TMIN    0
3977    01-02   TMIN    60
2485    01-02   TMAX    73
4888    01-02   TMIN    31
11836   01-02   TMIN    26
11368   01-02   TMAX    71
2483    01-02   TMIN    26
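
For anyone who wants to reproduce the answers below, here is a minimal sketch that rebuilds the dataset as a DataFrame. The rows are copied from the table above; the helper name `rows` is only for illustration and is not part of the original question.

```python
import pandas as pd

# (index, Day, Element, Data_Value) tuples copied from the question's table.
rows = [
    (6786, "01-01", "TMAX", 112),
    (9333, "01-01", "TMAX", 101),
    (9330, "01-01", "TMIN", 60),
    (11049, "01-01", "TMIN", 0),
    (6834, "01-01", "TMIN", 25),
    (11862, "01-01", "TMAX", 113),
    (1781, "01-01", "TMAX", 115),
    (11042, "01-01", "TMAX", 105),
    (1110, "01-01", "TMAX", 111),
    (651, "01-01", "TMIN", 44),
    (11350, "01-01", "TMIN", 83),
    (1798, "01-02", "TMAX", 70),
    (4975, "01-02", "TMAX", 79),
    (12774, "01-02", "TMIN", 0),
    (3977, "01-02", "TMIN", 60),
    (2485, "01-02", "TMAX", 73),
    (4888, "01-02", "TMIN", 31),
    (11836, "01-02", "TMIN", 26),
    (11368, "01-02", "TMAX", 71),
    (2483, "01-02", "TMIN", 26),
]

# Keep the original integer labels as the index, as shown in the question.
df = pd.DataFrame(
    [r[1:] for r in rows],
    index=[r[0] for r in rows],
    columns=["Day", "Element", "Data_Value"],
)
print(df.head())
```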

I want to group by Day, then find the overall min of TMIN and the max of TMAX and put these into a data frame, so that I get an output like this:

Day    DayMin    DayMax
01-01  0         115
01-02  0         79

I know I need to start with:

df.groupby(by='Day')

but I am stuck on the next step. Should I create columns to store the TMAX and TMIN values?

Answered by cs95

You can use assign + abs, followed by groupby + agg:

df = (df.assign(Data_Value=df['Data_Value'].abs())
       .groupby(['Day'])['Data_Value'].agg([('Min' , 'min'), ('Max', 'max')])
       .add_prefix('Day'))

df 
       DayMin  DayMax
Day                  
01-01       0     115
01-02       0      79
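
The same idea can also be sketched with keyword-style named aggregation, which, as far as I know, needs pandas 0.25 or later. This is a variation on the answer above rather than the author's exact code, and the small `df` here is a six-row subset of the question's data, used only to keep the example self-contained.

```python
import pandas as pd

# A few rows copied from the question (subset for brevity).
df = pd.DataFrame({
    "Day": ["01-01", "01-01", "01-01", "01-02", "01-02", "01-02"],
    "Element": ["TMAX", "TMIN", "TMAX", "TMAX", "TMIN", "TMAX"],
    "Data_Value": [112, 0, 115, 70, 0, 79],
})

# Named aggregation: each keyword becomes an output column name,
# so no add_prefix step is needed.
result = (
    df.assign(Data_Value=df["Data_Value"].abs())
      .groupby("Day")["Data_Value"]
      .agg(DayMin="min", DayMax="max")
)
print(result)
```

Note that, like the answer above, this takes the min and max over all values of each day without looking at the Element column.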

Answered by Zero

Use a custom function with groupby + apply:

In [5265]: def maxmin(x):
      ...:     mx = x[x.Element == 'TMAX'].Data_Value.max()
      ...:     mn = x[x.Element == 'TMIN'].Data_Value.min()
      ...:     return pd.Series({'DayMin': mn, 'DayMax': mx})
      ...:

In [5266]: df.groupby('Day').apply(maxmin)
Out[5266]:
       DayMax  DayMin
Day
01-01     115       0
01-02      79       0

Also, to get Day back as a regular column:

In [5268]: df.groupby('Day').apply(maxmin).reset_index()
Out[5268]:
     Day  DayMax  DayMin
0  01-01     115       0
1  01-02      79       0

Or, use query instead of x[x.Element == 'TMAX'], as in x.query("Element == 'TMAX'")
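
A minimal sketch of the query variant, written as a standalone script. The function name `maxmin_query` and the eight-row `df` (a subset of the question's data) are assumptions for illustration. Selecting the two needed columns before apply is my own choice, made so the function only receives the non-grouping columns.

```python
import pandas as pd

# A few rows copied from the question (subset for brevity).
df = pd.DataFrame({
    "Day": ["01-01"] * 4 + ["01-02"] * 4,
    "Element": ["TMAX", "TMAX", "TMIN", "TMIN"] * 2,
    "Data_Value": [112, 115, 60, 0, 70, 79, 0, 60],
})

def maxmin_query(x):
    # Filter each group with query instead of boolean indexing.
    mx = x.query("Element == 'TMAX'").Data_Value.max()
    mn = x.query("Element == 'TMIN'").Data_Value.min()
    return pd.Series({"DayMin": mn, "DayMax": mx})

# Pass only the columns the function needs; Day becomes the index.
result = df.groupby("Day")[["Element", "Data_Value"]].apply(maxmin_query)
print(result.reset_index())
```

Unlike a plain min/max over the whole day, this version takes the minimum from TMIN rows only and the maximum from TMAX rows only.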

Answered by Bharath

Create duplicate columns and find the min and max using agg, i.e.:

ndf = (df.assign(DayMin=df['Data_Value'].abs(), DayMax=df['Data_Value'].abs())
         .groupby('Day')
         .agg({'DayMin': 'min', 'DayMax': 'max'}))

ndf
       DayMax  DayMin
Day                  
01-01     115       0
01-02      79       0

In case you want both TMIN and TMAX, then use groupby(['Day','Element'])
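
One way the groupby(['Day','Element']) suggestion might look in practice is sketched below. The variable name `per_element` and the eight-row `df` (a subset of the question's data) are assumptions for illustration, and computing both min and max per pair is my reading of the suggestion.

```python
import pandas as pd

# A few rows copied from the question (subset for brevity).
df = pd.DataFrame({
    "Day": ["01-01"] * 4 + ["01-02"] * 4,
    "Element": ["TMAX", "TMAX", "TMIN", "TMIN"] * 2,
    "Data_Value": [112, 115, 60, 0, 70, 79, 0, 60],
})

# Group on both keys, then compute min and max for every (Day, Element) pair.
per_element = df.groupby(["Day", "Element"])["Data_Value"].agg(["min", "max"])
print(per_element)
```

The result is indexed by (Day, Element), so a single value can be read with something like per_element.loc[("01-01", "TMAX"), "max"].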