Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/36792806/

Time: 2020-09-14 01:06:09  Source: igfitidea

Pandas: find maximum value, when and if conditions

Tags: python, pandas

Asked by LearningSlowly

I have a dataframe, df:

id           volume  saturation   time_delay_normalised  speed        BPR_free_speed  BPR_speed    Volume  time_normalised
27WESTBOUND  580     0.351515152  57                     6.54248366   17.88           15.91366177  580     1.59375
27WESTBOUND  588     0.356363636  100                    5.107142857  17.88           15.86519847  588     2.041666667
27WESTBOUND  475     0.287878788  64                     6.25625      17.88           16.51161331  475     0.666666667
27EASTBOUND  401     0.243030303  59                     6.458064516  17.88           16.88283672  401     1.0914583333
27EASTBOUND  438     0.265454545  46                     7.049295775  17.88           16.70300418  438     1.479166667
27EASTBOUND  467     0.283030303  58                     6.5          17.88           16.55392848  467     0.9604166667

I wish to create a new column, free_capacity, and set it to the maximum value of Volume, per id, among the rows where time_normalised is less than or equal to 1.1.

Without considering the time_normalised condition, I can do this:

df['free_capacity'] = df.groupby('id')["Volume"].transform('max')

How do I add the time_normalised <= 1.1 condition?

EDIT

@jezrael suggested the following:

df.loc[df['time_normalised'] <= 1.1, 'free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')

Which gives:

id  volume  saturation  time_delay_normalised     speed  \
27WESTBOUND     580    0.351515                     57  6.542484   
27WESTBOUND     588    0.356364                    100  5.107143   
27WESTBOUND     475    0.287879                     64  6.256250   
27EASTBOUND     401    0.243030                     59  6.458065   
27EASTBOUND     438    0.265455                     46  7.049296   
27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
          17.88  15.913662     580         1.593750            NaN  
          17.88  15.865198     588         2.041667            NaN  
          17.88  16.511613     475         0.666667          475.0  
          17.88  16.882837     401         1.091458          467.0  
          17.88  16.703004     438         1.479167            NaN  
          17.88  16.553928     467         0.960417          467.0 

However, I still want every row to be assigned its free_capacity value, as identified by id, including rows that do not meet the condition.

Thus, I tried:

df['free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')

However, this still results in NaN values. The 1.1 time_normalised condition is for finding the value, not limiting its application.
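
The NaN values most likely come from index alignment: transform on the filtered frame returns a result only for the rows that passed the filter, and assigning that result back to df leaves every other row empty. A minimal sketch of the effect, using a small made-up frame (the ids "a" and "b" and the numbers are hypothetical, not taken from the data above):

```python
import pandas as pd

# Hypothetical toy data, chosen only to show the alignment behaviour.
df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "Volume": [10, 20, 30, 40],
    "time_normalised": [0.5, 2.0, 2.0, 0.9],
})

# transform() on the filtered frame returns values only for the kept rows.
filtered_max = (
    df.loc[df["time_normalised"] <= 1.1]
      .groupby("id")["Volume"]
      .transform("max")
)
print(filtered_max.index.tolist())  # row labels that survived the filter

# Assignment aligns on the index, so the rows that were filtered out get NaN.
df["free_capacity"] = filtered_max
print(df)
```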

The desired outcome:

id  volume  saturation  time_delay_normalised     speed  \
    27WESTBOUND     580    0.351515                     57  6.542484   
    27WESTBOUND     588    0.356364                    100  5.107143   
    27WESTBOUND     475    0.287879                     64  6.256250   
    27EASTBOUND     401    0.243030                     59  6.458065   
    27EASTBOUND     438    0.265455                     46  7.049296   
    27EASTBOUND     467    0.283030                     58  6.500000   

       BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
             17.88  15.913662     580         1.593750          475.0  
             17.88  15.865198     588         2.041667          475.0  
             17.88  16.511613     475         0.666667          475.0  
             17.88  16.882837     401         1.091458          467.0  
             17.88  16.703004     438         1.479167          467.0 
             17.88  16.553928     467         0.960417          467.0

Accepted answer by jezrael

You can use where to filter by the condition, and then groupby on the Series df['id'] with transform:

# Parentheses let the method chain span several lines
df['free_capacity'] = (df['Volume'].where(df['time_normalised'] <= 1.1)
                                   .groupby(df['id'])
                                   .transform('max'))
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
0           17.88  15.913662     580         1.593750          475.0  
1           17.88  15.865198     588         2.041667          475.0  
2           17.88  16.511613     475         0.666667          475.0  
3           17.88  16.882837     401         1.091458          467.0  
4           17.88  16.703004     438         1.479167          467.0  
5           17.88  16.553928     467         0.960417          467.0  
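
For anyone who wants to reproduce this, here is a self-contained sketch of the same where + groupby + transform approach. It rebuilds only the three columns the calculation needs from the sample rows in the question; the intermediate name masked is introduced here purely for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# where() keeps Volume only on rows meeting the condition (NaN elsewhere).
masked = df["Volume"].where(df["time_normalised"] <= 1.1)

# The group max skips NaN, and transform broadcasts it to every row of the group.
df["free_capacity"] = masked.groupby(df["id"]).transform("max")

print(df)
```

Because the mask is applied to the values rather than to the rows, the result keeps the full index of df, so no row is left without a value unless its whole group fails the condition.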

This is equivalent to first using where to create a new column, Volume1, according to your criteria:

df['Volume1'] = df['Volume'].where(df['time_normalised'] <= 1.1)
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  
0           17.88  15.913662     580         1.593750      NaN  
1           17.88  15.865198     588         2.041667      NaN  
2           17.88  16.511613     475         0.666667    475.0  
3           17.88  16.882837     401         1.091458    401.0  
4           17.88  16.703004     438         1.479167      NaN  
5           17.88  16.553928     467         0.960417    467.0 

Then use groupby with transform on the new column Volume1:

df['free_capacity'] = df.groupby('id')["Volume1"].transform('max')
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  free_capacity  
0           17.88  15.913662     580         1.593750      NaN          475.0  
1           17.88  15.865198     588         2.041667      NaN          475.0  
2           17.88  16.511613     475         0.666667    475.0          475.0  
3           17.88  16.882837     401         1.091458    401.0          467.0  
4           17.88  16.703004     438         1.479167      NaN          467.0  
5           17.88  16.553928     467         0.960417    467.0          467.0  

Answer by CoderBC

There are several possible answers. You can also do this:

df.set_index('id', inplace=True)
df['free_capacity'] = df.groupby(level=0).apply(lambda x: x.loc[x['time_normalised']<=1.1]['volume'].max())

This gives the following:

这给出了以下内容:

             volume  saturation  time_delay_normalised     speed  \
id
27WESTBOUND     580    0.351515                     57  6.542484
27WESTBOUND     588    0.356364                    100  5.107143
27WESTBOUND     475    0.287879                     64  6.256250
27EASTBOUND     401    0.243030                     59  6.458065
27EASTBOUND     438    0.265455                     46  7.049296
27EASTBOUND     467    0.283030                     58  6.500000

             BPR_free_speed  BPR_speed  Volume  time_normalised    wrong_x    free_capacity
id
27WESTBOUND           17.88  15.913662     580         1.593750  588  475
27WESTBOUND           17.88  15.865198     588         2.041667  588  475
27WESTBOUND           17.88  16.511613     475         0.666667  588  475
27EASTBOUND           17.88  16.882837     401         1.091458  467  467
27EASTBOUND           17.88  16.703004     438         1.479167  467  467
27EASTBOUND           17.88  16.553928     467         0.960417  467  467

You can reset the index afterwards with df.reset_index(inplace=True). The wrong_x column is the incorrect result, computed without the condition by doing:

df['wrong_x'] = df.groupby(level=0)['volume'].max()

which is what you tried initially.
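
A similar per-id lookup can also be written without changing the index. The sketch below is an alternative, not the code from this answer: it computes one conditional maximum per id and then uses Series.map to hand that value to every row. The name per_id_max is made up for illustration, and the frame is rebuilt from the question's sample rows with only the needed columns.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# One conditional maximum per id, computed on the qualifying rows only.
per_id_max = df.loc[df["time_normalised"] <= 1.1].groupby("id")["Volume"].max()

# map() looks up each row's id, so non-qualifying rows receive the value too.
df["free_capacity"] = df["id"].map(per_id_max)
print(df)
```

An id with no qualifying rows would be missing from per_id_max, so its rows would end up as NaN.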

Answer by Parfait

Consider also a groupby().apply():

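def maxtime(group):
    # group is the sub-DataFrame for one id; take the max Volume among its
    # rows with time_normalised <= 1.1 and assign it to every row of the group
    group['free_capacity'] = group.loc[group['time_normalised'] <= 1.1, 'Volume'].max()
    return group

# group_keys=False keeps the original row index instead of adding id as an index level
df = df.groupby('id', group_keys=False).apply(maxtime)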