pandas: find the maximum value, with "when" and "if" conditions

Question

Asked by LearningSlowly

I have a dataframe, df:

id           volume  saturation   time_delay_normalised  speed        BPR_free_speed  BPR_speed    Volume  time_normalised
27WESTBOUND  580     0.351515152  57                     6.54248366   17.88           15.91366177  580     1.59375
27WESTBOUND  588     0.356363636  100                    5.107142857  17.88           15.86519847  588     2.041666667
27WESTBOUND  475     0.287878788  64                     6.25625      17.88           16.51161331  475     0.666666667
27EASTBOUND  401     0.243030303  59                     6.458064516  17.88           16.88283672  401     1.0914583333
27EASTBOUND  438     0.265454545  46                     7.049295775  17.88           16.70300418  438     1.479166667
27EASTBOUND  467     0.283030303  58                     6.5          17.88           16.55392848  467     0.9604166667

I wish to create a new column, free_capacity, and set it to the maximum value of Volume per id, taken only over rows where time_normalised is less than or equal to 1.1.

Without considering the time_normalised condition, I can do this:

df['free_capacity'] = df.groupby('id')["Volume"].transform('max')

How do I add the "when time_normalised <= 1.1" condition?

EDIT

@jezrael suggested the following:

df.loc[df['time_normalised'] <= 1.1, 'free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')

Which gives:

id  volume  saturation  time_delay_normalised     speed  \
27WESTBOUND     580    0.351515                     57  6.542484   
27WESTBOUND     588    0.356364                    100  5.107143   
27WESTBOUND     475    0.287879                     64  6.256250   
27EASTBOUND     401    0.243030                     59  6.458065   
27EASTBOUND     438    0.265455                     46  7.049296   
27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
          17.88  15.913662     580         1.593750            NaN  
          17.88  15.865198     588         2.041667            NaN  
          17.88  16.511613     475         0.666667          475.0  
          17.88  16.882837     401         1.091458          467.0  
          17.88  16.703004     438         1.479167            NaN  
          17.88  16.553928     467         0.960417          467.0

However, I still want free_capacity assigned to every row of a given id, including the rows where time_normalised is above 1.1.

Thus, I tried:

df['free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')

However, this still results in NaN values. The 1.1 time_normalised condition is for finding the value, not limiting its application.

The desired outcome:

id  volume  saturation  time_delay_normalised     speed  \
    27WESTBOUND     580    0.351515                     57  6.542484   
    27WESTBOUND     588    0.356364                    100  5.107143   
    27WESTBOUND     475    0.287879                     64  6.256250   
    27EASTBOUND     401    0.243030                     59  6.458065   
    27EASTBOUND     438    0.265455                     46  7.049296   
    27EASTBOUND     467    0.283030                     58  6.500000   

       BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
             17.88  15.913662     580         1.593750          475.0  
             17.88  15.865198     588         2.041667          475.0  
             17.88  16.511613     475         0.666667          475.0  
             17.88  16.882837     401         1.091458          467.0  
             17.88  16.703004     438         1.479167          467.0 
             17.88  16.553928     467         0.960417          467.0

Answer 1

Accepted answer by jezrael

You can use where to mask the values that fail the condition, then groupby the Series df['id'] with transform:

df['free_capacity'] = (df['Volume'].where(df['time_normalised'] <= 1.1)
                                   .groupby(df['id'])
                                   .transform('max'))
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
0           17.88  15.913662     580         1.593750          475.0  
1           17.88  15.865198     588         2.041667          475.0  
2           17.88  16.511613     475         0.666667          475.0  
3           17.88  16.882837     401         1.091458          467.0  
4           17.88  16.703004     438         1.479167          467.0  
5           17.88  16.553928     467         0.960417          467.0
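
For anyone who wants to run this without the full dataset, here is a minimal self-contained sketch of the same where / groupby / transform chain. It assumes only the three columns that matter (id, Volume, time_normalised), with values copied from the question; the intermediate name masked is just for illustration.

```python
import pandas as pd

# Only the columns needed for the calculation; values copied from the question.
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# Replace volumes that fail the condition with NaN; the index stays complete.
masked = df["Volume"].where(df["time_normalised"] <= 1.1)

# Per-id max of the masked values (NaN is skipped), broadcast to every row.
df["free_capacity"] = masked.groupby(df["id"]).transform("max")

print(df)
```

Because where keeps all six rows, transform returns a value for every row. Filtering the rows out with df.loc first drops them from the result, and assigning that shorter result back aligns on the index, which is where the NaN values in the question come from.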

It is the same as first using where to create a new column, Volume1, from your criteria:

df['Volume1'] = df['Volume'].where(df['time_normalised'] <= 1.1)
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  
0           17.88  15.913662     580         1.593750      NaN  
1           17.88  15.865198     588         2.041667      NaN  
2           17.88  16.511613     475         0.666667    475.0  
3           17.88  16.882837     401         1.091458    401.0  
4           17.88  16.703004     438         1.479167      NaN  
5           17.88  16.553928     467         0.960417    467.0

Then use groupby with transform on the new column Volume1:

df['free_capacity'] = df.groupby('id')["Volume1"].transform('max')
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  free_capacity  
0           17.88  15.913662     580         1.593750      NaN          475.0  
1           17.88  15.865198     588         2.041667      NaN          475.0  
2           17.88  16.511613     475         0.666667    475.0          475.0  
3           17.88  16.882837     401         1.091458    401.0          467.0  
4           17.88  16.703004     438         1.479167      NaN          467.0  
5           17.88  16.553928     467         0.960417    467.0          467.0

Answer 2

Answer by CoderBC

There are several ways to do this. You can also try the following:

df.set_index('id', inplace=True)
df['free_capacity'] = df.groupby(level=0).apply(lambda x: x.loc[x['time_normalised']<=1.1]['volume'].max())

This gives the following:

             volume  saturation  time_delay_normalised     speed  \
id
27WESTBOUND     580    0.351515                     57  6.542484
27WESTBOUND     588    0.356364                    100  5.107143
27WESTBOUND     475    0.287879                     64  6.256250
27EASTBOUND     401    0.243030                     59  6.458065
27EASTBOUND     438    0.265455                     46  7.049296
27EASTBOUND     467    0.283030                     58  6.500000

             BPR_free_speed  BPR_speed  Volume  time_normalised    wrong_x    free_capacity
id
27WESTBOUND           17.88  15.913662     580         1.593750  588  475
27WESTBOUND           17.88  15.865198     588         2.041667  588  475
27WESTBOUND           17.88  16.511613     475         0.666667  588  475
27EASTBOUND           17.88  16.882837     401         1.091458  467  467
27EASTBOUND           17.88  16.703004     438         1.479167  467  467
27EASTBOUND           17.88  16.553928     467         0.960417  467  467

You can reset the index afterwards, if you want, with df.reset_index(inplace=True). The wrong_x column is the wrong result, computed without the condition by doing

df['wrong_x'] = df.groupby(level=0)['volume'].max()

which is what you tried initially.
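
A related sketch, not part of the original answer: the same "aggregate per id, then align back" idea can be written without changing the index, by looking each row's id up in the per-id maximum with Series.map. The name max_by_id and the three-column frame are assumptions for illustration, with values copied from the question.

```python
import pandas as pd

# Only the columns needed for the calculation; values copied from the question.
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# Max Volume per id, computed only from rows that satisfy the condition.
max_by_id = df.loc[df["time_normalised"] <= 1.1].groupby("id")["Volume"].max()

# Look up each row's id in that Series; an id with no qualifying rows gets NaN.
df["free_capacity"] = df["id"].map(max_by_id)

print(df)
```

This keeps id as an ordinary column, so no set_index / reset_index round trip is needed.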

Answer 3

Answer by Parfait

Consider also a groupby().apply():

def maxtime(row):
    row['free_capacity'] = row[row['time_normalised'] <= 1.1]['Volume'].max()
    return row

df = df.groupby('id').apply(maxtime)
