Original source: http://stackoverflow.com/questions/36792806/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: find maximum value, when and if conditions
Asked by LearningSlowly
I have a dataframe, df:
id volume saturation time_delay_normalised speed BPR_free_speed BPR_speed Volume time_normalised
27WESTBOUND 580 0.351515152 57 6.54248366 17.88 15.91366177 580 1.59375
27WESTBOUND 588 0.356363636 100 5.107142857 17.88 15.86519847 588 2.041666667
27WESTBOUND 475 0.287878788 64 6.25625 17.88 16.51161331 475 0.666666667
27EASTBOUND 401 0.243030303 59 6.458064516 17.88 16.88283672 401 1.0914583333
27EASTBOUND 438 0.265454545 46 7.049295775 17.88 16.70300418 438 1.479166667
27EASTBOUND 467 0.283030303 58 6.5 17.88 16.55392848 467 0.9604166667
I wish to create a new column, free_capacity, and set it to the maximum value of Volume, per id, when time_normalised is less than or equal to 1.1.
Without considering the time_normalised condition, I can do this:
df['free_capacity'] = df.groupby('id')["Volume"].transform('max')
How do I add the "when time_normalised <= 1.1" condition?
EDIT
@jezrael suggested the following:
df.loc[df['time_normalised'] <= 1.1, 'free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')
Which gives:
id volume saturation time_delay_normalised speed \
27WESTBOUND 580 0.351515 57 6.542484
27WESTBOUND 588 0.356364 100 5.107143
27WESTBOUND 475 0.287879 64 6.256250
27EASTBOUND 401 0.243030 59 6.458065
27EASTBOUND 438 0.265455 46 7.049296
27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised free_capacity
17.88 15.913662 580 1.593750 NaN
17.88 15.865198 588 2.041667 NaN
17.88 16.511613 475 0.666667 475.0
17.88 16.882837 401 1.091458 467.0
17.88 16.703004 438 1.479167 NaN
17.88 16.553928 467 0.960417 467.0
However, I still want the free_capacity value assigned to every row of each id, not only to the rows that meet the condition.
Thus, I tried:
df['free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')
However, this still results in NaN values. The time_normalised <= 1.1 condition is meant for finding the value, not for limiting the rows it is applied to.
The desired outcome:
id volume saturation time_delay_normalised speed \
27WESTBOUND 580 0.351515 57 6.542484
27WESTBOUND 588 0.356364 100 5.107143
27WESTBOUND 475 0.287879 64 6.256250
27EASTBOUND 401 0.243030 59 6.458065
27EASTBOUND 438 0.265455 46 7.049296
27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised free_capacity
17.88 15.913662 580 1.593750 475.0
17.88 15.865198 588 2.041667 475.0
17.88 16.511613 475 0.666667 475.0
17.88 16.882837 401 1.091458 467.0
17.88 16.703004 438 1.479167 467.0
17.88 16.553928 467 0.960417 467.0
Accepted answer by jezrael
You can use where to filter by the condition, and then groupby by the Series df['id'] with transform:
# Parentheses allow the method chain to span several lines.
df['free_capacity'] = (df['Volume'].where(df['time_normalised'] <= 1.1)
                                   .groupby(df['id'])
                                   .transform('max'))
print(df)
id volume saturation time_delay_normalised speed \
0 27WESTBOUND 580 0.351515 57 6.542484
1 27WESTBOUND 588 0.356364 100 5.107143
2 27WESTBOUND 475 0.287879 64 6.256250
3 27EASTBOUND 401 0.243030 59 6.458065
4 27EASTBOUND 438 0.265455 46 7.049296
5 27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised free_capacity
0 17.88 15.913662 580 1.593750 475.0
1 17.88 15.865198 588 2.041667 475.0
2 17.88 16.511613 475 0.666667 475.0
3 17.88 16.882837 401 1.091458 467.0
4 17.88 16.703004 438 1.479167 467.0
5 17.88 16.553928 467 0.960417 467.0
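The reason where works here, while the boolean filter tried in the question did not, appears to be index alignment: where keeps every row and only masks the non-qualifying values as NaN, so transform returns a value for every index label, whereas filtering with loc drops rows before grouping. A minimal, self-contained sketch on a cut-down copy of the question's data (keeping only id, Volume and time_normalised is an assumption made for brevity):

```python
import pandas as pd

# Cut-down copy of the question's frame (only the columns needed here).
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})
mask = df["time_normalised"] <= 1.1

# Filtering first drops rows, so the result only covers the surviving labels.
filtered = df.loc[mask].groupby("id")["Volume"].transform("max")
print(filtered.index.tolist())  # [2, 3, 5]

# where keeps all rows; masked volumes become NaN, which max skips.
masked = df["Volume"].where(mask)
df["free_capacity"] = masked.groupby(df["id"]).transform("max")
print(df)
```

Assigning filtered to a column would leave NaN at labels 0, 1 and 4, which matches the NaN rows reported in the question.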
It is the same as first using where to create a new column Volume1 according to your criterion:
df['Volume1'] = df['Volume'].where(df['time_normalised'] <= 1.1)
print(df)
id volume saturation time_delay_normalised speed \
0 27WESTBOUND 580 0.351515 57 6.542484
1 27WESTBOUND 588 0.356364 100 5.107143
2 27WESTBOUND 475 0.287879 64 6.256250
3 27EASTBOUND 401 0.243030 59 6.458065
4 27EASTBOUND 438 0.265455 46 7.049296
5 27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised Volume1
0 17.88 15.913662 580 1.593750 NaN
1 17.88 15.865198 588 2.041667 NaN
2 17.88 16.511613 475 0.666667 475.0
3 17.88 16.882837 401 1.091458 401.0
4 17.88 16.703004 438 1.479167 NaN
5 17.88 16.553928 467 0.960417 467.0
Then use groupby with transform on the new column Volume1:
df['free_capacity'] = df.groupby('id')["Volume1"].transform('max')
print(df)
id volume saturation time_delay_normalised speed \
0 27WESTBOUND 580 0.351515 57 6.542484
1 27WESTBOUND 588 0.356364 100 5.107143
2 27WESTBOUND 475 0.287879 64 6.256250
3 27EASTBOUND 401 0.243030 59 6.458065
4 27EASTBOUND 438 0.265455 46 7.049296
5 27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised Volume1 free_capacity
0 17.88 15.913662 580 1.593750 NaN 475.0
1 17.88 15.865198 588 2.041667 NaN 475.0
2 17.88 16.511613 475 0.666667 475.0 475.0
3 17.88 16.882837 401 1.091458 401.0 467.0
4 17.88 16.703004 438 1.479167 NaN 467.0
5 17.88 16.553928 467 0.960417 467.0 467.0
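One edge case worth noting: if an id has no row satisfying the condition, every masked value in that group is NaN, so its maximum is NaN as well. The sketch below illustrates this with made-up data (the ids "A" and "B" and their values are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical data: id "B" has no row with time_normalised <= 1.1.
df = pd.DataFrame({
    "id": ["A", "A", "B", "B"],
    "Volume": [10, 20, 30, 40],
    "time_normalised": [0.5, 2.0, 1.5, 3.0],
})

# Same two steps as above: mask into a helper column, then take the per-id max.
df["Volume1"] = df["Volume"].where(df["time_normalised"] <= 1.1)
df["free_capacity"] = df.groupby("id")["Volume1"].transform("max")
print(df)
```

Whether to leave those NaN values or replace them (for example with fillna) is a design choice that depends on what free_capacity should mean for such ids.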
Answer by CoderBC
There can be several answers. You can also do this:
df.set_index('id', inplace=True)
df['free_capacity'] = df.groupby(level=0).apply(lambda x: x.loc[x['time_normalised']<=1.1]['volume'].max())
This gives the following:
volume saturation time_delay_normalised speed \
id
27WESTBOUND 580 0.351515 57 6.542484
27WESTBOUND 588 0.356364 100 5.107143
27WESTBOUND 475 0.287879 64 6.256250
27EASTBOUND 401 0.243030 59 6.458065
27EASTBOUND 438 0.265455 46 7.049296
27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised wrong_x free_capacity
id
27WESTBOUND 17.88 15.913662 580 1.593750 588 475
27WESTBOUND 17.88 15.865198 588 2.041667 588 475
27WESTBOUND 17.88 16.511613 475 0.666667 588 475
27EASTBOUND 17.88 16.882837 401 1.091458 467 467
27EASTBOUND 17.88 16.703004 438 1.479167 467 467
27EASTBOUND 17.88 16.553928 467 0.960417 467 467
You can reset the index afterwards, if you want, with df.reset_index(inplace=True).
The wrong_x column is the wrong result, computed without the condition by doing
df['wrong_x'] = df.groupby(level=0)['volume'].max()
which is what you tried initially.
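A similar per-id lookup can be done without changing the index. The following is a hedged sketch of a variant, not CoderBC's exact code: compute the conditional maximum per id as a small Series, then look it up for every row with Series.map. The cut-down data and the name per_id_max are assumptions for illustration.

```python
import pandas as pd

# Cut-down copy of the question's frame (only the columns needed here).
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# One maximum per id, computed from the qualifying rows only.
per_id_max = df.loc[df["time_normalised"] <= 1.1].groupby("id")["Volume"].max()

# Look up each row's id in that Series; an id with no qualifying rows would map to NaN.
df["free_capacity"] = df["id"].map(per_id_max)
print(df)
```

This keeps id as an ordinary column, so no reset_index step is needed afterwards.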
Answer by Parfait
Consider also a groupby().apply():
def maxtime(row):
    # Despite its name, `row` is the sub-DataFrame holding all rows of one id.
    row['free_capacity'] = row[row['time_normalised'] <= 1.1]['Volume'].max()
    return row

df = df.groupby('id').apply(maxtime)