Original question: http://stackoverflow.com/questions/36310564/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas second max value per group in dataframe
Asked by Arthur Zangiev
I have a dataframe:
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04           155           1.87
1         45094462      185.04           155           1.97
2         45094463      185.04           155           3.05
3         45094463      185.04           156           3.05
4         45094464      185.04           156           5.80
5         45094464      185.04           156           5.80
6         45094466      185.04           157         200.00
7         45094466      185.04           157         200.00
8         45094465      185.04           157            NaN
9         45094465      185.04           157            NaN
I would like to add two extra columns, second_lowest and none_values, both computed per group when grouping by market id. In other words, for market id 155 second_lowest is 1.97, and there are no NaN values in that group, so none_values is False. I would like to get something like:
   bq_selection_id  bq_balance  bq_market_id  bq_back_price  second_lowest  none_val
0         45094462      185.04           155           1.87           1.97     False
1         45094462      185.04           155           1.97           1.97     False
2         45094463      185.04           155           3.05           1.97     False
3         45094463      185.04           156           3.05           5.80     False
4         45094464      185.04           156           5.80           5.80     False
5         45094464      185.04           156           6.40           5.80     False
6         45094466      185.04           157           1.00           1.70      True
7         45094466      185.04           157           1.70           1.70      True
8         45094465      185.04           157            NaN           1.70      True
9         45094465      185.04           157            NaN           1.70      True
Can you help me with this, please?
Answered by unutbu
Combining ideas used in your previous questions (1, 2),
you could use groupby/transform
to assign a new value for each row in your DataFrame:
import numpy as np
import pandas as pd

# Widen the console output so the whole frame prints on one line per row.
pd.options.display.width = 1000

df = pd.DataFrame(
    {'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8, 200.0, 200.0, np.nan, np.nan],
     'bq_balance': [1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4,
                    1850.4, 1850.4, 1850.4],
     'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
     'bq_selection_id': [45094462, 45094462, 45094463, 45094463, 45094464,
                         45094464, 45094466, 45094466, 45094465, 45094465]})

# Group the prices by market; transform broadcasts each group's result back to its rows.
grouped = df.groupby('bq_market_id')['bq_back_price']
# The larger of the two smallest non-null prices is the second-lowest price.
df['second_lowest'] = grouped.transform(lambda x: x.nsmallest(2).max())
# True for every row of a market that contains at least one missing price.
df['has_null'] = grouped.transform(lambda x: pd.isnull(x).any()).astype(bool)
print(df)
yields
   bq_back_price  bq_balance  bq_market_id  bq_selection_id  second_lowest  has_null
0           1.87      1850.4           155         45094462           1.97     False
1           1.97      1850.4           155         45094462           1.97     False
2           3.05      1850.4           155         45094463           1.97     False
3           3.05      1850.4           156         45094463           5.80     False
4           5.80      1850.4           156         45094464           5.80     False
5           5.80      1850.4           156         45094464           5.80     False
6         200.00      1850.4           157         45094466         200.00      True
7         200.00      1850.4           157         45094466         200.00      True
8            NaN      1850.4           157         45094465         200.00      True
9            NaN      1850.4           157         45094465         200.00      True
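To see why market 156 gets 5.80 and market 157 gets 200.00, it may help to try nsmallest(2).max() on small Series. The sketch below is only an illustration with made-up values: it shows that NaN is skipped, that tied prices count as separate entries, and that dropping duplicates first is one possible way to get the second distinct price instead. The last line is an assumed alternative for the null flag that avoids a Python lambda; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

# nsmallest skips NaN, so the two smallest values here are 3.05 and 5.8.
prices = pd.Series([5.8, 3.05, 5.8, np.nan])
second_lowest = prices.nsmallest(2).max()

# Ties count as separate entries: the two smallest values are 1.0 and 1.0.
tied = pd.Series([1.0, 1.0, 2.0])
with_ties = tied.nsmallest(2).max()

# Dropping duplicates first gives the second distinct value instead.
distinct_only = tied.drop_duplicates().nsmallest(2).max()

# Assumed alternative for the null flag: build the mask first, then group it.
df = pd.DataFrame({'bq_market_id': [155, 155, 157, 157],
                   'bq_back_price': [1.87, 1.97, 200.0, np.nan]})
has_null = df['bq_back_price'].isna().groupby(df['bq_market_id']).transform('any')
```

Whether ties should count once or twice depends on what "second lowest" means for your data; the answer above counts them separately.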
Answered by mdurant
How about:
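gb = df.groupby('bq_market_id')
# Compute one value per market, then map it back onto each row by market id.
# sort_values() sorts ascending with NaN last, so iloc[1] is the second-lowest price.
df['second_lowest'] = df.bq_market_id.map(gb.bq_back_price.apply(lambda x: x.sort_values().iloc[1]))
df['none_val'] = df.bq_market_id.map(gb.bq_back_price.apply(lambda x: x.isnull().values.any()))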
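One thing to watch with a positional lookup such as iloc[1] is a market that has fewer than two prices, where it either raises an IndexError or picks up a NaN. A minimal sketch of a guarded version follows, assuming that NaN is an acceptable result in that case. The helper name second_smallest and the sample frame are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'bq_market_id': [155, 155, 155, 156, 157, 157],
    'bq_back_price': [1.87, 1.97, 3.05, 5.80, 200.0, np.nan],
})

def second_smallest(prices):
    """Hypothetical helper: second-smallest non-null value, or NaN if there are fewer than two."""
    ordered = prices.dropna().sort_values()
    return ordered.iloc[1] if len(ordered) >= 2 else np.nan

# One value per market, indexed by bq_market_id.
per_market = df.groupby('bq_market_id')['bq_back_price'].apply(second_smallest)

# Look up each row's market in the per-market result.
df['second_lowest'] = df['bq_market_id'].map(per_market)
```

Here market 155 gets 1.97, while markets 156 and 157 each have only one usable price and so get NaN.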