Second max value per group in a pandas DataFrame

Question

Asked by Arthur Zangiev

I have a dataframe:

   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04           155           1.87
1         45094462      185.04           155           1.97
2         45094463      185.04           155           3.05
3         45094463      185.04           156           3.05
4         45094464      185.04           156           5.80
5         45094464      185.04           156           5.80
6         45094466      185.04           157         200.00
7         45094466      185.04           157         200.00
8         45094465      185.04           157            NaN
9         45094465      185.04           157            NaN

I would like to add two extra columns, second_lowest and none_values, both computed per group (grouped by bq_market_id). In other words, for market id 155 second_lowest is 1.97, and since that group has no NaN values, none_values is False. I would like to get something like:

   bq_selection_id  bq_balance  bq_market_id  bq_back_price  second_lowest  none_val
0         45094462      185.04           155           1.87           1.97     False
1         45094462      185.04           155           1.97           1.97     False
2         45094463      185.04           155           3.05           1.97     False
3         45094463      185.04           156           3.05           5.80     False
4         45094464      185.04           156           5.80           5.80     False
5         45094464      185.04           156           6.40           5.80     False
6         45094466      185.04           157           1.00           1.70      True
7         45094466      185.04           157           1.70           1.70      True
8         45094465      185.04           157            NaN           1.70      True
9         45094465      185.04           157            NaN           1.70      True

Can you help me with this, please?

Answer 1

Answered by unutbu

Combining ideas used in your previous questions (1, 2), you could use groupby/transform to assign a new value to each row of your DataFrame:

import numpy as np
import pandas as pd
pd.options.display.width = 1000

df = pd.DataFrame(
    {'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8, 200.0, 200.0, np.nan, np.nan], 
     'bq_balance': [1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 
                    1850.4, 1850.4, 1850.4], 
     'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157], 
     'bq_selection_id': [45094462, 45094462, 45094463, 45094463, 45094464, 
                         45094464, 45094466, 45094466, 45094465, 45094465]})

grouped = df.groupby('bq_market_id')['bq_back_price']
df['second_lowest'] = grouped.transform(lambda x: x.nsmallest(2).max())
df['has_null'] = grouped.transform(lambda x: pd.isnull(x).any()).astype(bool)
print(df)

yields

   bq_back_price  bq_balance  bq_market_id  bq_selection_id  second_lowest has_null
0           1.87      1850.4           155         45094462           1.97    False
1           1.97      1850.4           155         45094462           1.97    False
2           3.05      1850.4           155         45094463           1.97    False
3           3.05      1850.4           156         45094463           5.80    False
4           5.80      1850.4           156         45094464           5.80    False
5           5.80      1850.4           156         45094464           5.80    False
6         200.00      1850.4           157         45094466         200.00     True
7         200.00      1850.4           157         45094466         200.00     True
8            NaN      1850.4           157         45094465         200.00     True
9            NaN      1850.4           157         45094465         200.00     True
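A minimal sketch of why transform is the right tool here, rebuilt on just the two relevant columns of the same data (the names smallest_157, per_group and per_row are illustrative). An aggregation yields one value per market, while transform yields one value per row aligned with df's index, so only the latter can be assigned directly as a column. It also shows why market 157 gets 200.00: Series.nsmallest drops NaN before selecting, so the two smallest values are both 200.0.

```python
import numpy as np
import pandas as pd

# Only the two columns that matter, with the values from the answer's frame
df = pd.DataFrame({
    'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
    'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8, 200.0, 200.0, np.nan, np.nan],
})
grouped = df.groupby('bq_market_id')['bq_back_price']

# nsmallest drops NaN first, so market 157's two smallest values are both 200.0
smallest_157 = grouped.get_group(157).nsmallest(2)

# agg: one value per group, indexed by bq_market_id (3 rows here)
per_group = grouped.agg(lambda x: x.nsmallest(2).max())

# transform: one value per row, aligned with df.index (10 rows here),
# which is why it can be assigned straight into a new column
per_row = grouped.transform(lambda x: x.nsmallest(2).max())

print(smallest_157.tolist())
print(per_group)
print(per_row)
```

One consequence of this formula: if a market had only one non-NaN price, x.nsmallest(2).max() would return that single price rather than NaN, so a guard such as checking x.count() >= 2 may be needed if that case matters.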

Answer 2

Answered by mdurant

How about:

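gb = df.groupby('bq_market_id')['bq_back_price']
# Second-lowest price per market (ascending sort puts NaN last), mapped back onto each row
df['second_lowest'] = df['bq_market_id'].map(gb.apply(lambda x: x.sort_values().iloc[1]))
# True for every row whose market contains at least one NaN price
df['none_val'] = df['bq_market_id'].map(gb.apply(lambda x: x.isnull().any()))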
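An alternative sketch, not taken from either answer, that avoids per-group Python lambdas: rank prices within each market with groupby(...).rank(method='first'), take the row ranked 2 in each market, and map that price back onto every row. Choosing method='first' is an assumption: it counts duplicate prices separately (so market 156 gives 5.80, as in the answers), and unlike nsmallest(2).max() a market with fewer than two non-NaN prices gets NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
    'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8, 200.0, 200.0, np.nan, np.nan],
})

# Rank prices within each market; method='first' breaks ties by row order,
# and NaN prices get a NaN rank (na_option='keep' is the default)
rank = df.groupby('bq_market_id')['bq_back_price'].rank(method='first')

# The row ranked 2 in each market holds that market's second-lowest price
second = df.loc[rank == 2].set_index('bq_market_id')['bq_back_price']

# Map the per-market value back onto every row (NaN if a market has < 2 prices)
df['second_lowest'] = df['bq_market_id'].map(second)

# True for every row of a market that contains at least one NaN price
df['none_val'] = df['bq_back_price'].isna().groupby(df['bq_market_id']).transform('any')
print(df)
```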
