Take the difference between two columns of a pandas DataFrame based on a condition in Python
Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/36588522/
Asked by User1090
I have a dataframe named pricecomp_df. I want to compare the "market price" column with each of the other price columns ("apple price", "mangoes price", "watermelon price"), but choose which difference to take by priority: watermelon price first, mangoes price second, apple price third. The input dataframe is given below:
   code  apple price  mangoes price  watermelon price  market price
0   101          101            NaN               NaN           122
1   102          123            123               NaN           124
2   103          NaN            NaN               NaN           123
3   105          123            167               NaN           154
4   107          165            NaN               177           176
5   110          123            NaN               NaN           123
In the first row only the apple price and the market price are present, so the difference is taken between those two. In the second row both the apple and mangoes prices are present, so I need only the difference between the market price and the mangoes price. The same priority rule applies to every row. Rows where all three prices are NaN should be skipped. Can anyone help with this?
Answered by MERose
Hope I'm not too late. The idea is to calculate the differences and overwrite them according to your priority list.
import numpy as np
import pandas as pd
df = pd.DataFrame({'code': [101, 102, 103, 105, 107, 110],
                   'apple price': [101, 123, np.nan, 123, 165, 123],
                   'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
                   'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
                   'market price': [122, 124, 123, 154, 176, 123]})
# Calculate difference to apple price
df['diff'] = df['market price'] - df['apple price']
# Overwrite with difference to mangoes price
df['diff'] = df.apply(lambda x: x['market price'] - x['mangoes price'] if not np.isnan(x['mangoes price']) else x['diff'], axis=1)
# Overwrite with difference to watermelon price
df['diff'] = df.apply(lambda x: x['market price'] - x['watermelon price'] if not np.isnan(x['watermelon price']) else x['diff'], axis=1)
print(df)
   apple price  code  mangoes price  market price  watermelon price  diff
0          101   101            NaN           122               NaN    21
1          123   102            123           124               NaN     1
2          NaN   103            NaN           123               NaN   NaN
3          123   105            167           154               NaN   -13
4          165   107            NaN           176               177    -1
5          123   110            NaN           123               NaN     0
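The same priority rule can probably also be written without row-wise apply. Below is a minimal sketch, not part of the original answer, assuming the three price columns are numeric. It back-fills across the columns in priority order to pick the highest-priority price that is present, then drops the rows where all three prices are NaN, which the question asks for and the code above leaves in place. The names priority, chosen and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'code': [101, 102, 103, 105, 107, 110],
    'apple price': [101, 123, np.nan, 123, 165, 123],
    'mangoes price': [np.nan, 123, np.nan, 167, np.nan, np.nan],
    'watermelon price': [np.nan, np.nan, np.nan, np.nan, 177, np.nan],
    'market price': [122, 124, 123, 154, 176, 123],
})

# Price columns listed from highest to lowest priority (illustrative name).
priority = ['watermelon price', 'mangoes price', 'apple price']

# Back-fill along each row so the first column ends up holding the
# highest-priority price that is not NaN (NaN if all three are missing).
chosen = df[priority].bfill(axis=1).iloc[:, 0]

df['diff'] = df['market price'] - chosen

# Skip rows where all three prices are missing.
result = df.dropna(subset=priority, how='all')
print(result)
```

Adding a lower-priority column would only mean appending its name to the end of priority, so the order of that list is the single place where the rule is stated.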