Original question: http://stackoverflow.com/questions/28190476/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Conditional column arithmetic in pandas dataframe
Asked by user1718097
I have a pandas dataframe with the following structure:
import numpy as np
import pandas as pd
myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1], 'y': [6.7, 7.5, 8.1, 5.3, 8.3], 'condition': [1, 1, np.nan, np.nan, 1], 'calculation': [np.nan] * 5})
print(myData)
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2          NaN        NaN  5.3  8.1
3          NaN        NaN  2.3  5.3
4          NaN          1  4.1  8.3
I want to enter a value in the 'calculation' column based on the values in 'x' and 'y' (e.g. x/y), but only in those cells where the 'condition' column contains NaN (np.isnan(myData['condition'])). The final dataframe should look like this:
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2        0.654        NaN  5.3  8.1
3        0.434        NaN  2.3  5.3
4          NaN          1  4.1  8.3
I'm happy with the idea of stepping through each row in turn using a 'for' loop and then using 'if' statements to make the calculations, but the actual dataframe I have is very large and I wanted to do the calculations in an array-based way. Is this possible? I guess I could calculate the value for all rows and then delete the ones I don't want, but this seems like a lot of wasted effort (the NaNs are quite rare in the dataframe) and, in some cases where 'condition' equals 1, the calculation cannot be made due to division by zero.
Thanks in advance.
Answered by EdChum
Use where() and pass your condition to it; this keeps the result of your calculation only in the rows that meet the condition and leaves NaN everywhere else:
In [117]:
myData['calculation'] = (myData['x']/myData['y']).where(myData['condition'].isnull())
myData
Out[117]:
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2     0.654321        NaN  5.3  8.1
3     0.433962        NaN  2.3  5.3
4          NaN          1  4.1  8.3
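A minimal sketch of a variation, assuming the same myData frame as in the question. The where() call above still evaluates x/y for every row and then masks the result. If the aim is to divide only the rows where 'condition' is NaN, which is what the question's concern about wasted effort and division by zero suggests, a boolean mask with .loc may be a closer fit. The name mask is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the question
myData = pd.DataFrame({
    'x': [1.2, 2.4, 5.3, 2.3, 4.1],
    'y': [6.7, 7.5, 8.1, 5.3, 8.3],
    'condition': [1, 1, np.nan, np.nan, 1],
    'calculation': [np.nan] * 5,
})

# Boolean mask (illustrative name): True only where 'condition' is NaN
mask = myData['condition'].isnull()

# Divide only the selected rows. The assignment aligns on the index,
# so rows outside the mask keep their existing NaN.
myData.loc[mask, 'calculation'] = myData.loc[mask, 'x'] / myData.loc[mask, 'y']

print(myData)
```

With float columns, pandas division by zero produces inf (or NaN for 0/0) rather than raising, so the where() version should not fail on such rows either. The masked version simply never computes them.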

