Original question: http://stackoverflow.com/questions/42376810/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python DataFrames For Loop with If Statement not working
Asked by Cole Starbuck
I have a DataFrame called ES_15M_Summary, with coefficients/betas in one column, ES_15M_Summary['Rolling_OLS_Coefficient'], as follows:
If the value in the column pictured above ('Rolling_OLS_Coefficient') is greater than .08, I want a new column titled 'Long' to hold a binary 'Y'. If the value is less than .08, I want 'Long' to be 'NaN' or just 'N' (either works).
So I'm writing a for loop to run down the columns. First, I created a new column titled 'Long' and set it to NaN:
ES_15M_Summary['Long'] = np.nan
Then I made the following For Loop:
for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary['Long'] = 'Y'
    else:
        ES_15M_Summary['Long'] = 'NaN'
I get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
...referring to the if statement line shown above (if...>.08:). I'm not sure why I'm getting this error or what's wrong with the for loop. Any help is appreciated.
Answered by jezrael
I think it is better to use numpy.where:
mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
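For context on the error in the question: comparing a whole column with a scalar returns a boolean Series (one value per row), while `if` needs a single True/False, so pandas raises the ValueError. A minimal sketch of this, using a made-up three-row frame named `df` (hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing the whole column yields one boolean per row, not a single bool.
mask = df['Rolling_OLS_Coefficient'] > .08
print(mask.tolist())

# `if mask:` asks for the truth value of the entire Series, which is ambiguous.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Inside a loop, compare one scalar at a time instead.
first_value = df.loc[0, 'Rolling_OLS_Coefficient']
print(first_value > .08)
```

The loop in the question compares the full column on every iteration, which is why the `if` line fails regardless of which row is being visited.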
Sample:
import numpy as np
import pandas as pd

ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09]})
print(ES_15M_Summary)
   Rolling_OLS_Coefficient
0                     0.07
1                     0.01
2                     0.09

# Boolean mask: True where the coefficient exceeds the threshold
mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
print(ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
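The question also accepts NaN in place of 'N'. Note that np.where(mask, 'Y', np.nan) generally yields the string 'nan' rather than a missing value, because NumPy coerces both branches to a single string dtype. One possible sketch, assuming the same sample frame (here named `df`, a hypothetical name), uses Series.where, which leaves NaN wherever the mask is False:

```python
import pandas as pd

# Hypothetical sample data matching the sample above.
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})
mask = df['Rolling_OLS_Coefficient'] > .08

# Start from a column of 'Y' and keep it only where the mask is True;
# Series.where fills the remaining rows with a missing value (NaN).
df['Long'] = pd.Series('Y', index=df.index).where(mask)

print(df)
print(df['Long'].isna().tolist())
```

With a real missing value in the column, methods such as isna() and dropna() treat the unlabelled rows as missing, which a literal 'NaN' string would not allow.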
A looping solution, which is very slow:
for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary.loc[index,'Long'] = 'Y'
    else:
        ES_15M_Summary.loc[index,'Long'] = 'N'
print(ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
Timings:
# 3000 rows
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000})
#print (ES_15M_Summary)

def loop(df):
    # Label each row individually, using the frame passed in as df
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index,'Long'] = 'Y'
        else:
            df.loc[index,'Long'] = 'N'
    return df

print(loop(ES_15M_Summary))
In [51]: %timeit (loop(ES_15M_Summary))
1 loop, best of 3: 2.38 s per loop
In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
1000 loops, best of 3: 555 μs per loop
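The %timeit lines above need IPython. As a rough sketch of how a similar comparison could be run in a plain script, the standard-library timeit module can be used. The helper names `loop` and `vectorized`, the frame name `df`, and the smaller 300-row frame are assumptions chosen to keep the example quick; absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller hypothetical frame (300 rows) so the sketch runs quickly.
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09] * 100})

def loop(frame):
    # Row-by-row labelling on a copy, so repeated runs start from the same data.
    frame = frame.copy()
    frame['Long'] = ''  # placeholder column, overwritten row by row below
    for index in frame.index:
        if frame.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            frame.loc[index, 'Long'] = 'Y'
        else:
            frame.loc[index, 'Long'] = 'N'
    return frame

def vectorized(frame):
    # Whole-column labelling with numpy.where, also on a copy.
    frame = frame.copy()
    frame['Long'] = np.where(frame['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    return frame

# Both approaches should produce the same labels.
same = loop(df)['Long'].tolist() == vectorized(df)['Long'].tolist()
print(same)

# Total time for three runs of each approach.
loop_time = timeit.timeit(lambda: loop(df), number=3)
vec_time = timeit.timeit(lambda: vectorized(df), number=3)
print(f'loop: {loop_time:.4f} s, np.where: {vec_time:.4f} s')
```

Copying the frame inside each helper adds a small constant cost to both measurements, but it keeps the two functions from modifying the shared input between runs.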