pandas: Python DataFrame for loop with an if statement does not work

Question

Asked by Cole Starbuck

I have a DataFrame called ES_15M_Summary, with coefficients/betas in one column titled ES_15M_Summary['Rolling_OLS_Coefficient'] as follows:

If the value in the column pictured above ('Rolling_OLS_Coefficient') is greater than .08, I want a new column titled 'Long' to hold a binary 'Y'. If the value in that column is less than .08, I want 'Long' to be 'NaN' or just 'N' (either works).

So I'm writing a for loop to run down the columns. First, I created a new column titled 'Long' and set it to NaN:

ES_15M_Summary['Long'] = np.nan

Then I made the following For Loop:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary['Long'] = 'Y'
    else:
        ES_15M_Summary['Long'] = 'NaN'

I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

...referring to the if statement line shown above (if...>.08:). I'm not sure why I'm getting this error or what's wrong with the for loop. Any help is appreciated.

Answer 1

Answered by jezrael

I think it is better to use numpy.where:

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')

Sample:

import numpy as np
import pandas as pd

ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09]})
print (ES_15M_Summary)
   Rolling_OLS_Coefficient
0                     0.07
1                     0.01
2                     0.09

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
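
The question also allowed leaving the non-matching rows as NaN instead of 'N'. A minimal sketch of one way to do that, assuming the same sample data (the name df is illustrative): Series.where keeps a value where the mask is True and fills the other rows with a missing value. Passing np.nan straight into np.where next to a string can instead produce the text 'nan', because NumPy coerces both branches to one dtype.

```python
import pandas as pd

# Illustrative sample data, same values as the sample above
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

mask = df['Rolling_OLS_Coefficient'] > .08

# Start from a column of 'Y' and keep it only where the mask is True;
# Series.where fills the remaining rows with a missing value
df['Long'] = pd.Series('Y', index=df.index).where(mask)

print(df)
```

Rows that fail the condition can then be found with df['Long'].isna().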

A looping solution, which is very slow:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary.loc[index,'Long'] = 'Y'
    else:
        ES_15M_Summary.loc[index,'Long'] = 'N'
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
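
The loop in the question fails because ES_15M_Summary['Rolling_OLS_Coefficient'] > .08 compares the whole column at once and produces a boolean Series, which an if statement cannot reduce to a single True or False. The sketch below, assuming the same sample data (df, comparison, message and labels are illustrative names), reproduces the error and then compares one scalar per row through the row variable:

```python
import pandas as pd

df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing the whole column gives one boolean per row, not a single bool
comparison = df['Rolling_OLS_Coefficient'] > .08

message = None
try:
    if comparison:  # same mistake as the loop in the question
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Inside iterrows, `row` holds the values of the current row only,
# so row['Rolling_OLS_Coefficient'] is a single number
labels = []
for index, row in df.iterrows():
    labels.append('Y' if row['Rolling_OLS_Coefficient'] > .08 else 'N')

df['Long'] = labels
print(df)
```

Collecting the labels in a list and assigning them once also avoids writing to the DataFrame on every iteration, though it is still row-by-row work and slower than np.where.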

Timings:

#3000 rows
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000})
#print (ES_15M_Summary)


def loop(df):
    # Label each row one at a time, using the DataFrame passed in as df
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index,'Long'] = 'Y'
        else:
            df.loc[index,'Long'] = 'N'
    return df

print (loop(ES_15M_Summary))


In [51]: %timeit (loop(ES_15M_Summary))
1 loop, best of 3: 2.38 s per loop

In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
1000 loops, best of 3: 555 μs per loop
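
The %timeit lines above only run inside IPython. As a rough sketch for a plain Python script, the standard-library timeit module can make the same comparison. The helper names label_with_loop and label_with_where are hypothetical, the frame is smaller than the 3000 rows used above so the loop finishes quickly, and absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# 300 rows keeps the slow loop quick to run; raise the multiplier to scale up
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09] * 100})


def label_with_loop(frame):
    # Row-by-row labelling: one .loc lookup per row, plus a write for matches
    out = frame.copy()
    out['Long'] = 'N'
    for index, row in out.iterrows():
        if out.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            out.loc[index, 'Long'] = 'Y'
    return out


def label_with_where(frame):
    # Vectorized labelling over the whole column at once
    out = frame.copy()
    out['Long'] = np.where(out['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    return out


runs = 3
loop_seconds = timeit.timeit(lambda: label_with_loop(df), number=runs)
where_seconds = timeit.timeit(lambda: label_with_where(df), number=runs)
print(f'loop:     {loop_seconds / runs:.6f} s per call')
print(f'np.where: {where_seconds / runs:.6f} s per call')
```

Both helpers work on a copy, so repeated timing runs start from the same input.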
