Original URL: http://stackoverflow.com/questions/37924377/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Does Pandas calculate ewm wrong?
Question by jeronimo
When I try to calculate the exponential moving average (EMA) of financial data in a DataFrame, Pandas' ewm approach seems to be incorrect.
The basics are well explained in the following link: http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages
According to the Pandas documentation, the approach taken is as follows (with the "adjust" parameter set to False):
weighted_average[0] = arg[0];
weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i]
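To make the quoted recurrence concrete, here is a minimal sketch that reproduces it by hand and compares it with ewm(span=..., adjust=False). It assumes the pandas span-to-alpha conversion alpha = 2 / (span + 1). The prices are the first five values shown in chrisb's answer, and ewm_adjust_false is a hypothetical helper name. The point is that the seed is the first raw value, not an SMA.

```python
import pandas as pd


def ewm_adjust_false(arg, alpha):
    """Hypothetical helper: the adjust=False recurrence written out as a loop."""
    weighted_average = [arg[0]]  # weighted_average[0] = arg[0]: seeded with the first raw value
    for i in range(1, len(arg)):
        weighted_average.append((1 - alpha) * weighted_average[i - 1] + alpha * arg[i])
    return weighted_average


prices = [22.27, 22.19, 22.08, 22.17, 22.18]  # first five prices from chrisb's answer
span = 3
alpha = 2 / (span + 1)  # how pandas converts span to alpha
manual = ewm_adjust_false(prices, alpha)
builtin = pd.Series(prices).ewm(span=span, adjust=False).mean()
print(manual)
print(builtin.tolist())
```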
In my view this is incorrect. The "arg" should be (for example) the closing values. However, the starting value should be the first average (i.e. the simple average of the first stretch of data, as long as the selected period), NOT the first closing value. Under that definition, arg[0] and arg[i] can never come from the same data. Using the "min_periods" parameter does not seem to resolve this.
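The remark about min_periods can be checked directly. The following sketch uses illustrative values (the same Fibonacci-like series as Ben's answer) and compares ewm with and without min_periods. min_periods only replaces the first span - 1 outputs with NaN. The recursion underneath is still seeded with the first raw value, so the first non-NaN output differs from the SMA seed the question expects.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], dtype=float)  # illustrative values
span = 4

plain = s.ewm(span=span, adjust=False).mean()
masked = s.ewm(span=span, adjust=False, min_periods=span).mean()

# min_periods hides the first span - 1 outputs, but the values that remain
# are identical to the unmasked ones: the seed is still s[0], not an SMA.
print(pd.DataFrame({"plain": plain, "min_periods": masked}))
print("SMA seed the question expects:", s.iloc[:span].mean())
```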
Can anyone explain to me how (or whether) Pandas can be used to properly calculate the EMA of data?
Answer by chrisb
There are several ways to initialize an exponential moving average, so I wouldn't say pandas is doing it wrong, just differently.
Here is one way to calculate it the way you want:
In [20]: s.head()
Out[20]:
0 22.27
1 22.19
2 22.08
3 22.17
4 22.18
Name: Price, dtype: float64
In [21]: span = 10
In [22]: sma = s.rolling(window=span, min_periods=span).mean()[:span]
In [24]: rest = s[span:]
In [25]: pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()
Out[25]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 22.221000
10 22.208091
11 22.241165
12 22.266408
13 22.328879
14 22.516356
15 22.795200
16 22.968800
17 23.125382
18 23.275312
19 23.339801
20 23.427110
21 23.507635
22 23.533520
23 23.471062
24 23.403596
25 23.390215
26 23.261085
27 23.231797
28 23.080561
29 22.915004
Name: Price, dtype: float64
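To check that the concat-then-ewm trick above matches the recipe from the StockCharts link in the question (seed with the SMA, then apply EMA = (price - EMA_prev) * k + EMA_prev with k = 2 / (span + 1)), one can compare it with an explicit loop. This is a sketch. Both function names are hypothetical. The only change to the answer's code is the use of .iloc for positional slicing. The prices are the first five values shown above, with span = 3 so the example stays short.

```python
import pandas as pd


def ema_loop(values, span):
    """Hypothetical helper: SMA seed, then EMA = (price - EMA_prev) * k + EMA_prev."""
    k = 2 / (span + 1)  # multiplier described on the StockCharts page
    out = [float("nan")] * (span - 1)  # no EMA until `span` prices are available
    ema = sum(values[:span]) / span  # seed with the simple average
    out.append(ema)
    for price in values[span:]:
        ema = (price - ema) * k + ema
        out.append(ema)
    return out


def ema_concat(s, span):
    """Hypothetical helper: the answer's approach with positional .iloc slicing."""
    sma = s.rolling(window=span, min_periods=span).mean().iloc[:span]
    return pd.concat([sma, s.iloc[span:]]).ewm(span=span, adjust=False).mean()


prices = pd.Series([22.27, 22.19, 22.08, 22.17, 22.18], name="Price")
loop = ema_loop(prices.tolist(), span=3)
vec = ema_concat(prices, span=3)
print(pd.DataFrame({"loop": loop, "concat": vec}))
```

Both columns agree: NaN for the first span - 1 rows, the SMA at row span - 1, then the recursive update.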
Answer by arkochhar
You can compute the EWMA using either alpha or the coefficient (span) in the Pandas ewm function.
Formula for using alpha: (1 - alpha) * previous_val + alpha * current_val
where alpha = 1 / period
Formula for using coeff: ((current_val - previous_val) * coeff) + previous_val
where coeff = 2 / (period + 1)
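The two formulas are the same update rule rearranged: ((current_val - previous_val) * coeff) + previous_val equals (1 - coeff) * previous_val + coeff * current_val. They differ only in the weight, 1 / period versus 2 / (period + 1). The following sketch, with hypothetical helper names and illustrative data, checks this numerically. It also shows the equivalent ewm parameterizations: alpha = 1 / period is the same as com = period - 1, and span = period is the same as alpha = 2 / (period + 1).

```python
import pandas as pd


def step_alpha(previous_val, current_val, alpha):
    # (1 - alpha) * previous_val + alpha * current_val
    return (1 - alpha) * previous_val + alpha * current_val


def step_coeff(previous_val, current_val, coeff):
    # ((current_val - previous_val) * coeff) + previous_val
    return ((current_val - previous_val) * coeff) + previous_val


period = 4
coeff = 2 / (period + 1)

s = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], dtype=float)  # illustrative values
by_alpha = s.ewm(alpha=1 / period, adjust=False).mean()
by_com = s.ewm(com=period - 1, adjust=False).mean()  # pandas: alpha = 1 / (1 + com)
by_span = s.ewm(span=period, adjust=False).mean()  # pandas: alpha = 2 / (span + 1)
by_coeff = s.ewm(alpha=coeff, adjust=False).mean()
print(pd.DataFrame({"alpha=1/period": by_alpha, "span=period": by_span}))
```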
Here is how you can use Pandas to compute the above formulas:
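import pandas as pd

def add_ema(df, base, target, period, alpha=False):
    # Seed with the SMA of the first `period` rows, then append the remaining raw values
    con = pd.concat([df[:period][base].rolling(window=period).mean(), df[period:][base]])
    if alpha:
        df[target] = con.ewm(alpha=1 / period, adjust=False).mean()  # alpha = 1 / period
    else:
        df[target] = con.ewm(span=period, adjust=False).mean()  # coeff = 2 / (period + 1)
    return df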
Answer by Ben
Here's an example of how Pandas calculates both adjusted and non-adjusted ewm:
import numpy as np
import pandas as pd

name = 'closing'
series = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], name=name).to_frame()
period = 4
alpha = 2/(1+period)
series[name+'_ewma'] = np.nan
series.loc[0, name+'_ewma'] = series[name].iloc[0]
series[name+'_ewma_adjust'] = np.nan
series.loc[0, name+'_ewma_adjust'] = series[name].iloc[0]
for i in range(1, len(series)):
    # adjust=False: recursive update from the previous value
    series.loc[i, name+'_ewma'] = (1-alpha) * series.loc[i-1, name+'_ewma'] + alpha * series.loc[i, name]
    # adjust=True: weight (1 - alpha)**(i - t) on observation t, normalized by the sum of weights
    adjusted_weights = np.array([(1-alpha)**(i-t) for t in range(i+1)])
    series.loc[i, name+'_ewma_adjust'] = np.sum(series.iloc[0:i+1][name].values * adjusted_weights) / adjusted_weights.sum()
print(series)
print("diff adjust=False -> ", np.sum(series[name+'_ewma'] - series[name].ewm(span=period, adjust=False).mean()))
print("diff adjust=True -> ", np.sum(series[name+'_ewma_adjust'] - series[name].ewm(span=period, adjust=True).mean()))
The mathematical formulas can be found at https://github.com/pandas-dev/pandas/issues/8861
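For reference, these are the two definitions as given in the pandas ewm documentation, written with x_t for the input and y_t for the output. In Ben's loop, the weight (1 - alpha)**(i - t) on observation t is the same (1 - alpha)^j as below with j = i - t.

```latex
\text{adjust=True:}\quad
y_t = \frac{\sum_{j=0}^{t} (1-\alpha)^{j}\, x_{t-j}}{\sum_{j=0}^{t} (1-\alpha)^{j}}

\text{adjust=False:}\quad
y_0 = x_0, \qquad y_t = (1-\alpha)\, y_{t-1} + \alpha\, x_t

\text{with}\quad \alpha = \frac{2}{\mathrm{span} + 1}
```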
Answer by tentativafc
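If you calculate an ewm of an ewm (as in the MACD formula), you will get bad results. The second and subsequent ewm calls take their SMA seed from index 0 to period, but there the input (itself an EMA) still contains leading NaN values. I use the following solution, which shifts the seed window to start at the first non-null value.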
import pandas as pd

def ema(df, item, period):
    sma = df[item].rolling(period, min_periods=period).mean()
    # shift the index by (first non-null SMA position) - (period - 1), i.e. to the first non-null input (assumes NaNs only at the start)
    idx_start = sma.isna().sum() + 1 - period
    idx_end = idx_start + period
    sma = sma.iloc[idx_start:idx_end]
    rest = df[item].iloc[idx_end:]
    return pd.concat([sma, rest]).ewm(span=period, adjust=False).mean()
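The same idea can be written without counting NaNs by hand. This is a sketch under stated assumptions, not the answer's exact code. sma_seeded_ema is a hypothetical helper that finds the first valid value with Series.first_valid_index and assumes NaNs occur only at the start of the input. The 12/26/9 MACD periods and the synthetic sine-wave prices are also assumptions for illustration. Unlike the answer's version, it reindexes the result so that it stays aligned with the input.

```python
import numpy as np
import pandas as pd


def sma_seeded_ema(x, period):
    """Hypothetical helper: EMA seeded with the SMA of the first `period` valid values.

    Assumes NaNs (if any) appear only at the start of `x`, as they do for an
    EMA or MACD line produced this way.
    """
    start = x.index.get_loc(x.first_valid_index())  # position of the first non-NaN value
    valid = x.iloc[start:]
    seed = valid.iloc[:period].rolling(period).mean()  # NaN except at the last seed position
    ema = pd.concat([seed, valid.iloc[period:]]).ewm(span=period, adjust=False).mean()
    return ema.reindex(x.index)  # restore the leading NaNs so the result aligns with x


close = pd.Series(100 + 5 * np.sin(np.arange(60) / 3), name="Close")  # synthetic prices
macd = sma_seeded_ema(close, 12) - sma_seeded_ema(close, 26)
signal = sma_seeded_ema(macd, 9)  # EMA of an EMA-derived series
print(pd.DataFrame({"macd": macd, "signal": signal}).iloc[24:36])
```

The signal line's seed is the average of the first nine valid MACD values, so its first valid entry sits 8 positions after the MACD's first valid entry.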