Does Pandas calculate ewm wrong?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/37924377/

Time: 2020-09-08 15:43:25  Source: igfitidea

pandas exponential moving-average

Asked by jeronimo

When trying to calculate the exponential moving average (EMA) from financial data in a dataframe it seems that Pandas' ewm approach is incorrect.

The basics are well explained in the following link: http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_averages

According to the Pandas documentation, the approach taken is as follows (with the "adjust" parameter set to False):

   weighted_average[0] = arg[0];
   weighted_average[i] = (1-alpha) * weighted_average[i-1] + alpha * arg[i]

This in my view is incorrect. The "arg" should be (for example) the closing values, however, arg[0] is the first average (i.e. the simple average of the first series of data of the length of the period selected), but NOT the first closing value. arg[0] and arg[i] can therefore never be from the same data. Using the "min_periods" parameter does not seem to resolve this.

Can anyone explain to me how (or if) Pandas can be used to properly calculate the EMA of data?

Answered by chrisb

There are several ways to initialize an exponential moving average, so I wouldn't say pandas is doing it wrong, just differently.

Here is one way to calculate it the way you want:

In [20]: s.head()
Out[20]: 
0    22.27
1    22.19
2    22.08
3    22.17
4    22.18
Name: Price, dtype: float64

In [21]: span = 10

In [22]: sma = s.rolling(window=span, min_periods=span).mean()[:span]

In [24]: rest = s[span:]

In [25]: pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()
Out[25]: 
0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
5           NaN
6           NaN
7           NaN
8           NaN
9     22.221000
10    22.208091
11    22.241165
12    22.266408
13    22.328879
14    22.516356
15    22.795200
16    22.968800
17    23.125382
18    23.275312
19    23.339801
20    23.427110
21    23.507635
22    23.533520
23    23.471062
24    23.403596
25    23.390215
26    23.261085
27    23.231797
28    23.080561
29    22.915004
Name: Price, dtype: float64
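
The same steps can be sketched as a self-contained script. The prices below are made up for illustration (they are not the asker's data), and sma_seeded_ema is a hypothetical helper name, not a pandas function. The explicit loop is only a cross-check of the pandas result against the textbook recursion.

```python
import numpy as np
import pandas as pd


def sma_seeded_ema(s: pd.Series, span: int) -> pd.Series:
    """EMA whose first value is the simple average of the first `span` points.

    Hypothetical helper wrapping the concat-then-ewm steps shown above.
    Assumes `s` has no missing values and at least `span` observations.
    """
    # Rolling mean of the first `span` points: NaN everywhere except the last.
    sma = s.rolling(window=span, min_periods=span).mean().iloc[:span]
    rest = s.iloc[span:]
    # With adjust=False the recursion starts at the first valid value,
    # which here is the SMA seed; the leading NaNs stay NaN.
    return pd.concat([sma, rest]).ewm(span=span, adjust=False).mean()


# Made-up prices, for illustration only.
prices = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 14.0], name="Price")
span = 4
ema = sma_seeded_ema(prices, span)

# Cross-check with the textbook recursion, seeded with the SMA.
alpha = 2 / (span + 1)
expected = [np.nan] * (span - 1) + [prices.iloc[:span].mean()]
for x in prices.iloc[span:]:
    expected.append((1 - alpha) * expected[-1] + alpha * x)

print(ema)
print(np.allclose(ema.to_numpy(), np.array(expected), equal_nan=True))
```

Seeding through the concatenated series keeps the whole calculation inside pandas, so no Python-level loop is needed for real data.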

Answered by arkochhar

You can compute the EWMA using either alpha or a coefficient (span) in the Pandas ewm function.

Formula for using alpha: (1 - alpha) * previous_val + alpha * current_val, where alpha = 1 / period

Formula for using coeff: ((current_val - previous_val) * coeff) + previous_val, where coeff = 2 / (period + 1)

Here is how you can use Pandas to compute the above formulas:

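# Seed with the SMA of the first `period` rows, then append the remaining rows.
con = pd.concat([df[:period][base].rolling(window=period).mean(), df[period:][base]])

# `alpha` is treated here as a boolean flag selecting the smoothing factor.
if alpha:
    df[target] = con.ewm(alpha=1 / period, adjust=False).mean()
else:
    df[target] = con.ewm(span=period, adjust=False).mean()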
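
As a hedged illustration of the two options, the sketch below wraps the snippet in a function and checks that span=period gives the same smoothing as alpha = 2 / (period + 1). The function name ema_from_sma_seed, its use_alpha flag, and the sample DataFrame are assumptions made for this example only.

```python
import numpy as np
import pandas as pd


def ema_from_sma_seed(df, base, period, use_alpha=False):
    """Return an SMA-seeded EMA of df[base] (hypothetical helper).

    use_alpha=True  -> smoothing factor 1 / period
    use_alpha=False -> smoothing factor 2 / (period + 1), i.e. span=period
    """
    col = df[base]
    # SMA of the first `period` rows (NaN except the last), then the rest.
    con = pd.concat([col.iloc[:period].rolling(window=period).mean(),
                     col.iloc[period:]])
    if use_alpha:
        return con.ewm(alpha=1 / period, adjust=False).mean()
    return con.ewm(span=period, adjust=False).mean()


# Made-up closing prices, for illustration only.
df = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 14.0]})
period = 4

df["EMA_span"] = ema_from_sma_seed(df, "Close", period)
df["EMA_alpha"] = ema_from_sma_seed(df, "Close", period, use_alpha=True)

# pandas converts span to alpha = 2 / (span + 1), so these two agree.
via_span = df["Close"].ewm(span=period, adjust=False).mean()
via_alpha = df["Close"].ewm(alpha=2 / (period + 1), adjust=False).mean()
print(np.allclose(via_span, via_alpha))
print(df)
```

The two columns share the same SMA seed and differ only in how quickly they react to new values, since 1 / period is a smaller smoothing factor than 2 / (period + 1) for any period above 1.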

Answered by Ben

Here's an example of how Pandas calculates both adjusted and non-adjusted ewm:

import numpy as np
import pandas as pd

name = 'closing'
series = pd.Series([1, 2, 3, 5, 8, 13, 21, 34], name=name).to_frame()
period = 4
alpha = 2/(1+period)

# Both variants start from the first observation.
series[name+'_ewma'] = np.nan
series.loc[0, name+'_ewma'] = series[name].iloc[0]

series[name+'_ewma_adjust'] = np.nan
series.loc[0, name+'_ewma_adjust'] = series[name].iloc[0]

for i in range(1, len(series)):
    # adjust=False: recursive form
    series.loc[i, name+'_ewma'] = (1-alpha) * series.loc[i-1, name+'_ewma'] + alpha * series.loc[i, name]

    # adjust=True: weighted average of all values so far, with weights (1-alpha)**(i-t)
    adjusted_weights = np.array([(1-alpha)**(i-t) for t in range(i+1)])
    series.loc[i, name+'_ewma_adjust'] = np.sum(series.iloc[0:i+1][name].values * adjusted_weights) / adjusted_weights.sum()

print(series)
print("diff adjusted=False -> ", np.sum(series[name+'_ewma'] - series[name].ewm(span=period, adjust=False).mean()))
print("diff adjusted=True -> ", np.sum(series[name+'_ewma_adjust'] - series[name].ewm(span=period, adjust=True).mean()))

The mathematical formula can be found at https://github.com/pandas-dev/pandas/issues/8861
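
One consequence of the two weighting schemes is that they only disagree near the start of the series: in the non-adjusted recursion the first observation keeps a weight of (1-alpha)**t, which fades as t grows. The sketch below illustrates this on a made-up random walk; the data and variable names are assumptions for this example only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up random-walk "prices", for illustration only.
closing = pd.Series(100 + rng.normal(size=200).cumsum(), name="closing")
period = 4

adjusted = closing.ewm(span=period, adjust=True).mean()
unadjusted = closing.ewm(span=period, adjust=False).mean()

# Absolute gap between the two variants at each position.
gap = (adjusted - unadjusted).abs()

print(gap.iloc[0])            # both variants start at the first observation
print(gap.iloc[1:10].max())   # largest gap over the first few points
print(gap.iloc[-10:].max())   # largest gap over the last few points
```

For long series the choice of adjust therefore matters mostly for the first few multiples of the period.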

Answered by tentativafc

If you are calculating an ewm of an ewm (as in the MACD formula), you will get bad results, because the second and subsequent ewm calculations will take their seed from index 0 through period, where the input still contains NaN values. I use the following solution.

# `item` is the name of the input column (for example 'Close', or a previous EMA column)
sma = df[item].rolling(period, min_periods=period).mean()
# number of leading NaNs in the input: NaN count of the SMA minus the (period - 1) NaNs added by rolling
idx_start = sma.isna().sum() + 1 - period
idx_end = idx_start + period
sma = sma[idx_start: idx_end]
rest = df[item][idx_end:]
ema = pd.concat([sma, rest]).ewm(span=period, adjust=False).mean()
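
A minimal sketch of how this could be applied twice in a row, assuming any NaNs sit only at the start of the input. The helper name sma_seeded_ema and the sample prices are hypothetical. Unlike the snippet above, this version keeps the leading NaN rows so that the result has the same length as the input.

```python
import numpy as np
import pandas as pd


def sma_seeded_ema(s: pd.Series, period: int) -> pd.Series:
    """SMA-seeded EMA that tolerates leading NaNs in `s` (hypothetical helper).

    Assumes any NaNs in `s` are at the start only, as is the case for the
    output of a previous call to this function.
    """
    sma = s.rolling(period, min_periods=period).mean()
    # Leading NaNs in `s` itself: the rolling mean adds `period - 1` NaNs
    # of its own, so subtract them again.
    n_leading = sma.isna().sum() + 1 - period
    idx_end = n_leading + period
    # Everything before idx_end - 1 is NaN; position idx_end - 1 is the SMA seed.
    seed = sma.iloc[:idx_end]
    rest = s.iloc[idx_end:]
    return pd.concat([seed, rest]).ewm(span=period, adjust=False).mean()


# Made-up closing prices, for illustration only.
close = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 14.0, 13.5, 15.0])
period = 3

ema1 = sma_seeded_ema(close, period)  # first valid value at position 2
ema2 = sma_seeded_ema(ema1, period)   # first valid value at position 4

print(pd.DataFrame({"close": close, "ema1": ema1, "ema2": ema2}))
```

The second call seeds from the simple average of the first three valid values of ema1 rather than from rows that are still NaN.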