Python 如何使用 NumPy 计算移动平均线?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14313510/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow


How to calculate moving average using NumPy?

python, numpy, scipy, time-series, moving-average

提问by goncalopp

There seems to be no function that simply calculates the moving average on numpy/scipy, leading to convoluted solutions.

似乎没有函数可以简单地计算 numpy/scipy 的移动平均值,从而导致复杂的解决方案

My question is two-fold:

我的问题有两个:

  • What's the easiest way to (correctly) implement a moving average with numpy?
  • Since this seems non-trivial and error prone, is there a good reason not to have the batteries included in this case?
  • 使用 numpy(正确)实现移动平均线的最简单方法是什么?
  • 既然这看起来并非易事且容易出错,那么在这种情况下,是否有充分的理由不提供开箱即用(batteries included)的功能?

回答by Jaime

If you just want a straightforward non-weighted moving average, you can easily implement it with np.cumsum, which may be faster than FFT-based methods:

如果你只想要一个简单的非加权移动平均,可以用 np.cumsum 轻松实现,它可能比基于 FFT 的方法更快:

EDIT: Corrected an off-by-one indexing error spotted by Bean in the code.

编辑:更正了 Bean 在代码中发现的差一(off-by-one)索引错误。

import numpy as np

def moving_average(a, n=3):
    # subtract the cumulative sum n positions back to get each window's sum
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

>>> a = np.arange(20)
>>> moving_average(a)
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.])
>>> moving_average(a, n=4)
array([  1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,
        10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5])

So I guess the answer is: it is really easy to implement, and maybe numpy is already a little bloated with specialized functionality.

所以我想答案是:它真的很容易实现,也许 numpy 已经有点臃肿,具有专门的功能。

回答by doug

NumPy's lack of a particular domain-specific function is perhaps due to the Core Team's discipline and fidelity to NumPy's prime directive: provide an N-dimensional array type, as well as functions for creating and indexing those arrays. Like many foundational objectives, this one is not small, and NumPy does it brilliantly.

NumPy 缺乏特定领域的特定功能可能是由于核心团队的纪律和忠于 NumPy 的主要指令:提供 N 维数组类型,以及用于创建和索引这些数组的函数。像许多基本目标一样,这个目标并不小,而且 NumPy 做得非常出色。

The (much) larger SciPy contains a much larger collection of domain-specific libraries (called subpackages by SciPy devs)--for instance, numerical optimization (optimize), signal processing (signal), and integral calculus (integrate).

规模(大得多)的 SciPy 包含更大的特定领域库集合(SciPy 开发者称之为子包 subpackages)——例如数值优化(optimize)、信号处理(signal)和积分(integrate)。

My guess is that the function you are after is in at least one of the SciPy subpackages (scipy.signal perhaps); however, I would look first in the collection of SciPy scikits, identify the relevant scikit(s) and look for the function of interest there.

我的猜测是您所追求的功能至少在 SciPy 子包之一中(可能是scipy.signal);然而,我会首先查看SciPy scikits的集合,确定相关的 scikit(s) 并在那里寻找感兴趣的功能。
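
As one concrete illustration of what those subpackages offer (this example is an addition, not part of the original answer), scipy.ndimage provides uniform_filter1d, which computes exactly a centered moving average; a minimal sketch:

import numpy as np
from scipy.ndimage import uniform_filter1d

x = np.arange(20, dtype=float)
# centered moving average with window 5; edge values are filled by
# repeating the nearest sample (mode='nearest')
smoothed = uniform_filter1d(x, size=5, mode='nearest')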

Scikits are independently developed packages based on NumPy/SciPy and directed to a particular technical discipline (e.g., scikits-image, scikits-learn, etc.). Several of these (in particular, the awesome OpenOpt for numerical optimization) were highly regarded, mature projects long before choosing to reside under the relatively new scikits rubric. The Scikits homepage linked to above lists about 30 such scikits, though at least several of those are no longer under active development.

Scikits 是基于 NumPy/SciPy 独立开发、面向特定技术领域的软件包(例如 scikits-image、scikits-learn 等)。其中一些(特别是用于数值优化的出色的 OpenOpt)早在归入相对较新的 scikits 名下之前,就已经是备受推崇的成熟项目。上面链接到的 Scikits 主页列出了大约 30 个这样的 scikit,不过其中至少有几个已不再积极开发。

Following this advice would lead you to scikits-timeseries; however, that package is no longer under active development; in effect, Pandas has become, AFAIK, the de facto NumPy-based time series library.

遵循这一建议会把你引向 scikits-timeseries;但是,该软件包已不再积极开发;实际上,据我所知(AFAIK),Pandas 已成为事实上的基于 NumPy 的时间序列库。

Pandas has several functions that can be used to calculate a moving average; the simplest of these is probably rolling_mean, which you use like so:

Pandas有几个函数可以用来计算移动平均线;其中最简单的可能是rolling_mean,您可以像这样使用它:

>>> # the recommended syntax to import pandas
>>> import pandas as PD
>>> import numpy as NP

>>> # prepare some fake data:
>>> # the date-time indices:
>>> t = PD.date_range('1/1/2010', '12/31/2012', freq='D')

>>> # the data:
>>> x = NP.arange(0, t.shape[0])

>>> # combine the data & index into a Pandas 'Series' object
>>> D = PD.Series(x, t)

Now, just call the function rolling_mean, passing in the Series object and a window size, which in my example below is 10 days.

现在,只需调用 rolling_mean 函数,传入 Series 对象和窗口大小(window size),在下面的示例中为 10 天。

>>> d_mva = PD.rolling_mean(D, 10)

>>> # d_mva is the same size as the original Series
>>> d_mva.shape
    (1096,)

>>> # though obviously the first w values are NaN where w is the window size
>>> d_mva[:3]
    2010-01-01         NaN
    2010-01-02         NaN
    2010-01-03         NaN

Verify that it worked--e.g., compare values 10-15 in the original Series versus the new Series smoothed with the rolling mean:

验证它是否有效 - 例如,将原始系列中的值 10 - 15 与使用滚动平均值平滑的新系列进行比较

>>> D[10:15]
     2010-01-11    2.041076
     2010-01-12    2.041076
     2010-01-13    2.720585
     2010-01-14    2.720585
     2010-01-15    3.656987
     Freq: D

>>> d_mva[10:20]
      2010-01-11    3.131125
      2010-01-12    3.035232
      2010-01-13    2.923144
      2010-01-14    2.811055
      2010-01-15    2.785824
      Freq: D

The function rolling_mean, along with about a dozen or so other functions, is informally grouped in the Pandas documentation under the rubric moving window functions; a second, related group of functions in Pandas is referred to as exponentially-weighted functions (e.g., ewma, which calculates an exponentially weighted moving average). The fact that this second group is not included in the first (moving window functions) is perhaps because the exponentially-weighted transforms don't rely on a fixed-length window.

rolling_mean 函数与其他十几个函数一起,在 Pandas 文档中被非正式地归类在移动窗口函数(moving window functions)这一标题下;Pandas 中另一组相关的函数称为指数加权函数(例如计算指数加权移动平均的 ewma)。第二组没有被归入第一组(移动窗口函数)的原因,可能是指数加权变换并不依赖于固定长度的窗口。
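
For reference (an addition for modern pandas, not part of the original answer), the exponentially weighted transforms mentioned above are now exposed through the .ewm() accessor; a minimal sketch:

import pandas as pd
import numpy as np

s = pd.Series(np.arange(10, dtype=float))
# exponentially weighted moving average; span acts as an "effective"
# window length even though no fixed-length window is used
ewma = s.ewm(span=5).mean()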

回答by Peixiang Zhong

In case you want to take care of the edge conditions carefully (computing the mean only from the available elements at the edges), the following function will do the trick.

如果您想仔细注意边缘条件(仅从边缘的可用元素计算平均值),以下函数将起作用。

import numpy as np

def running_mean(x, N):
    out = np.zeros_like(x, dtype=np.float64)
    dim_len = x.shape[0]
    for i in range(dim_len):
        if N%2 == 0:
            a, b = i - (N-1)//2, i + (N-1)//2 + 2
        else:
            a, b = i - (N-1)//2, i + (N-1)//2 + 1

        #cap indices to min and max indices
        a = max(0, a)
        b = min(dim_len, b)
        out[i] = np.mean(x[a:b])
    return out

>>> running_mean(np.array([1,2,3,4]), 2)
array([1.5, 2.5, 3.5, 4. ])

>>> running_mean(np.array([1,2,3,4]), 3)
array([1.5, 2. , 3. , 3.5])

回答by Vladtn

This answer using Pandas is adapted from above, as rolling_mean is not part of Pandas anymore.

这个使用 Pandas 的答案改编自上面的答案,因为 rolling_mean 已不再是 Pandas 的一部分。

# the recommended syntax to import pandas
import pandas as pd
import numpy as np

# prepare some fake data:
# the date-time indices:
t = pd.date_range('1/1/2010', '12/31/2012', freq='D')

# the data:
x = np.arange(0, t.shape[0])

# combine the data & index into a Pandas 'Series' object
D = pd.Series(x, t)

Now, just call the function rolling on the dataframe with a window size, which in my example below is 10 days.

现在,只需在数据上调用 rolling 函数并传入窗口大小,在下面的示例中为 10 天。

d_mva10 = D.rolling(10).mean()

# d_mva is the same size as the original Series
# though obviously the first w values are NaN where w is the window size
d_mva10[:11]

2010-01-01    NaN
2010-01-02    NaN
2010-01-03    NaN
2010-01-04    NaN
2010-01-05    NaN
2010-01-06    NaN
2010-01-07    NaN
2010-01-08    NaN
2010-01-09    NaN
2010-01-10    4.5
2010-01-11    5.5
Freq: D, dtype: float64
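
If the leading NaN values are unwanted, a hedged variant (using the same Series D as above) is to allow partial windows via min_periods, so every position gets a mean computed from however many values are available:

# centered window, partial windows allowed, so no NaN values at the edges
d_mva10_centered = D.rolling(10, center=True, min_periods=1).mean()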

回答by Anthony Anyanwu

I feel this can be easily solved using bottleneck

我觉得这可以使用bottleneck轻松解决

See basic sample below:

请参阅下面的基本示例:

import numpy as np
import bottleneck as bn

a = np.random.randint(4, 1000, size=(5, 7))
mm = bn.move_mean(a, window=2, min_count=1)

This gives the moving mean along the last axis by default (other axes can be selected with the axis argument).

这默认给出沿最后一个轴的移动平均值(可以通过 axis 参数选择其他轴)。

  • "mm" is the moving mean for "a".

  • "window" is the max number of entries to consider for moving mean.

  • "min_count" is min number of entries to consider for moving mean (e.g. for first element or if the array has nan values).

  • “mm”是“a”的移动平均值。

  • “窗口”是移动均值要考虑的最大条目数。

  • "min_count" 是考虑移动平均值的最小条目数(例如,对于第一个元素或数组是否具有 nan 值)。

The good part is Bottleneck helps to deal with nan values and it's also very efficient.

好的部分是瓶颈有助于处理 nan 值,而且它也非常有效。
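
A small hedged illustration of that NaN handling (the input values here are made up for the example):

import numpy as np
import bottleneck as bn

a = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
# with min_count=1, any window containing at least one non-NaN value
# still yields a mean; only all-NaN (or too-short) windows give NaN
mm = bn.move_mean(a, window=2, min_count=1)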

回答by yatu

A simple way to achieve this is by using np.convolve. The idea behind this is to leverage the way the discrete convolution is computed and use it to return a rolling mean. This can be done by convolving with a sequence of np.ones of a length equal to the sliding window length we want.

实现此目的的一种简单方法是使用 np.convolve。其背后的思路是利用离散卷积的计算方式,用它来得到滚动均值。具体做法是与一个由 np.ones 组成、长度等于所需滑动窗口长度的序列进行卷积。

In order to do so we could define the following function:

为此,我们可以定义以下函数:

def moving_average(x, w):
    return np.convolve(x, np.ones(w), 'valid') / w

This function will be taking the convolution of the sequence x and a sequence of ones of length w. Note that the chosen mode is valid so that the convolution product is only given for points where the sequences overlap completely.

此函数将对序列 x 和一个长度为 w 的全 1 序列做卷积。请注意,所选的 mode 是 valid,因此只在两个序列完全重叠的位置给出卷积结果。



Some examples:

一些例子:

x = np.array([5,3,8,10,2,1,5,1,0,2])

For a moving average with a window of length 2 we would have:

对于窗口长度为 2 的移动平均,我们将得到:

moving_average(x, 2)
# array([4. , 5.5, 9. , 6. , 1.5, 3. , 3. , 0.5, 1. ])

And for a window of length 4:

对于长度为 4 的窗口:

moving_average(x, 4)
# array([6.5 , 5.75, 5.25, 4.5 , 2.25, 1.75, 2.  ])
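
If an output with the same length as x is preferred (an addition, not part of the original answer), the 'same' mode can be used instead, with the caveat that the edge entries only partially overlap the window (implicit zero padding) and are therefore biased low:

# same length as x; edge values come from partial, zero-padded windows
moving_average_same = np.convolve(x, np.ones(4), 'same') / 4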


How does convolve work?

convolve 是如何工作的?

Let's have a more in-depth look at the way the discrete convolution is being computed. The following function aims to replicate the way np.convolve is computing the output values:

让我们更深入地了解离散卷积的计算方式。以下函数旨在复制np.convolve计算输出值的方式:

def mov_avg(x, w):
    for m in range(len(x)-(w-1)):
        yield sum(np.ones(w) * x[m:m+w]) / w 

Which, for the same example above, would also yield:

对于上面的同一个示例,它同样会产生:

list(mov_avg(x, 2))
# [4.0, 5.5, 9.0, 6.0, 1.5, 3.0, 3.0, 0.5, 1.0]

So what is being done at each step is to take the inner product between the array of ones and the current window. In this case the multiplication by np.ones(w) is superfluous given that we are directly taking the sum of the sequence.

因此,每一步所做的就是计算全 1 数组与当前窗口的内积。在这种情况下,乘以 np.ones(w) 是多余的,因为我们直接对序列求和(sum)即可。
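
To make that point concrete (this variant is an addition, not part of the original answer), the same generator can be written without the redundant array of ones:

def mov_avg_simple(x, w):
    # identical output to mov_avg above, just summing each window directly
    for m in range(len(x)-(w-1)):
        yield x[m:m+w].sum() / w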

Below is an example of how the first outputs are computed so that it is a little clearer. Let's suppose we want a window of w=4:

下面通过示例说明最前面几个输出是如何计算的,以便更清楚一些。假设我们想要窗口 w=4:

[1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*5 + 1*3 + 1*8 + 1*10) / w = 6.5

And the following output would be computed as:

以下输出将计算为:

  [1,1,1,1]
[5,3,8,10,2,1,5,1,0,2]
= (1*3 + 1*8 + 1*10 + 1*2) / w = 5.75

And so on, returning a moving average of the sequence once all overlaps have been performed.

依此类推,一旦执行完所有重叠,就返回序列的移动平均值。

回答by cbartondock

I actually wanted a slightly different behavior than the accepted answer. I was building a moving average feature extractor for an sklearn pipeline, so I required that the output of the moving average have the same dimension as the input. What I want is for the moving average to assume the series stays constant, i.e. a moving average of [1,2,3,4,5] with window 2 would give [1.5,2.5,3.5,4.5,5.0].

我实际上想要一种与被采纳的答案略有不同的行为。我正在为 sklearn 管道构建移动平均特征提取器,因此我要求移动平均的输出与输入具有相同的维度。我想要的是让移动平均假设序列保持不变,即 [1,2,3,4,5] 在窗口为 2 时的移动平均会给出 [1.5,2.5,3.5,4.5,5.0]。

For column vectors (my use case) we get

对于列向量(我的用例),我们得到

def moving_average_col(X, n):
  z2 = np.cumsum(np.pad(X, ((n,0),(0,0)), 'constant', constant_values=0), axis=0)
  z1 = np.cumsum(np.pad(X, ((0,n),(0,0)), 'constant', constant_values=X[-1]), axis=0)
  return (z1-z2)[(n-1):-1]/n

And for arrays

而对于数组

def moving_average_array(X, n):
  z2 = np.cumsum(np.pad(X, (n,0), 'constant', constant_values=0))
  z1 = np.cumsum(np.pad(X, (0,n), 'constant', constant_values=X[-1]))
  return (z1-z2)[(n-1):-1]/n
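
A quick check (added here, not part of the original answer) that the array version reproduces the behaviour described above:

>>> moving_average_array(np.array([1, 2, 3, 4, 5]), 2)
array([1.5, 2.5, 3.5, 4.5, 5. ])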

Of course, one doesn't have to assume constant values for the padding, but doing so should be adequate in most cases.

当然,不必为填充假设常数值,但在大多数情况下这样做应该就足够了。
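
For example, here is a hypothetical variant of the array version in which the boundary assumption is controlled by the np.pad mode ('edge' repeats the last value and reproduces the behaviour above, while 'reflect' mirrors the series instead); a sketch under those assumptions:

import numpy as np

def moving_average_array_padmode(X, n, pad_mode='edge'):
    # leading zeros make the cumulative-sum difference trick work;
    # the trailing padding encodes the boundary assumption
    z2 = np.cumsum(np.pad(X, (n, 0), 'constant', constant_values=0))
    z1 = np.cumsum(np.pad(X, (0, n), pad_mode))
    return (z1 - z2)[(n - 1):-1] / n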

回答by argentum2f

Here are a variety of ways to do this, along with some benchmarks. The best methods are versions using optimized code from other libraries. The bottleneck.move_mean method is probably best all around. The scipy.convolve approach is also very fast, extensible, and syntactically and conceptually simple, but doesn't scale well for very large window values. The numpy.cumsum method is good if you need a pure numpy approach.

这里有多种实现方法,并附有一些基准测试。最好的方法是使用其他库中优化代码的版本。bottleneck.move_mean 方法可能是综合表现最好的。scipy.convolve 方法同样非常快速、可扩展,在语法和概念上都很简单,但对于非常大的窗口值扩展性不佳。如果您需要纯 numpy 的方法,numpy.cumsum 方法是不错的选择。

Note: Some of these (e.g. bottleneck.move_mean) are not centered, and will shift your data.

注意:其中一些方法(例如 bottleneck.move_mean)不是居中的,会使您的数据发生偏移。

import numpy as np
import scipy as sci
import scipy.signal as sig
import pandas as pd
import bottleneck as bn
import time as time

def rollavg_direct(a,n): 
    'Direct "for" loop'
    assert n%2==1
    b = a*0.0
    for i in range(len(a)) :
        b[i]=a[max(i-n//2,0):min(i+n//2+1,len(a))].mean()
    return b

def rollavg_comprehension(a,n):
    'List comprehension'
    assert n%2==1
    r,N = int(n/2),len(a)
    return np.array([a[max(i-r,0):min(i+r+1,N)].mean() for i in range(N)]) 

def rollavg_convolve(a,n):
    'scipy.convolve'
    assert n%2==1
    return sci.convolve(a,np.ones(n,dtype='float')/n, 'same')[n//2:-n//2+1]  

def rollavg_convolve_edges(a,n):
    'scipy.convolve, edge handling'
    assert n%2==1
    return sci.convolve(a,np.ones(n,dtype='float'), 'same')/sci.convolve(np.ones(len(a)),np.ones(n), 'same')  

def rollavg_cumsum(a,n):
    'numpy.cumsum'
    assert n%2==1
    cumsum_vec = np.cumsum(np.insert(a, 0, 0)) 
    return (cumsum_vec[n:] - cumsum_vec[:-n]) / n

def rollavg_cumsum_edges(a,n):
    'numpy.cumsum, edge handling'
    assert n%2==1
    N = len(a)
    cumsum_vec = np.cumsum(np.insert(np.pad(a,(n-1,n-1),'constant'), 0, 0)) 
    d = np.hstack((np.arange(n//2+1,n),np.ones(N-n)*n,np.arange(n,n//2,-1)))  
    return (cumsum_vec[n+n//2:-n//2+1] - cumsum_vec[n//2:-n-n//2]) / d

def rollavg_roll(a,n):
    'Numpy array rolling'
    assert n%2==1
    N = len(a)
    rolling_idx = np.mod((N-1)*np.arange(n)[:,None] + np.arange(N), N)
    return a[rolling_idx].mean(axis=0)[n-1:] 

def rollavg_roll_edges(a,n):
    # see https://stackoverflow.com/questions/42101082/fast-numpy-roll
    'Numpy array rolling, edge handling'
    assert n%2==1
    a = np.pad(a,(0,n-1-n//2), 'constant')*np.ones(n)[:,None]
    m = a.shape[1]
    idx = np.mod((m-1)*np.arange(n)[:,None] + np.arange(m), m) # Rolling index
    out = a[np.arange(-n//2,n//2)[:,None], idx]
    d = np.hstack((np.arange(1,n),np.ones(m-2*n+1+n//2)*n,np.arange(n,n//2,-1)))
    return (out.sum(axis=0)/d)[n//2:]

def rollavg_pandas(a,n):
    'Pandas rolling average'
    return pd.DataFrame(a).rolling(n, center=True, min_periods=1).mean().to_numpy()

def rollavg_bottlneck(a,n):
    'bottleneck.move_mean'
    return bn.move_mean(a, window=n, min_count=1)

N = 10**6
a = np.random.rand(N)
functions = [rollavg_direct, rollavg_comprehension, rollavg_convolve, 
        rollavg_convolve_edges, rollavg_cumsum, rollavg_cumsum_edges, 
        rollavg_pandas, rollavg_bottlneck, rollavg_roll, rollavg_roll_edges]

print('Small window (n=3)')
%load_ext memory_profiler
for f in functions : 
    print('\n'+f.__doc__+ ' : ')
    %timeit b=f(a,3)

print('\nLarge window (n=1001)')
for f in functions[0:-2] : 
    print('\n'+f.__doc__+ ' : ')
    %timeit b=f(a,1001)

print('\nMemory\n')
print('Small window (n=3)')
N = 10**7
a = np.random.rand(N)
%load_ext memory_profiler
for f in functions[2:] : 
    print('\n'+f.__doc__+ ' : ')
    %memit b=f(a,3)

print('\nLarge window (n=1001)')
for f in functions[2:-2] : 
    print('\n'+f.__doc__+ ' : ')
    %memit b=f(a,1001)

Timing, Small window (n=3)

定时,小窗口 (n=3)

Direct "for" loop : 

4.14 s ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

List comprehension : 
3.96 s ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

scipy.convolve : 
1.07 ms ± 26.7 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

scipy.convolve, edge handling : 
4.68 ms ± 9.69 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

numpy.cumsum : 
5.31 ms ± 5.11 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

numpy.cumsum, edge handling : 
8.52 ms ± 11.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Pandas rolling average : 
9.85 ms ± 9.63 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

bottleneck.move_mean : 
1.3 ms ± 12.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Numpy array rolling : 
31.3 ms ± 91.9 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Numpy array rolling, edge handling : 
61.1 ms ± 55.9 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Timing, Large window (n=1001)

时序,大窗口 (n=1001)

Direct "for" loop : 
4.67 s ± 34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

List comprehension : 
4.46 s ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

scipy.convolve : 
103 ms ± 165 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

scipy.convolve, edge handling : 
272 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

numpy.cumsum : 
5.19 ms ± 12.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

numpy.cumsum, edge handling : 
8.7 ms ± 11.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Pandas rolling average : 
9.67 ms ± 199 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

bottleneck.move_mean : 
1.31 ms ± 15.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Memory, Small window (n=3)

内存,小窗口 (n=3)

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler

scipy.convolve : 
peak memory: 362.66 MiB, increment: 73.61 MiB

scipy.convolve, edge handling : 
peak memory: 510.24 MiB, increment: 221.19 MiB

numpy.cumsum : 
peak memory: 441.81 MiB, increment: 152.76 MiB

numpy.cumsum, edge handling : 
peak memory: 518.14 MiB, increment: 228.84 MiB

Pandas rolling average : 
peak memory: 449.34 MiB, increment: 160.02 MiB

bottleneck.move_mean : 
peak memory: 374.17 MiB, increment: 75.54 MiB

Numpy array rolling : 
peak memory: 661.29 MiB, increment: 362.65 MiB

Numpy array rolling, edge handling : 
peak memory: 1111.25 MiB, increment: 812.61 MiB

Memory, Large window (n=1001)

内存,大窗口 (n=1001)

scipy.convolve : 
peak memory: 370.62 MiB, increment: 71.83 MiB

scipy.convolve, edge handling : 
peak memory: 521.98 MiB, increment: 223.18 MiB

numpy.cumsum : 
peak memory: 451.32 MiB, increment: 152.52 MiB

numpy.cumsum, edge handling : 
peak memory: 527.51 MiB, increment: 228.71 MiB

Pandas rolling average : 
peak memory: 451.25 MiB, increment: 152.50 MiB

bottleneck.move_mean : 
peak memory: 374.64 MiB, increment: 75.85 MiB

回答by Josmoor98

talib contains a simple moving average tool, as well as other similar averaging tools (i.e. exponential moving average). Below compares the method to some of the other solutions.

talib包含一个简单的移动平均工具,以及其他类似的平均工具(即指数移动平均)。下面将该方法与其他一些解决方案进行比较。



%timeit pd.Series(np.arange(100000)).rolling(3).mean()
2.53 ms ± 40.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit talib.SMA(real = np.arange(100000.), timeperiod = 3)
348 μs ± 3.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit moving_average(np.arange(100000))
638 μs ± 45.1 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


One caveat is that real must have elements of dtype = float. Otherwise the following error is raised:

一个注意事项是 real 的元素必须是 dtype = float,否则会引发以下错误:

Exception: real is not double

异常:real 不是 double
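
A hedged workaround (using the same call shape as in the benchmark above) is simply to cast the input before calling SMA:

import numpy as np
import talib

a = np.arange(100000)  # integer dtype would trigger "real is not double"
sma = talib.SMA(real=a.astype(float), timeperiod=3)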

回答by Mott The Tuple

Here is a fast implementation using numba (mind the types). Note it does contain nans where shifted.

这是使用 numba 的快速实现(注意类型)。请注意,它确实包含移位的 nans。

import numpy as np
import numba as nb

@nb.jit(nb.float64[:](nb.float64[:],nb.int64),
        fastmath=True,nopython=True)
def moving_average( array, window ):    
    ret = np.cumsum(array)
    ret[window:] = ret[window:] - ret[:-window]
    ma = ret[window - 1:] / window
    n = np.empty(window-1); n.fill(np.nan)
    return np.concatenate((n.ravel(), ma.ravel()))
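
A small usage sketch (added here; the dtypes are chosen to match the signature above):

a = np.arange(20, dtype=np.float64)
# the first (window - 1) entries are NaN, the rest are the window means
moving_average(a, 5)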