Estimate Autocorrelation using Python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/14297012/
Asked by 8765674
I would like to perform Autocorrelation on the signal shown below. The time between two consecutive points is 2.5ms (or a repetition rate of 400Hz).


This is the equation for estimating autocorrelation that I would like to use (taken from http://en.wikipedia.org/wiki/Autocorrelation, section Estimation):
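For reference, the estimator from the linked Wikipedia section (and the one the accepted answer below implements) is:

$$\hat{R}(k) = \frac{1}{(n-k)\,\sigma^2}\sum_{t=1}^{n-k}\left(X_t-\mu\right)\left(X_{t+k}-\mu\right)$$

where $\mu$ and $\sigma^2$ are the mean and variance of the series and $k$ is the lag.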


What is the simplest method of finding the estimated autocorrelation of my data in Python? Is there something similar to numpy.correlate that I can use?
Or should I just calculate the mean and variance?
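Background, not from the original question: in 'full' mode, np.correlate returns 2n - 1 values, and the last n of them are the lag-0 to lag-(n-1) sums that the accepted answer below slices out with [-n:].

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
full = np.correlate(x, x, mode='full')   # [ 3.,  8., 14.,  8.,  3.]
print(full[-len(x):])                    # [14.,  8.,  3.] -> lags 0, 1, 2
```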
Edit:
With help from unutbu, I have written:
```python
import numpy as N
import pylab as P

fn = 'data.txt'
x = N.loadtxt(fn, unpack=True, usecols=[1])
time = N.loadtxt(fn, unpack=True, usecols=[0])

def estimated_autocorrelation(x):
    n = len(x)
    variance = x.var()
    x = x - x.mean()
    r = N.correlate(x, x, mode='full')[-n:]
    #assert N.allclose(r, N.array([(x[:n-k]*x[-(n-k):]).sum() for k in range(n)]))
    result = r/(variance*(N.arange(n, 0, -1)))
    return result

P.plot(time, estimated_autocorrelation(x))
P.xlabel('time (s)')
P.ylabel('autocorrelation')
P.show()
```
Accepted answer by unutbu
I don't think there is a NumPy function for this particular calculation. Here is how I would write it:
```python
import numpy as np

def estimated_autocorrelation(x):
    """
    http://stackoverflow.com/q/14297012/190597
    http://en.wikipedia.org/wiki/Autocorrelation#Estimation
    """
    n = len(x)
    variance = x.var()
    x = x - x.mean()
    r = np.correlate(x, x, mode='full')[-n:]
    assert np.allclose(r, np.array([(x[:n-k]*x[-(n-k):]).sum() for k in range(n)]))
    result = r/(variance*(np.arange(n, 0, -1)))
    return result
```
The assert statement is there both to check the calculation and to document its intent.

When you are confident this function is behaving as expected, you can comment out the assert statement, or run your script with python -O. (The -O flag tells Python to ignore assert statements.)
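A quick sanity check, not part of the original answer: because the lag-k sum is divided by variance * (n - k), the lag-0 value should come out as exactly 1. The 5 Hz sine and noise level below are arbitrary stand-ins for the measured signal; only the 2.5 ms spacing comes from the question.

```python
import numpy as np

t = np.arange(0, 1, 0.0025)                            # 2.5 ms steps (400 Hz)
x = np.sin(2*np.pi*5*t) + 0.3*np.random.randn(t.size)
rho = estimated_autocorrelation(x)
print(rho[0])        # 1.0 up to floating-point error
print(rho[:5])       # nearby lags remain high for this slowly varying signal
```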
Answered by hamogu
The statsmodels package provides an autocorrelation function that internally uses np.correlate (according to the statsmodels documentation).

See: http://statsmodels.sourceforge.net/stable/generated/statsmodels.tsa.stattools.acf.html#statsmodels.tsa.stattools.acf
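A minimal usage sketch, not from the original answer. nlags and fft are keyword arguments of statsmodels.tsa.stattools.acf, though their defaults have changed between statsmodels versions; the input array here is a stand-in for the measured signal.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

x = np.random.randn(1000)           # stand-in for the measured signal
r = acf(x, nlags=40, fft=True)      # autocorrelation at lags 0..40
print(r[0], r[1])                   # 1.0, then roughly 0 for white noise
```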
Answered by Kathirmani Sukumar
I took part of the code from the pandas autocorrelation_plot() function. I checked the answers against R and the values match exactly.
```python
import numpy

def acf(series):
    n = len(series)
    data = numpy.asarray(series)
    mean = numpy.mean(data)
    c0 = numpy.sum((data - mean) ** 2) / float(n)

    def r(h):
        acf_lag = ((data[:n - h] - mean) * (data[h:] - mean)).sum() / float(n) / c0
        return round(acf_lag, 3)

    x = numpy.arange(n)              # Avoiding lag 0 calculation
    acf_coeffs = list(map(r, x))     # list() so this also works on Python 3
    return acf_coeffs
```
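A hedged usage sketch (the series below is made up; the list(map(...)) in the code above is what keeps the function working on Python 3, where map returns an iterator):

```python
import numpy

series = numpy.sin(numpy.linspace(0, 20, 500)) + 0.1*numpy.random.randn(500)
coeffs = acf(series)
print(coeffs[:5])    # starts at 1.0 and decays with increasing lag
```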
Answered by Alexander McFarlane
The method I wrote as of my latest edit is now faster than even scipy.statstools.acf with fft=True until the sample size gets very large.
Error analysis: If you want to adjust for biases and get highly accurate error estimates, look at my code here, which implements this paper by Ulli Wolff (or the original by UW in Matlab).
Functions Tested

- a = correlatedData(n=10000) is from a routine found here
- gamma() is from the same place as correlated_data()
- acorr() is my function below
- estimated_autocorrelation is found in another answer
- acf() is from: from statsmodels.tsa.stattools import acf
Timings
```
%timeit a0, junk, junk = gamma(a, f=0)            # puwr.py
%timeit a1 = [acorr(a, m, i) for i in range(l)]   # my own
%timeit a2 = acf(a)                               # statstools
%timeit a3 = estimated_autocorrelation(a)         # numpy
%timeit a4 = acf(a, fft=True)                     # stats FFT
```

```
100 loops, best of 3: 7.18 ms per loop
100 loops, best of 3: 2.15 ms per loop
10 loops, best of 3: 88.3 ms per loop
10 loops, best of 3: 87.6 ms per loop
100 loops, best of 3: 3.33 ms per loop
```
Edit... I checked again, keeping l=40 and changing n=10000 to n=200000 samples; the FFT methods start to get a bit of traction and the statsmodels fft implementation just edges it... (order is the same)
```
10 loops, best of 3: 86.2 ms per loop
10 loops, best of 3: 69.5 ms per loop
1 loops, best of 3: 16.2 s per loop
1 loops, best of 3: 16.3 s per loop
10 loops, best of 3: 52.3 ms per loop
```
Edit 2: I changed my routine and re-tested vs. the FFT for n=10000 and n=20000
```
a = correlatedData(n=200000); b = correlatedData(n=10000)
m = a.mean(); rng = np.arange(40); mb = b.mean()
%timeit a1 = map(lambda t: acorr(a, m, t), rng)
%timeit a1 = map(lambda t: acorr.acorr(b, mb, t), rng)
%timeit a4 = acf(a, fft=True)
%timeit a4 = acf(b, fft=True)
```

```
10 loops, best of 3: 73.3 ms per loop   # acorr below
100 loops, best of 3: 2.37 ms per loop  # acorr below
10 loops, best of 3: 79.2 ms per loop   # statstools with FFT
100 loops, best of 3: 2.69 ms per loop  # statstools with FFT
```
Implementation
```python
def acorr(op_samples, mean, separation, norm=1):
    """autocorrelation of a measured operator with optional normalisation

    the autocorrelation is measured over the 0th axis

    Required Inputs
        op_samples :: np.ndarray :: the operator samples
        mean       :: float      :: the mean of the operator
        separation :: int        :: the separation between HMC steps
        norm       :: float      :: the autocorrelation with separation=0
    """
    return ((op_samples[:op_samples.size - separation] - mean) *
            (op_samples[separation:] - mean)).ravel().mean() / norm
```
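A usage sketch under stated assumptions: the white-noise array below is made up, and norm is taken from the lag-0 call, mirroring the list comprehension used in the timing session above.

```python
import numpy as np

a = np.random.randn(100000)
m = a.mean()
norm = acorr(a, m, separation=0)                     # unnormalised lag-0 value (the variance)
rho = [acorr(a, m, s, norm=norm) for s in range(40)]
print(rho[0], rho[1])                                # 1.0, then roughly 0 for white noise
```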
A 4x speedup can be achieved with the variant below. You must be careful to only pass op_samples=a.copy(), because it otherwise modifies the array a in place via a -= mean:
```python
    op_samples -= mean
    return (op_samples[:op_samples.size - separation] *
            op_samples[separation:]).ravel().mean() / norm
```
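A one-line sketch of that precaution (hypothetical call, reusing a and its mean from the timing session above):

```python
rho0 = acorr(a.copy(), a.mean(), separation=0)   # the copy keeps the in-place `op_samples -= mean` from modifying the caller's array
```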
Sanity Check
Example Error Analysis
This is a bit out of scope, but I can't be bothered to redo the figure without the integrated autocorrelation time or integration window calculation. The autocorrelations with errors are clear in the bottom plot.

Answered by Corin Dawson
I found this got the expected results with just a slight change:
```python
import numpy as N

def estimated_autocorrelation(x):
    n = len(x)
    variance = x.var()
    x = x - x.mean()
    r = N.correlate(x, x, mode='full')
    result = r/(variance*n)
    return result
```
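A hedged comparison sketch, not from the original answer: this variant keeps all 2n - 1 lags (negative and positive) and divides every one by n (the biased estimator), whereas the accepted answer keeps only lags 0..n-1 and divides lag k by n - k.

```python
import numpy as N

x = N.random.randn(500)
full = estimated_autocorrelation(x)   # length 2*len(x) - 1
print(len(full))                      # 999
print(full[len(x) - 1])               # the middle entry is lag 0 and equals 1.0
```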
Testing against Excel's autocorrelation results.

