使用 Python 估计自相关

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14297012/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 11:00:47  来源:igfitidea点击:

Estimate Autocorrelation using Python

pythonnumpysignal-processing

提问by 8765674

I would like to perform Autocorrelation on the signal shown below. The time between two consecutive points is 2.5ms (or a repetition rate of 400Hz).

我想对下面显示的信号执行自相关。两个连续点之间的时间为 2.5ms(或 400Hz 的重复率)。

enter image description here

在此处输入图片说明

This is the equation for estimating autoacrrelation that I would like to use (Taken from http://en.wikipedia.org/wiki/Autocorrelation, section Estimation):

这是我想使用的估计 autoacrrelation 的方程(取自http://en.wikipedia.org/wiki/Autocorrelation,估计部分):

enter image description here

在此处输入图片说明

What is the simplest method of finding the estimated autocorrelation of my data in python? Is there something similar to numpy.correlatethat I can use?

在python中找到我的数据的估计自相关的最简单方法是什么?有没有类似于numpy.correlate我可以使用的东西?

Or should I just calculate the mean and variance?

或者我应该只计算均值和方差?



Edit:

编辑:

With help from unutbu, I have written:

unutbu 的帮助,我写道:

from numpy import *
import numpy as N
import pylab as P

fn = 'data.txt'
x = loadtxt(fn,unpack=True,usecols=[1])
time = loadtxt(fn,unpack=True,usecols=[0]) 

def estimated_autocorrelation(x):
    n = len(x)
    variance = x.var()
    x = x-x.mean()
    r = N.correlate(x, x, mode = 'full')[-n:]
    #assert N.allclose(r, N.array([(x[:n-k]*x[-(n-k):]).sum() for k in range(n)]))
    result = r/(variance*(N.arange(n, 0, -1)))
    return result

P.plot(time,estimated_autocorrelation(x))
P.xlabel('time (s)')
P.ylabel('autocorrelation')
P.show()

采纳答案by unutbu

I don't think there is a NumPy function for this particular calculation. Here is how I would write it:

我认为这个特定计算没有 NumPy 函数。这是我将如何编写它:

def estimated_autocorrelation(x):
    """
    http://stackoverflow.com/q/14297012/190597
    http://en.wikipedia.org/wiki/Autocorrelation#Estimation
    """
    n = len(x)
    variance = x.var()
    x = x-x.mean()
    r = np.correlate(x, x, mode = 'full')[-n:]
    assert np.allclose(r, np.array([(x[:n-k]*x[-(n-k):]).sum() for k in range(n)]))
    result = r/(variance*(np.arange(n, 0, -1)))
    return result

The assert statement is there to both check the calculation and to document its intent.

assert 语句用于检查计算并记录其意图。

When you are confident this function is behaving as expected, you can comment-out the assertstatement, or run your script with python -O. (The -Oflag tells Python to ignore assert statements.)

当您确信此函数的行为符合预期时,您可以注释掉该assert语句,或使用python -O. (该-O标志告诉 Python 忽略断言语句。)

回答by hamogu

The statsmodels package adds a autocorrelation function that internally uses np.correlate(according to the statsmodelsdocumentation).

statsmodels 包添加了一个内部使用的自相关函数np.correlate(根据statsmodels文档)。

See: http://statsmodels.sourceforge.net/stable/generated/statsmodels.tsa.stattools.acf.html#statsmodels.tsa.stattools.acf

请参阅:http: //statsmodels.sourceforge.net/stable/generated/statsmodels.tsa.stattools.acf.html#statsmodels.tsa.stattools.acf

回答by Kathirmani Sukumar

I took a part of code from pandas autocorrelation_plot() function. I checked the answers with R and the values are matching exactly.

我从 pandas autocorrelation_plot() 函数中提取了一部分代码。我用 R 检查了答案,值完全匹配。

import numpy
def acf(series):
    n = len(series)
    data = numpy.asarray(series)
    mean = numpy.mean(data)
    c0 = numpy.sum((data - mean) ** 2) / float(n)

    def r(h):
        acf_lag = ((data[:n - h] - mean) * (data[h:] - mean)).sum() / float(n) / c0
        return round(acf_lag, 3)
    x = numpy.arange(n) # Avoiding lag 0 calculation
    acf_coeffs = map(r, x)
    return acf_coeffs

回答by Alexander McFarlane

The method I wrote as of my latest edit is now faster than even scipy.statstools.acfwith fft=Trueuntil the sample size gets very large.

我写我的最新编辑的方法是比现在还要快scipy.statstools.acffft=True,直到试样尺寸变得非常大。

Error analysisIf you want to adjust for biases & get highly accurate error estimates: Look at my code herewhich implements this paperby Ulli Wolff (or original by UW in Matlab)

错误分析如果你想调整偏差并获得高度准确的错误估计:看看我的代码here,它实现了 Ulli Wolff 的这篇论文或原版 UW inMatlab

Functions Tested

功能测试

  • a = correlatedData(n=10000)is from a routine found here
  • gamma()is from same place as correlated_data()
  • acorr()is my function below
  • estimated_autocorrelationis found in another answer
  • acf()is from from statsmodels.tsa.stattools import acf
  • a = correlatedData(n=10000)来自这里的例程
  • gamma()来自同一个地方 correlated_data()
  • acorr()下面是我的功能
  • estimated_autocorrelation在另一个答案中找到
  • acf()来自 from statsmodels.tsa.stattools import acf

Timings

时间安排

%timeit a0, junk, junk = gamma(a, f=0)                            # puwr.py
%timeit a1 = [acorr(a, m, i) for i in range(l)]                   # my own
%timeit a2 = acf(a)                                               # statstools
%timeit a3 = estimated_autocorrelation(a)                         # numpy
%timeit a4 = acf(a, fft=True)                                     # stats FFT

## -- End pasted text --
100 loops, best of 3: 7.18 ms per loop
100 loops, best of 3: 2.15 ms per loop
10 loops, best of 3: 88.3 ms per loop
10 loops, best of 3: 87.6 ms per loop
100 loops, best of 3: 3.33 ms per loop

Edit... I checked again keeping l=40and changing n=10000to n=200000samples the FFT methods start to get a bit of traction and statsmodelsfft implementation just edges it... (order is the same)

编辑...我检查再次保持l=40和不断变化的n=10000,以n=200000样品的FFT方法开始变得有点牵引和statsmodelsFFT实现只是边缘吧...(顺序是一样的)

## -- End pasted text --
10 loops, best of 3: 86.2 ms per loop
10 loops, best of 3: 69.5 ms per loop
1 loops, best of 3: 16.2 s per loop
1 loops, best of 3: 16.3 s per loop
10 loops, best of 3: 52.3 ms per loop

Edit 2: I changed my routine and re-tested vs. the FFT for n=10000and n=20000

编辑 2:我改变了我的例行程序并重新测试了 FFTn=10000n=20000

a = correlatedData(n=200000); b=correlatedData(n=10000)
m = a.mean(); rng = np.arange(40); mb = b.mean()
%timeit a1 = map(lambda t:acorr(a, m, t), rng)
%timeit a1 = map(lambda t:acorr.acorr(b, mb, t), rng)
%timeit a4 = acf(a, fft=True)
%timeit a4 = acf(b, fft=True)

10 loops, best of 3: 73.3 ms per loop   # acorr below
100 loops, best of 3: 2.37 ms per loop  # acorr below
10 loops, best of 3: 79.2 ms per loop   # statstools with FFT
100 loops, best of 3: 2.69 ms per loop # statstools with FFT

Implementation

执行

def acorr(op_samples, mean, separation, norm = 1):
    """autocorrelation of a measured operator with optional normalisation
    the autocorrelation is measured over the 0th axis

    Required Inputs
        op_samples  :: np.ndarray :: the operator samples
        mean        :: float :: the mean of the operator
        separation  :: int :: the separation between HMC steps
        norm        :: float :: the autocorrelation with separation=0
    """
    return ((op_samples[:op_samples.size-separation] - mean)*(op_samples[separation:]- mean)).ravel().mean() / norm

4xspeedupcan be achieved below. You must be careful to only pass op_samples=a.copy()as it will modify the array aby a-=meanotherwise:

4x可以在下面实现加速。你必须小心只传递,否则op_samples=a.copy()它会修改数组:aa-=mean

op_samples -= mean
return (op_samples[:op_samples.size-separation]*op_samples[separation:]).ravel().mean() / norm

Sanity Check

完整性检查

enter image description here

在此处输入图片说明

Example Error Analysis

示例错误分析

This is a bit out of scope but I can't be bothered to redo the figure without the integrated autocorrelation time or integration window calculation. The autocorrelations with errors are clear in the bottom plot enter image description here

这有点超出范围,但如果没有积分自相关时间或积分窗口计算,我不会费心重做这个数字。底部图中的误差自相关很清楚 在此处输入图片说明

回答by Corin Dawson

I found this got the expected results with just a slight change:

我发现这得到了预期的结果,只需稍作改动:

def estimated_autocorrelation(x):
    n = len(x)
    variance = x.var()
    x = x-x.mean()
    r = N.correlate(x, x, mode = 'full')
    result = r/(variance*n)
    return result

Testing against Excel's autocorrelation results.

针对 Excel 的自相关结果进行测试。