Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/27541290/

Date: 2020-09-13 22:46:25  Source: igfitidea

Bug in the autocorrelation plot of matplotlib's plt.acorr?

Tags: python, matplotlib, pandas, statsmodels

Asked by Wu Fuheng

I am plotting an autocorrelation with Python in three ways: 1. pandas, 2. matplotlib, 3. statsmodels. The graph I get from matplotlib is not consistent with the other two. The code is:

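 import matplotlib.pyplot as plt
 from pandas.plotting import autocorrelation_plot
 from statsmodels.graphics.tsaplots import plot_acf

 # print out data (mydata is the asker's series; it is not defined in the post)
 print(mydata.values)

 #1. pandas
 p = autocorrelation_plot(mydata)
 plt.title('mydata')

 #2. matplotlib
 fig = plt.figure()
 plt.acorr(mydata, maxlags=150)
 plt.title('mydata')

 #3. statsmodels.graphics.tsaplots.plot_acf
 plot_acf(mydata)
 plt.title('mydata')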

The graph is here: http://quant365.com/viewtopic.php?f=4&t=33

Answered by Joe Kington

This is a result of different common definitions between statistics and signal processing. Basically, the signal processing definition assumes that you're going to handle the detrending. The statistical definition assumes that subtracting the mean is all the detrending you'll do, and does it for you.
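
To make the two definitions concrete, here is a minimal NumPy-only sketch. The helper names `acorr_raw` and `acorr_demeaned` and the test series are made up for illustration; they are not part of matplotlib, pandas, or statsmodels. The only difference between the two helpers is whether the mean is subtracted before correlating.

```python
import numpy as np

def acorr_raw(x):
    """Signal-processing style: correlate the data as-is, scaled so lag 0 is 1."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode="full")
    # Zero lag sits at index x.size - 1; keep lags 0 .. n-1
    return full[x.size - 1:] / np.dot(x, x)

def acorr_demeaned(x):
    """Statistics style: subtract the mean first, then do the same thing."""
    x = np.asarray(x, dtype=float)
    return acorr_raw(x - x.mean())

# Hypothetical series: small noise riding on a large constant offset
rng = np.random.default_rng(0)
series = 100.0 + rng.normal(0, 1, 200)

raw = acorr_raw(series)
demeaned = acorr_demeaned(series)
print("raw, first lags:     ", raw[:4])
print("demeaned, first lags:", demeaned[:4])
```

With an offset this large, the raw version should stay close to 1 at small lags because the offset dominates every product, while the mean-subtracted version reflects only the noise.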

First off, let's demonstrate the problem with a stand-alone example:

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
from statsmodels.graphics import tsaplots

def label(ax, string):
    ax.annotate(string, (1, 1), xytext=(-8, -8), ha='right', va='top',
                size=14, xycoords='axes fraction', textcoords='offset points')

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()

fig, axes = plt.subplots(nrows=4, figsize=(8, 12))
fig.tight_layout()

axes[0].plot(data)
label(axes[0], 'Raw Data')

axes[1].acorr(data, maxlags=data.size-1)
label(axes[1], 'Matplotlib Autocorrelation')

tsaplots.plot_acf(data, axes[2])
label(axes[2], 'Statsmodels Autocorrelation')

# pd.tools.plotting was removed from pandas; the function now lives in pd.plotting
pd.plotting.autocorrelation_plot(pd.Series(data), ax=axes[3])
label(axes[3], 'Pandas Autocorrelation')

# Remove some of the titles and labels that were automatically added
for ax in axes.flat:
    ax.set(title='', xlabel='')
plt.show()

[Figure: four stacked panels labeled Raw Data, Matplotlib Autocorrelation, Statsmodels Autocorrelation, and Pandas Autocorrelation]

So, why the heck am I saying that they're all correct? They're clearly different!

Let's write our own autocorrelation function to demonstrate what plt.acorr is doing:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()
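    # Correlate x with itself at every lag, with no mean subtraction
    autocorr = np.correlate(x, x, mode='full')
    # Scale so the zero-lag peak equals 1
    autocorr = autocorr / autocorr.max()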

    return ax.stem(autocorr)

If we plot this with our data, we'll get a more-or-less identical result to plt.acorr (I'm leaving out properly labeling the lags, simply because I'm lazy):

fig, ax = plt.subplots()
acorr(data)
plt.show()

[Figure: stem plot of the full autocorrelation, without mean subtraction]

This is a perfectly valid autocorrelation. It's all a matter of whether your background is signal processing or statistics.

This is the definition used in signal processing. The assumption is that you're going to handle detrending your data yourself (note the detrend kwarg in plt.acorr). If you want it detrended, you'll explicitly ask for it (and probably do something better than just subtracting the mean); otherwise it shouldn't be assumed.
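
As a rough illustration of that kwarg, the sketch below passes `matplotlib.mlab.detrend_mean` as the `detrend` callable, which asks `acorr` to subtract the mean before correlating. The random-walk data and variable names are assumptions made for the example, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib import mlab
import numpy as np

# Hypothetical data: a random walk, similar in spirit to the example above
rng = np.random.default_rng(1977)
walk = rng.normal(0, 1, 100).cumsum()

fig, (ax_raw, ax_mean) = plt.subplots(nrows=2, sharex=True)

# Default behaviour: no detrending at all
lags_raw, c_raw, _, _ = ax_raw.acorr(walk, maxlags=walk.size - 1)

# Explicitly ask for the mean to be removed first
lags_mean, c_mean, _, _ = ax_mean.acorr(
    walk, maxlags=walk.size - 1, detrend=mlab.detrend_mean
)

# Hand-rolled mean-subtracted autocorrelation for comparison
centered = walk - walk.mean()
by_hand = np.correlate(centered, centered, mode="full") / np.dot(centered, centered)
print(np.allclose(c_mean, by_hand))
```

Any callable that takes an array and returns a detrended array of the same length can be passed instead, so a linear or higher-order detrend would slot in the same way.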

In statistics, simply subtracting the mean is assumed to be what you wanted to do for detrending.

All of the other functions are subtracting the mean of the data before the correlation, similar to this:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()

    # Subtract the mean first, as the statistical definition does
    x = x - x.mean()

    autocorr = np.correlate(x, x, mode='full')
    autocorr = autocorr / autocorr.max()

    return ax.stem(autocorr)

fig, ax = plt.subplots()
acorr(data)
plt.show()

[Figure: stem plot of the full autocorrelation after subtracting the mean]

However, we still have one large difference. This one is purely a plotting convention.

In most signal processing textbooks (that I've seen, anyway), the "full" autocorrelation is displayed, such that zero lag is in the center and the result is symmetric on each side. R, on the other hand, has the very reasonable convention of displaying only one side of it. (After all, the other side is completely redundant.) The statistical plotting functions follow the R convention, and plt.acorr follows what Matlab does, which is the opposite convention.
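
The relationship between the two plotting conventions can be sketched with NumPy alone. The data and variable names here are invented for the example; the point is only how lags map onto the output of `np.correlate(..., mode="full")`.

```python
import numpy as np

# Hypothetical mean-subtracted random walk
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 50).cumsum()
x = x - x.mean()
n = x.size

full = np.correlate(x, x, mode="full")  # length 2n - 1
lags = np.arange(-(n - 1), n)           # lag belonging to each entry of `full`

# Zero lag is the middle element, and the two halves mirror each other
zero_lag_index = n - 1
print(lags[zero_lag_index], np.allclose(full, full[::-1]))

# "One-sided" convention: keep lags 0 .. n-1 and discard the redundant half
one_sided = full[zero_lag_index:]
one_sided_lags = lags[zero_lag_index:]
```

Slicing from index `n - 1` rather than `n` keeps the zero-lag value, which is the peak that the normalization divides by.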

Basically, you'd want this:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()

    x = x - x.mean()

    autocorr = np.correlate(x, x, mode='full')
    # Keep only the non-negative lags; zero lag sits at index x.size - 1
    autocorr = autocorr[x.size - 1:]
    autocorr = autocorr / autocorr.max()

    return ax.stem(autocorr)

fig, ax = plt.subplots()
acorr(data)
plt.show()

[Figure: stem plot of the one-sided autocorrelation after subtracting the mean]