Why am I getting "LinAlgError: Singular matrix" from grangercausalitytests? (Python)
Original question: http://stackoverflow.com/questions/44305456/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by displayname
I am trying to run grangercausalitytests on two time series:
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
n = 1000
ls = np.linspace(0, 2*np.pi, n)
df1 = pd.DataFrame(np.sin(ls))
df2 = pd.DataFrame(2*np.sin(1+ls))
df = pd.concat([df1, df2], axis=1)
df.plot()
grangercausalitytests(df, maxlag=20)
However, I am getting
Granger Causality
number of lags (no zero) 1
ssr based F test: F=272078066917221398041264652288.0000, p=0.0000 , df_denom=996, df_num=1
ssr based chi2 test: chi2=272897579166972095424217743360.0000, p=0.0000 , df=1
likelihood ratio test: chi2=60811.2671, p=0.0000 , df=1
parameter F test: F=272078066917220553616334520320.0000, p=0.0000 , df_denom=996, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=7296.6976, p=0.0000 , df_denom=995, df_num=2
ssr based chi2 test: chi2=14637.3954, p=0.0000 , df=2
likelihood ratio test: chi2=2746.0362, p=0.0000 , df=2
parameter F test: F=13296850090491009488285469769728.0000, p=0.0000 , df_denom=995, df_num=2
...
/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
88
89 def _raise_linalgerror_singular(err, flag):
---> 90 raise LinAlgError("Singular matrix")
91
92 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix
and I am not sure why this is the case.
Answered by jotasi
The problem arises due to the perfect correlation between the two series in your data: both are noise-free sinusoids of the same frequency, so each one is an exact linear function of the lagged values. From the traceback you can see that a Wald test is run internally on the estimated parameters of the lagged series. This requires an estimate of the parameter covariance matrix (which is near zero here) and its inverse, as you can also see in the line invcov = np.linalg.inv(cov_p) in the traceback. For some maximum lag (>=5) this near-zero matrix becomes singular, and the test crashes. If you add just a little noise to your data, the error disappears:
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
n = 1000
ls = np.linspace(0, 2*np.pi, n)
# Noise-free sinusoids, as in the question
df1Clean = pd.DataFrame(np.sin(ls))
df2Clean = pd.DataFrame(2*np.sin(ls+1))
dfClean = pd.concat([df1Clean, df2Clean], axis=1)
# Same data with a tiny amount of uniform noise added to both columns
dfDirty = dfClean+0.00001*np.random.rand(n, 2)
grangercausalitytests(dfClean, maxlag=20, verbose=False) # Raises LinAlgError
grangercausalitytests(dfDirty, maxlag=20, verbose=False) # Runs fine
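To see why the noise helps, the following sketch builds a lagged regressor matrix by hand and compares its numerical rank for the clean and the noisy data. It is only an illustration: `lagged_design` is a hypothetical helper written for this example, not the routine statsmodels uses internally, and the lag of 5 and the random seed are arbitrary choices.

```python
import numpy as np

def lagged_design(x, y, lags):
    """Hypothetical helper: stack lags 1..lags of x and y plus a constant column."""
    n = len(x)
    cols = [x[lags - k : n - k] for k in range(1, lags + 1)]
    cols += [y[lags - k : n - k] for k in range(1, lags + 1)]
    cols.append(np.ones(n - lags))
    return np.column_stack(cols)

n = 1000
ls = np.linspace(0, 2 * np.pi, n)
x_clean = np.sin(ls)
y_clean = 2 * np.sin(1 + ls)

# Same tiny noise level as in the answer, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
x_noisy = x_clean + 1e-5 * rng.random(n)
y_noisy = y_clean + 1e-5 * rng.random(n)

lags = 5
X_clean = lagged_design(x_clean, y_clean, lags)
X_noisy = lagged_design(x_noisy, y_noisy, lags)

# Numerical rank: the clean design is rank deficient, the noisy one is not
rank_clean = np.linalg.matrix_rank(X_clean)
rank_noisy = np.linalg.matrix_rank(X_noisy)
print("columns:", X_clean.shape[1])
print("rank (clean):", rank_clean)
print("rank (noisy):", rank_noisy)
```

Every lagged copy of a sinusoid of one frequency lies in the span of a single sine and cosine, so the clean columns are linearly dependent. The small noise breaks that dependence, which is why the inversion then succeeds.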
Answered by user12081571
Another thing to keep an eye out for is duplicate columns. Duplicate columns have a correlation of 1.0, which results in singularity. It is also possible that you have two features that are perfectly correlated without being duplicates. An easy way to check this is with df.corr(): look for pairs of columns with correlation = 1.0.
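The df.corr() check can be sketched as follows. The helper name `perfectly_correlated_pairs`, the tolerance, and the demo columns are assumptions made for illustration. The absolute value is taken so that a correlation of -1.0 is flagged as well.

```python
import numpy as np
import pandas as pd

def perfectly_correlated_pairs(df, tol=1e-8):
    """Return column pairs whose absolute Pearson correlation is within tol of 1."""
    corr = df.corr().abs().to_numpy()
    cols = list(df.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            # NaN correlations (e.g. constant columns) compare as False and are skipped
            if corr[i, j] >= 1 - tol:
                pairs.append((cols[i], cols[j]))
    return pairs

# Made-up demo data: one duplicate column, one rescaled copy, one unrelated column
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a,
    "a_scaled": 3 * a + 1,
    "b": rng.normal(size=200),
})

pairs = perfectly_correlated_pairs(demo)
print(pairs)
```

This only looks at same-time correlation. Dependence that shows up through lags, such as the phase-shifted sinusoids in the question, will not necessarily be flagged by this check.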