Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not the translator). Original StackOverflow question: http://stackoverflow.com/questions/26083293/

Date: 2020-09-13 22:30:52. Source: igfitidea

Calculating Autocorrelation of Pandas DataFrame along each Column

Tags: python, numpy, pandas

Asked by fabian

I want to calculate the lag-one autocorrelation coefficient for each column of a Pandas DataFrame. A snippet of my data is:

            RF        PC         C         D        PN        DN         P
year                                                                      
1890       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1891 -0.028470 -0.052632  0.042254  0.081818 -0.045541  0.047619 -0.016974
1892 -0.249084  0.000000  0.027027  0.067227  0.099404  0.045455  0.122337
1893  0.653659  0.000000  0.000000  0.039370 -0.135624  0.043478 -0.142062

Along the year index, I want to calculate the lag-one autocorrelation of each column (RF, PC, etc.).

To calculate the autocorrelations, I extracted two time series from each column, offset from each other by one year, and then calculated the correlation coefficients with numpy.corrcoef.

For example, I wrote:

numpy.corrcoef(data[['C']][1:-1],data[['C']][2:])

(the entire DataFrame is called data).
However, the command unfortunately returned:

array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])

Can somebody kindly advise me on how to calculate autocorrelations?

Accepted answer by joaquin

You should use:

numpy.corrcoef(df['C'][1:-1], df['C'][2:])

df[['C']] is a DataFrame with only one column, while df['C'] is a Series containing the values in your C column.
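
The difference can be seen in a small sketch. The frame below is a hypothetical stand-in for the asker's data (one leading NaN row, made-up values), not the real dataset. With 2-D input, numpy.corrcoef treats each row as a separate variable by default, so a one-column DataFrame yields many "variables" with a single observation each, and every coefficient comes out as NaN.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: a leading NaN row, then values.
data = pd.DataFrame({"C": [np.nan, 0.042254, 0.027027, 0.0, 0.015, -0.02]})

as_frame = data[["C"]][1:-1]   # DataFrame, shape (4, 1)
as_series = data["C"][1:-1]    # Series, shape (4,)

# 2-D input: each ROW is treated as a variable (rowvar=True by default),
# so each variable has one observation and the result is all NaN.
with warnings.catch_warnings(), np.errstate(all="ignore"):
    warnings.simplefilter("ignore")
    bad = np.corrcoef(as_frame, data[["C"]][2:])

# 1-D input: two variables with four observations each, giving a 2x2 matrix.
good = np.corrcoef(as_series, data["C"][2:])
lag1 = good[0, 1]  # the off-diagonal entry is the lag-one autocorrelation

print(as_frame.shape, as_series.shape)
print(bad.shape, good.shape)
```

The slices start at position 1 so that the NaN row is excluded; any NaN left in the input would make the result NaN as well.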

Answer by eclark

This is a late answer, but for future readers: you can also use pandas.Series.autocorr(), which calculates the lag-N (default 1) autocorrelation of a Series:

df['C'].autocorr(lag=1)

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.autocorr.html#pandas.Series.autocorr
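
As a rough check, the sketch below compares autocorr with the numpy.corrcoef slicing from the accepted answer. The series is random illustrative data with a leading NaN to mimic the question's first row; the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(50))
s.iloc[0] = np.nan  # mimic the leading NaN row in the question's data

# pandas skips pairs in which either value is NaN
via_pandas = s.autocorr(lag=1)

# Manual version: drop the NaN by slicing from position 1, then pair
# each value with its successor.
vals = s.to_numpy()
via_numpy = np.corrcoef(vals[1:-1], vals[2:])[0, 1]

print(via_pandas, via_numpy)
```

Both approaches should agree up to floating-point error, since autocorr ignores the NaN pairs that the manual slices remove by hand.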

Answer by Brad Solomon

.autocorr applies to Series, not DataFrames. You can use .apply to run it on every column of a DataFrame:

import numpy as np
import pandas as pd

def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

# Example data: 100 rows, 6 columns of standard-normal noise
d1 = pd.DataFrame(np.random.randn(100, 6))

df_autocorr(d1)
Out[32]: 
0    0.141
1   -0.028
2   -0.031
3    0.114
4   -0.121
5    0.060
dtype: float64
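
A possible alternative, not part of the original answer, is to correlate the frame with a shifted copy of itself using DataFrame.corrwith. The sketch below uses random illustrative data and made-up column names, and checks that both routes give the same per-column values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
frame = pd.DataFrame(rng.standard_normal((100, 3)), columns=["a", "b", "c"])

# Route 1: apply Series.autocorr to each column, as in the answer above
via_apply = frame.apply(lambda col: col.autocorr(lag=1))

# Route 2 (assumed alternative): correlate each column with its lag-1 shift
via_corrwith = frame.corrwith(frame.shift(1))

print(via_apply)
print(via_corrwith)
```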

You could also compute rolling autocorrelations with a specified window as follows (the same idea .autocorr uses under the hood, correlating a series with a shifted copy of itself, here applied per window):

def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""

    return (df.rolling(window=window)
        .corr(df.shift(lag))) # could .dropna() here

df_rolling_autocorr(d1, window=21).dropna().head()
Out[38]: 
        0      1      2      3      4      5
21 -0.173 -0.367  0.142 -0.044 -0.080  0.012
22  0.015 -0.341  0.250 -0.036  0.023 -0.012
23  0.038 -0.329  0.279 -0.026  0.075 -0.121
24 -0.025 -0.361  0.319  0.117  0.031 -0.120
25  0.119 -0.320  0.181 -0.011  0.038 -0.111
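
To make the rolling result concrete, the sketch below recomputes the last window by hand and compares it with the rolling output. The data is random and the column names are illustrative assumptions; the helper repeats the function above so the block runs on its own.

```python
import numpy as np
import pandas as pd

def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""
    return df.rolling(window=window).corr(df.shift(lag))

rng = np.random.default_rng(2)
frame = pd.DataFrame(rng.standard_normal((60, 2)), columns=["x", "y"])
window = 21

rolled = df_rolling_autocorr(frame, window=window)

# Last window by hand: the final `window` values of x paired with the
# values one step earlier.
x = frame["x"].to_numpy()
by_hand = np.corrcoef(x[-window:], x[-window - 1:-1])[0, 1]

print(rolled["x"].iloc[-1], by_hand)
```

The two numbers should match up to floating-point error; earlier rows of the rolling result are NaN until a full window of shifted values is available.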