Calculating the autocorrelation of each column of a Pandas DataFrame

Question

Asked by fabian

I want to calculate the lag-1 autocorrelation coefficient of each column of a Pandas DataFrame. A snippet of my data is:

            RF        PC         C         D        PN        DN         P
year                                                                      
1890       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1891 -0.028470 -0.052632  0.042254  0.081818 -0.045541  0.047619 -0.016974
1892 -0.249084  0.000000  0.027027  0.067227  0.099404  0.045455  0.122337
1893  0.653659  0.000000  0.000000  0.039370 -0.135624  0.043478 -0.142062

Along the year index, I want to calculate the lag-1 autocorrelation of each column (RF, PC, etc.).

To calculate the autocorrelations, I extracted, for each column, two time series whose start and end dates differed by one year, and then calculated the correlation coefficient with numpy.corrcoef.
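For reference, the quantity this approach estimates is, as far as I can tell, the Pearson correlation between each value and the value one year earlier. Writing the usable (non-NaN) observations of one column as x_1, ..., x_n (notation introduced here, not taken from the question), the lag-1 coefficient is:

$$
r_1 = \frac{\sum_{t=2}^{n} (x_t - \bar{x}_{+})(x_{t-1} - \bar{x}_{-})}
           {\sqrt{\sum_{t=2}^{n} (x_t - \bar{x}_{+})^2}\;\sqrt{\sum_{t=2}^{n} (x_{t-1} - \bar{x}_{-})^2}},
\qquad
\bar{x}_{+} = \frac{1}{n-1}\sum_{t=2}^{n} x_t,\quad
\bar{x}_{-} = \frac{1}{n-1}\sum_{t=2}^{n} x_{t-1}.
$$

Because each of the two offset series uses its own mean and spread, this can differ slightly from textbook ACF estimators, which use a single overall mean and variance.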

For example, I wrote:

numpy.corrcoef(data[['C']][1:-1],data[['C']][2:])

(The entire DataFrame is called data.)
However, the command returned:

array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])

Can somebody kindly advise me on how to calculate autocorrelations?

Answer 1

Accepted answer by joaquin

You should use:

numpy.corrcoef(df['C'].iloc[1:-1], df['C'].iloc[2:])[0, 1]

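df[['C']] is a DataFrame with a single column, while df['C'] is a Series containing the values of column C. numpy.corrcoef treats each row of a 2-D input as a separate variable, so with one-column DataFrames every year becomes a "variable" with a single observation, and all the correlations come out as NaN. With two Series it returns a 2x2 correlation matrix whose off-diagonal entry is the lag-1 autocorrelation.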
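A small sketch of the difference, assuming randomly generated stand-in data (the DataFrame below is hypothetical, not the asker's; only its shape and leading NaN year mimic the question). It shows that the one-column DataFrame input yields an all-NaN matrix, while the Series input yields a 2x2 matrix whose [0, 1] entry matches pandas' own Series.autocorr:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical yearly data standing in for the question's `data`;
# the first year is NaN, as in the question.
rng = np.random.default_rng(0)
years = pd.RangeIndex(1890, 1990, name="year")
data = pd.DataFrame({"C": rng.standard_normal(len(years))}, index=years)
data.iloc[0] = np.nan

# 2-D input (one-column DataFrames): np.corrcoef treats every row (year) as a
# separate variable with a single observation, so every entry is NaN.
with warnings.catch_warnings():
    # silence the degrees-of-freedom / divide-by-zero RuntimeWarnings
    warnings.simplefilter("ignore", RuntimeWarning)
    bad = np.corrcoef(data[["C"]].iloc[1:-1], data[["C"]].iloc[2:])

# 1-D input (Series): each slice is one variable, giving a 2x2 matrix whose
# off-diagonal entry is the lag-1 autocorrelation.
good = np.corrcoef(data["C"].iloc[1:-1], data["C"].iloc[2:])
r1 = good[0, 1]

print(bad.shape, np.isnan(bad).all())
print(r1, data["C"].autocorr(lag=1))  # the two values should agree
```

Slicing with .iloc keeps the offsets positional regardless of the integer year labels, and starting at position 1 skips the NaN first year.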

Answer 2

Answered by eclark

This is a late answer, but for future users: you can also use pandas.Series.autocorr(), which calculates the lag-N (default 1) autocorrelation of a Series:

df['C'].autocorr(lag=1)

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.autocorr.html#pandas.Series.autocorr
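As far as I can tell from the pandas source, Series.autocorr(lag) is computed as the correlation of the series with a copy of itself shifted by lag positions, so the explicit version below should agree with it. The series here is hypothetical random data with a leading NaN, chosen only to mimic the question's first year:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a random yearly series with a leading NaN,
# mimicking the first (NaN) year in the question.
rng = np.random.default_rng(1)
s = pd.Series(rng.standard_normal(50), index=pd.RangeIndex(1890, 1940, name="year"))
s.iloc[0] = np.nan

# Built-in lag-1 autocorrelation.
r_auto = s.autocorr(lag=1)

# The same quantity spelled out: correlate the series with a copy of itself
# shifted by one position. Series.corr aligns on the index and skips any
# pair containing a NaN, so the leading NaN needs no special handling.
r_manual = s.corr(s.shift(1))

# Other lags only change the shift.
r_lag2 = s.autocorr(lag=2)

print(r_auto, r_manual, r_lag2)
```

Because NaN pairs are dropped automatically, this avoids the manual [1:-1] / [2:] slicing from the question.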

Answer 3

Answered by Brad Solomon

.autocorr applies to Series, not DataFrames. You can use .apply to apply it to each column of a DataFrame:

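import numpy as np
import pandas as pd

def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    # axis=0 passes each column to the lambda; axis=1 would pass each row
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

d1 = pd.DataFrame(np.random.randn(100, 6))  # random data, so your values will differ

df_autocorr(d1)
Out[32]: 
0    0.141
1   -0.028
2   -0.031
3    0.114
4   -0.121
5    0.060
dtype: float64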
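A possible alternative that avoids apply, sketched on hypothetical seeded data shaped like d1: DataFrame.corrwith correlates each column with the same-named column of another frame, so correlating d1 with d1.shift(lag) should give the same column-wise autocorrelations:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like d1 above, but seeded so the run is repeatable.
rng = np.random.default_rng(2)
d1 = pd.DataFrame(rng.standard_normal((100, 6)))
lag = 1

# Column-wise autocorrelation with apply (as in df_autocorr above)...
via_apply = d1.apply(lambda col: col.autocorr(lag))

# ...and without apply: corrwith pairs each column of d1 with the same-named
# column of the shifted frame, ignoring rows where either value is NaN
# (the first `lag` rows of d1.shift(lag)).
via_corrwith = d1.corrwith(d1.shift(lag))

print(pd.concat({"apply": via_apply, "corrwith": via_corrwith}, axis=1))
```

The two columns of the printed table should match up to floating-point rounding; corrwith simply does the pairing in one vectorized call.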

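You could also compute rolling autocorrelations over a window of a specified length, as follows. This uses the same shift-and-correlate idea that .autocorr uses under the hood (s.autocorr(lag) is computed as s.corr(s.shift(lag))), but within a moving window instead of over the full sample: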

def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""

    return (df.rolling(window=window)
        .corr(df.shift(lag))) # could .dropna() here

df_rolling_autocorr(d1, window=21).dropna().head()
Out[38]: 
        0      1      2      3      4      5
21 -0.173 -0.367  0.142 -0.044 -0.080  0.012
22  0.015 -0.341  0.250 -0.036  0.023 -0.012
23  0.038 -0.329  0.279 -0.026  0.075 -0.121
24 -0.025 -0.361  0.319  0.117  0.031 -0.120
25  0.119 -0.320  0.181 -0.011  0.038 -0.111
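To check what each rolling value means, here is a sketch on hypothetical seeded data (window, lag, and the spot-checked position t and column are arbitrary choices for illustration). The value at position t should equal Series.autocorr on the slice of length window + lag ending at t, and the first window + lag - 1 rows are NaN because the window still contains the NaN introduced by shift(lag), given the default min_periods equal to the window:

```python
import numpy as np
import pandas as pd

# Hypothetical data, seeded so the check below is repeatable.
rng = np.random.default_rng(3)
d1 = pd.DataFrame(rng.standard_normal((100, 6)))

window, lag = 21, 1
# Same construction as df_rolling_autocorr above.
rolling = d1.rolling(window=window).corr(d1.shift(lag))

# Spot-check one position: it correlates the `window` observations ending at t
# with the observations `lag` steps earlier, i.e. Series.autocorr on a slice
# of length window + lag ending at t.
t, col = 40, 2
chunk = d1[col].iloc[t - window + 1 - lag : t + 1]
print(rolling.iloc[t, col], chunk.autocorr(lag))

# First position with a full window of valid (x_t, x_{t-lag}) pairs.
print(rolling[col].first_valid_index())
```

This also explains why the dropna().head() output above starts at row 21 for window=21 and lag=1.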
