Source: http://stackoverflow.com/questions/26083293/
This content is from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Calculating Autocorrelation of Pandas DataFrame along each Column
Asked by fabian
I want to calculate the lag-1 autocorrelation coefficient for each column of a Pandas DataFrame. A snippet of my data is:
            RF        PC         C         D        PN        DN         P
year
1890       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1891 -0.028470 -0.052632  0.042254  0.081818 -0.045541  0.047619 -0.016974
1892 -0.249084  0.000000  0.027027  0.067227  0.099404  0.045455  0.122337
1893  0.653659  0.000000  0.000000  0.039370 -0.135624  0.043478 -0.142062
Along the year index, I want to calculate the lag-1 autocorrelation for each column (RF, PC, etc.).
To calculate the autocorrelations, I extracted two time series from each column, offset from each other by one year, and then calculated the correlation coefficient with numpy.corrcoef.
For example, I wrote:
numpy.corrcoef(data[['C']][1:-1],data[['C']][2:])
(The entire DataFrame is called data.)
However, the command returned:
array([[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]])
Can somebody kindly advise me on how to calculate autocorrelations?
Accepted answer by joaquin
You should use:
numpy.corrcoef(df['C'][1:-1], df['C'][2:])
df[['C']] is a DataFrame with only one column, while df['C'] is a Series containing the values in your C column.
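The difference matters because numpy.corrcoef, by default, treats each row of a 2-D input as a separate variable. A one-column DataFrame slice therefore looks like many variables with a single observation each, and every correlation comes out as NaN. The sketch below illustrates this on a small made-up frame (the values in `data` are hypothetical and only stand in for the question's DataFrame); `.iloc` is used so the slicing is explicitly positional.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's `data` frame.
data = pd.DataFrame({"C": [np.nan, 0.04, 0.03, 0.00, 0.05, -0.02]})

# Series slices: two 1-D variables of equal length give a 2x2 correlation matrix.
r_series = np.corrcoef(data["C"].iloc[1:-1], data["C"].iloc[2:])

# DataFrame slices: two (n, 1) arrays, so each row is treated as its own
# variable with one observation. NumPy warns and returns only NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    r_frame = np.corrcoef(data[["C"]].iloc[1:-1], data[["C"]].iloc[2:])

print(r_series.shape, r_series[0, 1])          # 2x2 matrix, off-diagonal is the lag-1 value
print(r_frame.shape, np.isnan(r_frame).all())  # large matrix filled with NaN
```

The off-diagonal entry `r_series[0, 1]` is the lag-1 autocorrelation the question is after.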
Answer by eclark
This is a late answer, but for future users: you can also use pandas.Series.autocorr(), which calculates the lag-N (default 1) autocorrelation of a Series:
df['C'].autocorr(lag=1)
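As a quick sanity check, the following sketch compares autocorr with the manual numpy.corrcoef approach from the question. The series `s` is hypothetical toy data without missing values; with gaps in the data the two approaches would need the NaN rows dropped first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series; any numeric Series without gaps would do.
s = pd.Series([0.04, 0.03, 0.00, 0.05, -0.02, 0.01, 0.06])

# Built-in lag-1 autocorrelation.
r_autocorr = s.autocorr(lag=1)

# Manual equivalent: correlate the series with itself offset by one step.
r_manual = np.corrcoef(s.iloc[1:], s.iloc[:-1])[0, 1]

print(r_autocorr, r_manual)  # the two values should agree up to rounding
```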
Answer by Brad Solomon
.autocorr applies to Series, not DataFrames. You can use .apply to apply it to each column of a DataFrame:
import numpy as np
from pandas import DataFrame

def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

d1 = DataFrame(np.random.randn(100, 6))
df_autocorr(d1)
Out[32]:
0    0.141
1   -0.028
2   -0.031
3    0.114
4   -0.121
5    0.060
dtype: float64
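A possible alternative that avoids the lambda is DataFrame.corrwith, which correlates matching columns of two frames. The sketch below assumes a seeded random frame similar to d1 (the seed and the names `via_apply` and `via_corrwith` are illustrative only) and checks that both routes give the same numbers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
d1 = pd.DataFrame(rng.standard_normal((100, 6)))

lag = 1

# Column-wise autocorrelation via Series.autocorr, as in the answer above.
via_apply = d1.apply(lambda col: col.autocorr(lag))

# Same quantity via corrwith: each column against its own lagged copy.
via_corrwith = d1.corrwith(d1.shift(lag))

print(np.allclose(via_apply, via_corrwith))
```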
You could also compute rolling autocorrelations with a specified window as follows (correlating a column with a shifted copy of itself is what .autocorr does under the hood):
def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""
    return (df.rolling(window=window)
              .corr(df.shift(lag)))  # could .dropna() here

df_rolling_autocorr(d1, window=21).dropna().head()
Out[38]:
        0      1      2      3      4      5
21 -0.173 -0.367  0.142 -0.044 -0.080  0.012
22  0.015 -0.341  0.250 -0.036  0.023 -0.012
23  0.038 -0.329  0.279 -0.026  0.075 -0.121
24 -0.025 -0.361  0.319  0.117  0.031 -0.120
25  0.119 -0.320  0.181 -0.011  0.038 -0.111
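To see what each rolling value represents, the sketch below recomputes one cell by hand: the last row of column 0 should equal the plain correlation between the final 21 observations and their lagged counterparts. The seeded frame and the names `tail`, `tail_lagged` and `direct` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
d1 = pd.DataFrame(rng.standard_normal((100, 6)))
window, lag = 21, 1

# Rolling autocorrelation, as in df_rolling_autocorr above.
rolling = d1.rolling(window=window).corr(d1.shift(lag))

# Recompute the last row for column 0 directly from the final `window` rows.
tail = d1[0].iloc[-window:]
tail_lagged = d1[0].shift(lag).iloc[-window:]
direct = tail.corr(tail_lagged)

print(rolling[0].iloc[-1], direct)  # should agree up to floating-point error
```

The first 21 rows of the rolling result are NaN because the shifted frame has a missing value in its first row, so the first complete window ends at index 21. This matches the output above starting at row 21.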

