Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors. Original: http://stackoverflow.com/questions/4809577/

Correlation of Two Variables in a Time Series in Python?

Tags: python, statistics

Asked by Kyle Brandt

If I have two different data sets that are in a time series, is there a simple way to find the correlation between the two sets in python?

For example with:

# [ (dateTimeObject, y, z) ... ]
x = [ (8:00am, 12, 8), (8:10am, 15, 10) .... ]

How might I get the correlation of y and z in Python?

Accepted answer by Wes McKinney

Little slow on the uptake here. pandas (http://github.com/wesm/pandas and pandas.sourceforge.net) is probably your best bet. I'm biased because I wrote it but:

In [7]: ts1
Out[7]: 
2000-01-03 00:00:00    -0.945653010936
2000-01-04 00:00:00    0.759529904445
2000-01-05 00:00:00    0.177646448683
2000-01-06 00:00:00    0.579750822716
2000-01-07 00:00:00    -0.0752734982291
2000-01-10 00:00:00    0.138730447557
2000-01-11 00:00:00    -0.506961851495

In [8]: ts2
Out[8]: 
2000-01-03 00:00:00    1.10436688823
2000-01-04 00:00:00    0.110075215713
2000-01-05 00:00:00    -0.372818939799
2000-01-06 00:00:00    -0.520443811368
2000-01-07 00:00:00    -0.455928700936
2000-01-10 00:00:00    1.49624355051
2000-01-11 00:00:00    -0.204383054598

In [9]: ts1.corr(ts2)
Out[9]: -0.34768587480980645

Notably, if your two series cover different sets of dates, corr aligns them on the index and computes the correlation over the dates they have in common. It also automatically excludes NaN values!
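
To see what that means in practice, here is a minimal sketch, assuming a reasonably recent pandas. The dates and values are made up for illustration, and the names s1, s2, and both are hypothetical. It compares Series.corr with a by-hand calculation restricted to the dates where both series have a value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two series whose dates only partly overlap,
# with one missing value inside the overlap.
s1 = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2000-01-03", periods=5, freq="D"),
)
s2 = pd.Series(
    [2.0, np.nan, 7.0, 8.0, 12.0],
    index=pd.date_range("2000-01-04", periods=5, freq="D"),
)

# Series.corr aligns the two series on their index and ignores
# pairs where either value is missing.
r = s1.corr(s2)

# The same calculation by hand: keep only rows where both are present.
both = pd.DataFrame({"y": s1, "z": s2}).dropna()
r_manual = np.corrcoef(both["y"], both["z"])[0, 1]

print(len(both), r, r_manual)
```

Only three dates survive the alignment here, so the two results should agree up to floating-point rounding.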

Answer by kefeizhou

SciPy has a statistics module with a correlation function.

from scipy import stats
# Y and Z are NumPy arrays or lists of values
# pearsonr returns the correlation coefficient and a p-value
stats.pearsonr(Y, Z)
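
For data in the question's (timestamp, y, z) layout, a sketch along these lines should work. The records are hypothetical, and the timestamps are plain strings because the correlation does not use them.

```python
from scipy import stats

# Hypothetical records in the question's (timestamp, y, z) layout.
x = [("8:00am", 12, 8), ("8:10am", 15, 10), ("8:20am", 10, 6), ("8:30am", 11, 9)]

# Transpose the list of tuples into columns and keep only y and z.
_, y, z = zip(*x)

# pearsonr returns the correlation coefficient and a p-value.
r, p_value = stats.pearsonr(y, z)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

With so few points the p-value carries little weight; it is shown only to make the two return values explicit.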

Answer by etarion

You can do that via the covariance matrix or the correlation coefficients. http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html are the documentation pages for these functions; the former also comes with a sample of how to use it (corrcoef usage is very similar).

>>> import numpy
>>> x = [ (None, 12, 8), (None, 15, 10), (None, 10, 6) ]
>>> data = numpy.array([[e[1] for e in x], [e[2] for e in x]])
>>> numpy.corrcoef(data)
array([[ 1.        ,  0.99339927],
       [ 0.99339927,  1.        ]])
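
The covariance route mentioned at the start of this answer can be sketched as follows, reusing the y and z values from the example above. The variable names are hypothetical. The idea is that the Pearson correlation is the covariance divided by the product of the two standard deviations.

```python
import numpy as np

# Same y and z values as in the example above.
y = np.array([12.0, 15.0, 10.0])
z = np.array([8.0, 10.0, 6.0])

# 2x2 covariance matrix: variances on the diagonal, cov(y, z) off it.
cov = np.cov(y, z)

# Pearson correlation = cov(y, z) / (std(y) * std(z)).
r_from_cov = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# corrcoef applies the same normalization to the whole matrix.
r_direct = np.corrcoef(y, z)[0, 1]
print(r_from_cov, r_direct)
```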

Answer by jimmyb

Use numpy:

import numpy as np
v = [('k', 1, 2), ('l', 2, 4), ('m', 13, 9)]
# corrcoef returns a 2x2 matrix; element [0, 1] is the correlation of the two inputs
np.corrcoef([a[1] for a in v], [a[2] for a in v])[0, 1]