Correlation of Two Variables in a Time Series in Python?
This content is from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and cite the original URL.
Original: http://stackoverflow.com/questions/4809577/
Asked by Kyle Brandt
If I have two different data sets that are in a time series, is there a simple way to find the correlation between the two sets in python?
For example with:
from datetime import time
# [ (dateTimeObject, y, z) ... ]
x = [ (time(8, 0), 12, 8), (time(8, 10), 15, 10) ]  # ... more rows
How might I get the correlation of y and z in Python?
Accepted answer by Wes McKinney
Little slow on the uptake here. pandas (http://github.com/wesm/pandas and pandas.sourceforge.net) is probably your best bet. I'm biased because I wrote it but:
In [7]: ts1
Out[7]:
2000-01-03 00:00:00 -0.945653010936
2000-01-04 00:00:00 0.759529904445
2000-01-05 00:00:00 0.177646448683
2000-01-06 00:00:00 0.579750822716
2000-01-07 00:00:00 -0.0752734982291
2000-01-10 00:00:00 0.138730447557
2000-01-11 00:00:00 -0.506961851495
In [8]: ts2
Out[8]:
2000-01-03 00:00:00 1.10436688823
2000-01-04 00:00:00 0.110075215713
2000-01-05 00:00:00 -0.372818939799
2000-01-06 00:00:00 -0.520443811368
2000-01-07 00:00:00 -0.455928700936
2000-01-10 00:00:00 1.49624355051
2000-01-11 00:00:00 -0.204383054598
In [9]: ts1.corr(ts2)
Out[9]: -0.34768587480980645
Notably, if your two series cover different sets of dates, it aligns them and computes the correlation over the dates they share. It also automatically excludes NaN values!
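A minimal sketch of that alignment behaviour, assuming a recent pandas and NumPy. The two series and their values are made up for illustration: they overlap on only four dates, and one of those has a NaN. The hand-computed check builds a frame on the shared dates, drops the missing row, and should agree with `Series.corr`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two daily series whose date ranges only partly overlap.
ts1 = pd.Series(
    [1.0, 2.0, 4.0, 3.0, 5.0, 7.0],
    index=pd.date_range("2000-01-03", periods=6, freq="D"),
)
ts2 = pd.Series(
    [2.5, np.nan, 3.5, 6.0, 4.0, 1.0],
    index=pd.date_range("2000-01-05", periods=6, freq="D"),
)

# Series.corr aligns the two series on their index and skips missing pairs.
r = ts1.corr(ts2)

# Same calculation by hand: keep only dates where both series have a value.
both = pd.DataFrame({"ts1": ts1, "ts2": ts2}).dropna()
r_manual = np.corrcoef(both["ts1"], both["ts2"])[0, 1]

print(len(both), r, r_manual)
```

Only the dates present in both series with non-missing values contribute, so a short overlap means the coefficient rests on very few points.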
Answer by Navi
Answer by kefeizhou
SciPy has a statistics module with a correlation function.
from scipy import stats
# Y and Z are numpy arrays or lists of variables
stats.pearsonr(Y, Z)
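As a rough sketch of how that call might look with data shaped like the question's: the rows below are hypothetical (the third row in particular is invented), and the timestamps are plain strings because they play no part in the calculation. `pearsonr` returns two values, the correlation coefficient and a two-sided p-value.

```python
from scipy import stats

# Hypothetical rows shaped like the question's data: (timestamp, y, z).
x = [("8:00am", 12, 8), ("8:10am", 15, 10), ("8:20am", 10, 6)]

# Pull the y and z columns out of the tuples.
y = [row[1] for row in x]
z = [row[2] for row in x]

# Unpack the coefficient and the p-value from the result.
r, p_value = stats.pearsonr(y, z)
print(r, p_value)
```

With only a handful of observations the p-value is not very informative, so treat it as a sanity check rather than evidence.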
Answer by etarion
You can do that via the covariance matrix or the correlation coefficients. http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html are the documentation pages for these functions; the former also comes with a sample of how to use it (corrcoef usage is very similar).
>>> import numpy
>>> x = [ (None, 12, 8), (None, 15, 10), (None, 10, 6) ]
>>> data = numpy.array([[e[1] for e in x], [e[2] for e in x]])
>>> numpy.corrcoef(data)
array([[ 1.        ,  0.99339927],
       [ 0.99339927,  1.        ]])
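To show how the covariance route relates to the correlation coefficient, here is a small sketch assuming NumPy and reusing the y and z values from the session above. The correlation is the covariance divided by the product of the two standard deviations, so normalizing the off-diagonal entry of `numpy.cov` should reproduce what `numpy.corrcoef` reports.

```python
import numpy as np

# y and z columns from the example above.
y = np.array([12.0, 15.0, 10.0])
z = np.array([8.0, 10.0, 6.0])

# 2x2 covariance matrix: variances on the diagonal, cov(y, z) off the diagonal.
cov = np.cov(y, z)

# Normalize the covariance by the two standard deviations.
r_from_cov = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# corrcoef does the same normalization internally.
r_direct = np.corrcoef(y, z)[0, 1]

print(r_from_cov, r_direct)
```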
Answer by jimmyb
Use numpy:
import numpy as np
v = [('k', 1, 2), ('l', 2, 4), ('m', 13, 9)]
# corrcoef returns a 2x2 matrix; element [0, 1] is the correlation between the two columns
np.corrcoef([a[1] for a in v], [a[2] for a in v])[0, 1]

