Correlation between two variables in a time series in Python?

Question

Asked by Kyle Brandt

If I have two different data sets that are in a time series, is there a simple way to find the correlation between the two sets in Python?

For example with:

from datetime import time

# [(time_object, y, z), ...]
x = [(time(8, 0), 12, 8), (time(8, 10), 15, 10)]  # ... and so on

How might I get the correlation of y and z in Python?

Answer 1

Accepted answer by Wes McKinney

I'm a little slow on the uptake here, but pandas (http://github.com/wesm/pandas and pandas.sourceforge.net) is probably your best bet. I'm biased because I wrote it, but:

In [7]: ts1
Out[7]: 
2000-01-03 00:00:00    -0.945653010936
2000-01-04 00:00:00    0.759529904445
2000-01-05 00:00:00    0.177646448683
2000-01-06 00:00:00    0.579750822716
2000-01-07 00:00:00    -0.0752734982291
2000-01-10 00:00:00    0.138730447557
2000-01-11 00:00:00    -0.506961851495

In [8]: ts2
Out[8]: 
2000-01-03 00:00:00    1.10436688823
2000-01-04 00:00:00    0.110075215713
2000-01-05 00:00:00    -0.372818939799
2000-01-06 00:00:00    -0.520443811368
2000-01-07 00:00:00    -0.455928700936
2000-01-10 00:00:00    1.49624355051
2000-01-11 00:00:00    -0.204383054598

In [9]: ts1.corr(ts2)
Out[9]: -0.34768587480980645

Notably, if your two series cover different sets of dates, the correlation is computed over the dates they have in common. NaN values are also excluded automatically!
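To illustrate that alignment behaviour, here is a minimal sketch with made-up data (the values and dates are hypothetical, not the series shown above). It compares `Series.corr` against the same calculation done by hand on the overlapping, non-missing dates:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two daily series whose dates only partly overlap,
# with one missing value in the first series.
ts1 = pd.Series(
    [1.0, 2.0, 3.0, 4.0, np.nan],
    index=pd.date_range("2000-01-03", periods=5, freq="D"),
)
ts2 = pd.Series(
    [10.0, 2.0, 4.0, 7.0, 8.0, 9.0],
    index=pd.date_range("2000-01-04", periods=6, freq="D"),
)

# pandas aligns the two series on their index before correlating
r = ts1.corr(ts2)

# The same thing by hand: keep only shared dates, then drop rows with NaN
paired = pd.concat({"ts1": ts1, "ts2": ts2}, axis=1, join="inner").dropna()
r_manual = np.corrcoef(paired["ts1"], paired["ts2"])[0, 1]

print(len(paired), r, r_manual)
```

Only the dates present in both series with non-missing values contribute, so the two results should agree.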

Answer 2

Answer by Navi

I would recommend the pandas library http://pandas.sourceforge.net/generated/pandas.DataFrame.corr.html?highlight=corr#pandas.DataFrame.corr
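As a rough sketch of how `DataFrame.corr` might be applied to data shaped like the question's (the dates and the third row are invented for illustration, and the column names are my own):

```python
from datetime import datetime

import pandas as pd

# Rows shaped like the question's tuples; dates and third row are made up.
x = [
    (datetime(2011, 1, 1, 8, 0), 12, 8),
    (datetime(2011, 1, 1, 8, 10), 15, 10),
    (datetime(2011, 1, 1, 8, 20), 10, 6),
]

# Hypothetical column names; the timestamp becomes the index
df = pd.DataFrame(x, columns=["time", "y", "z"]).set_index("time")

corr_matrix = df.corr()            # pairwise correlation of all columns
r_yz = corr_matrix.loc["y", "z"]   # the single y-versus-z coefficient
print(r_yz)
```

`df.corr()` returns the full correlation matrix, so with more than two columns you get every pairwise coefficient at once.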

Answer 3

Answer by kefeizhou

SciPy has a statistics module with a correlation function.

from scipy import stats

# Y and Z are NumPy arrays or lists of equal length
r, p_value = stats.pearsonr(Y, Z)  # correlation coefficient and two-sided p-value

Answer 4

Answer by etarion

You can do that via the covariance matrix or the correlation coefficients. http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html document the relevant functions; the former also comes with a sample showing how to use it (corrcoef usage is very similar).

>>> import numpy
>>> x = [ (None, 12, 8), (None, 15, 10), (None, 10, 6) ]
>>> data = numpy.array([[e[1] for e in x], [e[2] for e in x]])
>>> numpy.corrcoef(data)
array([[ 1.        ,  0.99339927],
       [ 0.99339927,  1.        ]])
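For the covariance-matrix route mentioned above, a small sketch using the same three rows: the correlation coefficient is the covariance divided by the product of the two standard deviations, which should match what `numpy.corrcoef` reports. The variable names are my own.

```python
import numpy as np

x = [(None, 12, 8), (None, 15, 10), (None, 10, 6)]
y = np.array([row[1] for row in x], dtype=float)
z = np.array([row[2] for row in x], dtype=float)

# 2x2 covariance matrix: variances on the diagonal, cov(y, z) off it
cov = np.cov(y, z)

# Pearson r = cov(y, z) / (std(y) * std(z))
r_from_cov = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# The same coefficient straight from corrcoef
r_direct = np.corrcoef(y, z)[0, 1]

print(r_from_cov, r_direct)
```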

Answer 5

Answer by jimmyb

Use numpy:

import numpy as np

v = [('k', 1, 2), ('l', 2, 4), ('m', 13, 9)]
# [0, 1] picks the y-versus-z entry out of the 2x2 correlation matrix
np.corrcoef([a[1] for a in v], [a[2] for a in v])[0, 1]
