Note: this page is derived from a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute the content to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/15317822/

Date: 2020-08-18 19:47:44  Source: igfitidea

Calculating Covariance with Python and Numpy

Tags: python, numpy, covariance

Asked by Dave

I am trying to figure out how to calculate covariance with the Python Numpy function cov. When I pass it two one-dimensional arrays, I get back a 2x2 matrix of results. I don't know what to do with that. I'm not great at statistics, but I believe covariance in such a situation should be a single number. This is what I am looking for. I wrote my own:

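import numpy as np

def cov(a, b):
    # Covariance is only defined for sequences of equal length.
    if len(a) != len(b):
        return None

    a_mean = np.mean(a)
    b_mean = np.mean(b)

    # Accumulate the products of paired deviations from the means.
    total = 0
    for i in range(len(a)):
        total += (a[i] - a_mean) * (b[i] - b_mean)

    # Divide by N - 1 to get the sample covariance.
    return total / (len(a) - 1)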

That works, but I figure the Numpy version is much more efficient, if I could figure out how to use it.

Does anybody know how to make the Numpy cov function perform like the one I wrote?

Thanks,

Dave

Accepted answer by unutbu

When a and b are 1-dimensional sequences, numpy.cov(a,b)[0][1] is equivalent to your cov(a,b).

The 2x2 array returned by np.cov(a,b) has elements equal to

cov(a,a)  cov(a,b)
cov(a,b)  cov(b,b)

(where, again, cov is the function you defined above.)
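
As a quick check of the layout described above, here is a minimal sketch. The arrays a and b and the helper manual_cov are hypothetical, made up for illustration; manual_cov is a vectorized stand-in for the question's loop, not the asker's exact code.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 9.0])

def manual_cov(x, y):
    """Sample covariance (divides by N - 1), mirroring the question's loop."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

matrix = np.cov(a, b)   # 2x2 covariance matrix of the two variables
cov_ab = matrix[0, 1]   # off-diagonal entry: covariance of a with b

print(matrix)
print(cov_ab, manual_cov(a, b))
```

The diagonal entries matrix[0, 0] and matrix[1, 1] are the sample variances of a and b, and the matrix is symmetric, so matrix[1, 0] would serve equally well.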

Answer by Osian

Thanks to unutbu for the explanation. By default, numpy.cov calculates the sample covariance (dividing by N - 1). To obtain the population covariance, you can specify normalisation by the total number of samples, N, like this:

Covariance = numpy.cov(a, b, bias=True)[0][1]
print(Covariance)

or like this:

Covariance = numpy.cov(a, b, ddof=0)[0][1]
print(Covariance)
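
To see how the two normalisations relate, here is a small sketch that compares them on hypothetical data (the arrays a and b are made up for illustration). It assumes only the documented bias and ddof parameters of numpy.cov.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 9.0])
n = len(a)

sample_cov = np.cov(a, b)[0, 1]                # default: divide by N - 1
pop_cov_bias = np.cov(a, b, bias=True)[0, 1]   # divide by N
pop_cov_ddof = np.cov(a, b, ddof=0)[0, 1]      # divide by N

# Population covariance computed directly, as a cross-check.
pop_cov_direct = np.mean((a - a.mean()) * (b - b.mean()))

print(sample_cov, pop_cov_bias, pop_cov_ddof, pop_cov_direct)
```

The two population results should agree with each other, and each should equal the sample covariance scaled by (N - 1) / N.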