Python numpy covariance matrix

Question

Asked by user13321

Suppose I have two vectors of length 25, and I want to compute their covariance matrix. I try doing this with numpy.cov, but always end up with a 2x2 matrix.

>>> import numpy as np
>>> x=np.random.normal(size=25)
>>> y=np.random.normal(size=25)
>>> np.cov(x,y)
array([[ 0.77568388,  0.15568432],
       [ 0.15568432,  0.73839014]])

Using the rowvar flag doesn't help either - I get exactly the same result.

>>> np.cov(x,y,rowvar=0)
array([[ 0.77568388,  0.15568432],
       [ 0.15568432,  0.73839014]])

How can I get the 25x25 covariance matrix?

Answer 1

Accepted answer by David Marx

You have two vectors, not 25. The computer I'm on doesn't have python so I can't test this, but try:

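z = np.array(list(zip(x, y)))  # shape (25, 2); a bare zip object is not an array in Python 3
np.cov(z)  # each of the 25 rows is treated as a variable, so the result is 25x25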

Of course.... really what you want is probably more like:

n=100 # number of points in each vector
num_vects=25
vals=[]
for _ in range(num_vects):
    vals.append(np.random.normal(size=n))
np.cov(vals)

This takes the covariance (I think/hope) of num_vects 1xn vectors.

Answer 2

Answered by Arcturus

Reading the documentation with

>>> print(np.cov.__doc__)

or looking at Numpy Covariance, you can see that NumPy treats each row of the array as a separate variable, so you have two variables and hence you get a 2 x 2 covariance matrix.

I think the previous post has the right solution. I have the explanation :-)

Answer 3

Answered by Stuart

As pointed out above, you only have two vectors so you'll only get a 2x2 cov matrix.

IIRC the 2 main diagonal terms will be sum( (x-mean(x))**2) / (n-1) and similarly for y.

The 2 off-diagonal terms will be sum( (x-mean(x))*(y-mean(y)) ) / (n-1). n=25 in this case.
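
The formulas above can be checked against np.cov directly. This is a minimal sketch, not part of the original answer; the names var_x, var_y, cov_xy and manual are made up for illustration, and the fixed seed is only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatability)
x = rng.normal(size=25)
y = rng.normal(size=25)
n = len(x)

# Diagonal terms: sample variances with the n-1 denominator
var_x = np.sum((x - np.mean(x)) ** 2) / (n - 1)
var_y = np.sum((y - np.mean(y)) ** 2) / (n - 1)

# Off-diagonal term: sample covariance of x and y
cov_xy = np.sum((x - np.mean(x)) * (y - np.mean(y))) / (n - 1)

# Assemble the 2x2 matrix by hand and compare with NumPy's result
manual = np.array([[var_x, cov_xy],
                   [cov_xy, var_y]])
print(np.allclose(manual, np.cov(x, y)))
```

The comparison should agree because np.cov uses the n-1 denominator by default.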

Answer 4

Answered by Sylou

Try this:

import numpy as np
x=np.random.normal(size=25)
y=np.random.normal(size=25)
z = np.vstack((x, y))
c = np.cov(z.T)

Answer 5

Answered by Leukonoe

I suppose what you're looking for is actually an autocovariance function, which is a function of the time lag. I compute the autocovariance like this:

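def autocovariance(Xi, N, k):
    # Autocovariance of the series Xi (length N) at lag k
    Xs=np.average(Xi)
    aCov = 0.0
    for i in np.arange(0, N-k):
        aCov = (Xi[(i+k)]-Xs)*(Xi[i]-Xs)+aCov
    return  (1./(N))*aCov

# Usage fragment: autocov, i, h and My_wector come from the caller's own code
autocov[i]=(autocovariance(My_wector, N, h))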
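
The loop above can also be written without an explicit Python loop. The sketch below is one possible vectorized form; autocovariance_vec is a hypothetical name, not something from the answer, and it assumes N is simply the length of the series.

```python
import numpy as np

def autocovariance(Xi, N, k):
    # Loop version, as given in the answer
    Xs = np.average(Xi)
    aCov = 0.0
    for i in np.arange(0, N - k):
        aCov = (Xi[i + k] - Xs) * (Xi[i] - Xs) + aCov
    return (1.0 / N) * aCov

def autocovariance_vec(Xi, k):
    # Hypothetical vectorized equivalent: dot product of the centered
    # series with a copy of itself shifted by k samples, divided by N
    Xi = np.asarray(Xi, dtype=float)
    N = len(Xi)
    d = Xi - Xi.mean()
    return np.dot(d[k:], d[:N - k]) / N

series = np.random.normal(size=50)
lags = range(5)
loop_values = [autocovariance(series, len(series), k) for k in lags]
vec_values = [autocovariance_vec(series, k) for k in lags]
print(np.allclose(loop_values, vec_values))
```

Both versions divide by N rather than N-k, so this is the biased estimator at every lag.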

Answer 6

Answered by Edison Chen

I don't think you understand the definition of a covariance matrix. If you need a 25 x 25 covariance matrix, you need 25 vectors, each with n data points.

Answer 7

Answered by FooBar167

You should change

np.cov(x,y, rowvar=0)

to

np.cov((x,y), rowvar=0)
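
A quick shape check shows the difference between the two calls. This is a small sketch; two_args and one_arg are illustrative names, and rowvar=False is used in place of rowvar=0, which behaves the same way.

```python
import numpy as np

x = np.random.normal(size=25)
y = np.random.normal(size=25)

# Two separate 1-D arguments: x and y remain two variables
two_args = np.cov(x, y, rowvar=False)

# One (2, 25) array: with rowvar=False each of the 25 columns is a variable
one_arg = np.cov((x, y), rowvar=False)

print(two_args.shape)
print(one_arg.shape)
```

Note that the 25x25 result is estimated from only two observations per variable, so it has rank at most 1.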

Answer 8

Answered by Aerin

What you got (2 by 2) is more useful than 25*25. Covariance of X and Y is an off-diagonal entry in the symmetric cov_matrix.

If you insist on (25 by 25), which I think is useless, then why don't you write out the definition?

x=np.random.normal(size=25).reshape(25,1) # to make it 2d array.
y=np.random.normal(size=25).reshape(25,1)

cov =  np.matmul(x-np.mean(x), (y-np.mean(y)).T) / len(x)
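
To see what this 25x25 matrix contains, it can be compared with an outer product of the centered vectors. This is only a sketch; the name outer is introduced here for illustration.

```python
import numpy as np

x = np.random.normal(size=25).reshape(25, 1)
y = np.random.normal(size=25).reshape(25, 1)

# Definition from the answer: (25, 1) times (1, 25) gives (25, 25)
cov = np.matmul(x - np.mean(x), (y - np.mean(y)).T) / len(x)

# Same numbers via np.outer on the flattened, centered vectors
outer = np.outer(x.ravel() - x.mean(), y.ravel() - y.mean()) / len(x)

print(cov.shape)
print(np.allclose(cov, outer))
print(np.linalg.matrix_rank(cov))
```

Entry (i, j) is the product of the i-th centered x value and the j-th centered y value, divided by 25, so the matrix is a single outer product and has rank 1.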

Answer 9

Answered by Blupon

Covariance matrix from sample vectors

To clarify the small confusion regarding what is a covariance matrix defined using two N-dimensional vectors, there are two possibilities.

The question you have to ask yourself is whether you consider:

each vector as N realizations/samples of one single variable (for example two 3-dimensional vectors [X1,X2,X3] and [Y1,Y2,Y3], where you have 3 realizations for the variables X and Y respectively)
each vector as 1 realization for N variables (for example two 3-dimensional vectors [X1,Y1,Z1] and [X2,Y2,Z2], where you have 1 realization for the variables X, Y and Z per vector)

Since a covariance matrix is intuitively defined as a variance based on two different variables:

in the first case, you have 2 variables, N example values for each, so you end up with a 2x2 matrix where the covariances are computed from N samples per variable
in the second case, you have N variables, 2 samples for each, so you end up with an NxN matrix

About the actual question, using numpy

If you consider that you have 25 variables per vector (I took 3 instead of 25 to simplify the example code), so one realization of several variables in one vector, use rowvar=0

import numpy

# [X1,Y1,Z1]
X_realization1 = [1,2,3]

# [X2,Y2,Z2]
X_realization2 = [2,1,8]

numpy.cov([X_realization1,X_realization2],rowvar=0) # rowvar false, each column is a variable

Code returns, considering 3 variables:

array([[ 0.5, -0.5,  2.5],
       [-0.5,  0.5, -2.5],
       [ 2.5, -2.5, 12.5]])

Otherwise, if you consider that one vector is 25 samples of one variable, use rowvar=1 (numpy's default parameter)

# [X1,X2,X3]
X = [1,2,3]

# [Y1,Y2,Y3]
Y = [2,1,8]

numpy.cov([X,Y],rowvar=1) # rowvar true (default), each row is a variable

Code returns, considering 2 variables:

array([[ 1.        ,  3.        ],
       [ 3.        , 14.33333333]])

Answer 10

Answered by lbsweek

According to the documentation, you would expect the variables to be arranged as a column vector:

If we examine N-dimensional samples, X = [x1, x2, ..., xn]^T

though later it says each row is a variable

Each row of m represents a variable.

so you need to pass your matrix transposed

x=np.random.normal(size=25)
y=np.random.normal(size=25)
X = np.array([x,y])
np.cov(X.T)

and according to wikipedia: https://en.wikipedia.org/wiki/Covariance_matrix

X is column vector variable
X = [X1,X2, ..., Xn]^T
COV = E[X * X^T] - μx * μx^T   // μx = E[X]

you can implement it yourself:

# Here each row of X is one observation and each column is a variable,
# which matches np.cov(X.T) for the X defined above
X = X - X.mean(axis=0)
h,w = X.shape
COV = X.T @ X / (h-1)
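
As a sanity check, the manual computation can be compared with np.cov(X.T) for the same X. This is a sketch under the assumption that X is the (2, 25) array built from x and y earlier in this answer; Xc and manual are illustrative names.

```python
import numpy as np

x = np.random.normal(size=25)
y = np.random.normal(size=25)
X = np.array([x, y])            # shape (2, 25): 2 observations of 25 variables

Xc = X - X.mean(axis=0)         # subtract each column's mean
h, w = Xc.shape                 # h = 2 observations, w = 25 variables
manual = Xc.T @ Xc / (h - 1)    # shape (25, 25)

print(manual.shape)
print(np.allclose(manual, np.cov(X.T)))
```

The division by h-1 mirrors the default n-1 denominator of np.cov, where n is the number of observations.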
