Python numpy covariance matrix
Original URL: http://stackoverflow.com/questions/15036205/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
numpy covariance matrix
Asked by user13321
Suppose I have two vectors of length 25, and I want to compute their covariance matrix. I try doing this with numpy.cov, but always end up with a 2x2 matrix.
>>> import numpy as np
>>> x=np.random.normal(size=25)
>>> y=np.random.normal(size=25)
>>> np.cov(x,y)
array([[ 0.77568388, 0.15568432],
[ 0.15568432, 0.73839014]])
Using the rowvar flag doesn't help either - I get exactly the same result.
>>> np.cov(x,y,rowvar=0)
array([[ 0.77568388, 0.15568432],
[ 0.15568432, 0.73839014]])
How can I get the 25x25 covariance matrix?
Accepted answer by David Marx
You have two vectors, not 25. The computer I'm on doesn't have python so I can't test this, but try:
z = list(zip(x,y)) # in Python 3, zip returns an iterator, so materialize it first
np.cov(z)
Of course, what you really want is probably more like:
import numpy as np
n=100 # number of points in each vector
num_vects=25
vals=[]
for _ in range(num_vects):
    vals.append(np.random.normal(size=n))
np.cov(vals) # each of the 25 rows is one variable, so the result is 25x25
This takes the covariance (I think/hope) of num_vects 1xn vectors.
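A quick way to check the point above is to compare the output shapes. This is just a sketch with made-up random data (the seed and the variable names `two_var_shape` and `many_var_shape` are mine, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Two variables with 25 observations each -> 2x2 covariance matrix
x = rng.normal(size=25)
y = rng.normal(size=25)
two_var_shape = np.cov(x, y).shape

# 25 variables with 100 observations each -> 25x25 covariance matrix
vals = rng.normal(size=(25, 100))
many_var_shape = np.cov(vals).shape

print(two_var_shape)   # (2, 2)
print(many_var_shape)  # (25, 25)
```

The size of the result follows the number of variables (rows), not the number of observations per variable.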
Answer by Arcturus
According to the documentation, which you can read with
>>> print(np.cov.__doc__)
or on the Numpy Covariance page, Numpy treats each row of the array as a separate variable, so you have two variables and hence you get a 2 x 2 covariance matrix.
I think the previous post has the right solution. I have the explanation :-)
Answer by Stuart
As pointed out above, you only have two vectors so you'll only get a 2x2 cov matrix.
IIRC the 2 main diagonal terms will be sum( (x-mean(x))**2) / (n-1) and similarly for y.
The 2 off-diagonal terms will be sum( (x-mean(x))*(y-mean(y)) ) / (n-1). n=25 in this case.
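The formulas above can be checked against np.cov directly. A minimal sketch, assuming random test data (the seed and the names `manual` and `matches` are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.normal(size=25)
y = rng.normal(size=25)
n = len(x)

# Diagonal terms: sample variances with the (n-1) denominator
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)
var_y = np.sum((y - y.mean()) ** 2) / (n - 1)

# Off-diagonal term: sample covariance of x and y
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

manual = np.array([[var_x, cov_xy],
                   [cov_xy, var_y]])
matches = np.allclose(manual, np.cov(x, y))
print(matches)  # True
```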
Answer by Sylou
Try this:
import numpy as np
x=np.random.normal(size=25)
y=np.random.normal(size=25)
z = np.vstack((x, y))
c = np.cov(z.T)
Answer by Leukonoe
I suppose what you're looking for is actually an autocovariance function, which is a function of the time lag. I compute the autocovariance like this:
def autocovariance(Xi, N, k):
    # lag-k autocovariance of the series Xi, summing over the first N-k samples
    Xs=np.average(Xi)
    aCov = 0.0
    for i in np.arange(0, N-k):
        aCov = (Xi[(i+k)]-Xs)*(Xi[i]-Xs)+aCov
    return (1./(N))*aCov

autocov[i]=(autocovariance(My_wector, N, h))
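For what it's worth, the same sum can be written without the Python loop. This is only a sketch: it assumes N equals the length of the series, and the names `autocovariance_loop`, `autocovariance_vectorized` and `series` are mine, not from the answer.

```python
import numpy as np

def autocovariance_loop(xi, n, k):
    # Loop version, restating the function from the answer
    xs = np.average(xi)
    acov = 0.0
    for i in range(n - k):
        acov += (xi[i + k] - xs) * (xi[i] - xs)
    return acov / n

def autocovariance_vectorized(xi, k):
    # Assumes N == len(xi); keeps the same 1/N normalisation
    xi = np.asarray(xi, dtype=float)
    n = len(xi)
    d = xi - xi.mean()
    # Pair each deviation with the one k steps later and sum the products
    return np.dot(d[k:], d[:n - k]) / n

rng = np.random.default_rng(0)
series = rng.normal(size=50)  # made-up data for illustration
same = all(
    np.isclose(autocovariance_loop(series, len(series), k),
               autocovariance_vectorized(series, k))
    for k in range(5)
)
print(same)  # True
```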
Answer by Edison Chen
I don't think you understand the definition of a covariance matrix. If you need a 25 x 25 covariance matrix, you need 25 vectors, each with n data points.
Answer by FooBar167
You should change
np.cov(x,y, rowvar=0)
to
np.cov((x,y), rowvar=0)
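A small shape check of the difference, as I understand it: when x and y are passed as two separate 1-D arguments, each one is treated as a single variable regardless of rowvar (consistent with the output in the question), while a 2-D input with rowvar=0 treats each column as a variable. The seed and the names `shape_separate` and `shape_stacked` are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.normal(size=25)
y = rng.normal(size=25)

# Two separate 1-D arguments: still two variables
shape_separate = np.cov(x, y, rowvar=False).shape

# One 2-D input of shape (2, 25): with rowvar=False each of the
# 25 columns is a variable with 2 observations
shape_stacked = np.cov((x, y), rowvar=False).shape

print(shape_separate)  # (2, 2)
print(shape_stacked)   # (25, 25)
```

Keep in mind that the 25x25 result is estimated from only 2 observations per variable.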
Answer by Aerin
What you got (2 by 2) is more useful than 25*25. Covariance of X and Y is an off-diagonal entry in the symmetric cov_matrix.
If you insist on a 25 by 25 matrix, which I think is useless, then why don't you write out the definition?
x=np.random.normal(size=25).reshape(25,1) # to make it 2d array.
y=np.random.normal(size=25).reshape(25,1)
cov = np.matmul(x-np.mean(x), (y-np.mean(y)).T) / len(x)
Answer by Blupon
Covariance matrix from sample vectors
To clarify the small confusion regarding what is a covariance matrix defined using two N-dimensional vectors, there are two possibilities.
The question you have to ask yourself is whether you consider:
- each vector as N realizations/samples of one single variable (for example two 3-dimensional vectors [X1,X2,X3] and [Y1,Y2,Y3], where you have 3 realizations for the variables X and Y respectively)
- each vector as 1 realization for N variables (for example two 3-dimensional vectors [X1,Y1,Z1] and [X2,Y2,Z2], where you have 1 realization for the variables X, Y and Z per vector)
Since a covariance matrix is intuitively defined as a variance based on two different variables:
- in the first case, you have 2 variables, N example values for each, so you end up with a 2x2 matrix where the covariances are computed from the N samples per variable
- in the second case, you have N variables, 2 samples for each, so you end up with an NxN matrix
About the actual question, using numpy
If you consider that you have 25 variables per vector (I took 3 instead of 25 to simplify the example code), so one realization of several variables in each vector, use rowvar=0:
import numpy
# [X1,Y1,Z1]
X_realization1 = [1,2,3]
# [X2,Y2,Z2]
X_realization2 = [2,1,8]
numpy.cov([X_realization1,X_realization2],rowvar=0) # rowvar false, each column is a variable
Code returns, considering 3 variables:
array([[ 0.5, -0.5, 2.5],
[-0.5, 0.5, -2.5],
[ 2.5, -2.5, 12.5]])
Otherwise, if you consider that one vector is 25 samples of one variable, use rowvar=1 (numpy's default parameter):
# [X1,X2,X3]
X = [1,2,3]
# [Y1,Y2,Y3]
Y = [2,1,8]
numpy.cov([X,Y],rowvar=1) # rowvar true (default), each row is a variable
Code returns, considering 2 variables:
array([[ 1. , 3. ],
[ 3. , 14.33333333]])
Answer by lbsweek
According to the documentation, you should expect the variable vector to be a column:
If we examine N-dimensional samples, X = [x1, x2, ..., xn]^T
though later it says each row is a variable
Each row of m represents a variable.
so you need to pass your matrix transposed:
x=np.random.normal(size=25)
y=np.random.normal(size=25)
X = np.array([x,y])
np.cov(X.T)
and according to wikipedia: https://en.wikipedia.org/wiki/Covariance_matrix
X is column vector variable
X = [X1,X2, ..., Xn]^T
COV = E[X * X^T] - μx * μx^T // μx = E[X]
you can implement it yourself:
# each row of X is an observation, each column is a variable
X = X - X.mean(axis=0)
h,w = X.shape
COV = X.T @ X / (h-1)
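To see that this hand-written version agrees with numpy, here is a small check. It is a sketch on made-up data where rows are observations and columns are variables, which is the layout the code above centres and multiplies in; the sizes (50 observations, 3 variables), the seed and the names `manual` and `matches` are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed
X = rng.normal(size=(50, 3))      # 50 observations (rows) of 3 variables (columns)

Xc = X - X.mean(axis=0)           # subtract each column's mean
h, w = Xc.shape
manual = Xc.T @ Xc / (h - 1)      # 3x3 sample covariance

# rowvar=False tells np.cov that columns are the variables
matches = np.allclose(manual, np.cov(X, rowvar=False))
print(manual.shape)  # (3, 3)
print(matches)       # True
```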

