Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/42094890/

Date: 2020-09-14 02:54:56  Source: igfitidea

How to calculate a covariance matrix with Pandas

Tags: python, pandas, numpy, dataframe, covariance

Asked by JulienCoo

I'm trying to figure out how to calculate a covariance matrix with Pandas. I'm not a data scientist or a finance guy, I'm just a regular dev who is a bit out of his league.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(252, 4)), columns=list('ABCD'))
print(df.cov())

So, if I do this, I get this kind of output:

[Image: output of print(df.cov())]

I find that the numbers are huge, and I was expecting them to be closer to zero. Do I have to calculate the returns before computing the covariance?

Could anyone familiar with this explain it a little, or point me to a good link with an explanation? I couldn't find anything like a "Covariance Matrix For Dummies".

Regards, Julien

Answered by Okroshiashvili

Covariance is a measure of the degree to which the returns on two assets (or any two vectors or arrays) move in tandem. A positive covariance means that asset returns move together, while a negative covariance means that returns move inversely.

On the other hand, we have:

The correlation coefficient is a measure that determines the degree to which two variables' movements are associated. Note that the correlation coefficient measures the linear relationship between two arrays/vectors/assets.

So, portfolio managers try to reduce the covariance between two assets and keep the correlation coefficient negative in order to have enough diversification in the portfolio. This means that a decrease in one asset's return will not cause a decrease in the return of the second asset (that's why we need negative correlation).

Maybe you meant correlation coefficient close to zero, not covariance.

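To make the distinction concrete, here is a minimal sketch using the same kind of random integer data as in the question. The seed value 0 and the use of np.random.default_rng are assumptions made for illustration and are not part of the original code. Covariance is expressed in the squared units of the data, so for integers spread over 0 to 99 the values on the diagonal can be large, while the correlation matrix is rescaled to lie between -1 and 1.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (0) so the run is repeatable; the question used no seed.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(252, 4)), columns=list("ABCD"))

cov = df.cov()    # expressed in squared units of the data
corr = df.corr()  # unit-free, always between -1 and 1

# Correlation is covariance divided by the product of the standard deviations.
std = df.std()
corr_from_cov = cov / np.outer(std, std)

print(cov.round(1))
print(corr.round(3))
print(np.allclose(corr, corr_from_cov))
```

For independent random columns like these, the off-diagonal correlations should sit near zero even though the covariance entries are not small in absolute terms.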

Answered by AlketCecaj

The fact that you haven't provided a seed for your randomly generated numbers makes your experiment difficult to reproduce. However, I tried the code you provided here, and the closest covariance matrix I get is this one:

[Image: covariance matrix]

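On the reproducibility point, a seeded generator is one way to get the same matrix on every run. This is only a sketch: the helper name make_frame and the seed value 42 are hypothetical, and np.random.default_rng is used here in place of the question's np.random.randint call.

```python
import numpy as np
import pandas as pd

def make_frame(seed):
    # Hypothetical helper: rebuilds a DataFrame shaped like the question's from a given seed.
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.integers(0, 100, size=(252, 4)), columns=list("ABCD"))

# The seed value 42 is arbitrary; any fixed integer gives a repeatable result.
first = make_frame(42).cov()
second = make_frame(42).cov()

# Same seed, same data, same covariance matrix.
print(first.equals(second))
```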

To understand why the numbers in your cov_matrix are so huge, you should first understand what a covariance matrix is. The covariance matrix is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector.

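The definition can be checked by hand. The sketch below builds the matrix entry by entry from the sample covariance formula (mean-centred products divided by n - 1, which matches the default used by DataFrame.cov) and compares it with the pandas result. The seed and the helper name sample_cov are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for repeatability only
df = pd.DataFrame(rng.integers(0, 100, size=(252, 4)), columns=list("ABCD"))

def sample_cov(x, y):
    # Sample covariance: mean-centred products summed and divided by n - 1.
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# Build the matrix entry by entry: position (i, j) holds cov(column i, column j).
manual = pd.DataFrame(
    [[sample_cov(df[a], df[b]) for b in df.columns] for a in df.columns],
    index=df.columns,
    columns=df.columns,
)

print(np.allclose(manual, df.cov()))
# The diagonal holds each column's variance.
print(np.allclose(np.diag(manual), df.var()))
```

Because the diagonal entries are variances, their size depends on the spread of the data: values ranging over 0 to 99 give variances far from zero.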

A good link you might check is https://en.wikipedia.org/wiki/Covariance_matrix. Understanding the correlation matrix might also help: https://en.wikipedia.org/wiki/Correlation_and_dependence#Correlation_matrices