Original question: http://stackoverflow.com/questions/42094890/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to calculate a covariance matrix with Pandas
Asked by JulienCoo
I'm trying to figure out how to calculate a covariance matrix with Pandas. I'm not a data scientist or a finance guy, I'm just a regular dev who is a bit out of his league.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(252, 4)), columns=list('ABCD'))
print(df.cov())
So, if I do this, I get this kind of output:
I find that the numbers are huge, and I was expecting them to be closer to zero. Do I have to calculate the returns before computing the covariance?
Could anyone familiar with this explain it a little, or point me to a good link with an explanation? I couldn't find any link to Covariance Matrix For Dummies.
Regards, Julien
Answered by Okroshiashvili
Covariance is a measure of the degree to which returns on two assets (or any two vectors or arrays) move in tandem. A positive covariance means that asset returns move together, while a negative covariance means returns move inversely.
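As a rough illustration of this definition, here is a minimal sketch that computes the sample covariance of two series by hand and compares it with pandas' `Series.cov`. The series `a`, `b`, and `c` and the helper `sample_cov` are made up for this example and are not from the original post.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: b moves with a, c moves against a.
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
b = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])
c = pd.Series([5.0, 4.0, 3.0, 2.0, 1.0])

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

cov_ab = sample_cov(a, b)  # positive: the two series move together
cov_ac = sample_cov(a, c)  # negative: the two series move inversely

# The hand-computed values should match pandas (which also divides by n - 1).
print(cov_ab, a.cov(b))
print(cov_ac, a.cov(c))
```

Note that the size of the result depends on the scale of the data, which is why covariance alone says little about how strong the relationship is.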
On the other hand, we have:
The correlation coefficient is a measure that determines the degree to which two variables' movements are associated. Note that the correlation coefficient measures the linear relationship between two arrays/vectors/assets.
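The Pearson correlation coefficient is the covariance divided by the product of the two standard deviations, so it has no units and always lies between -1 and 1. The sketch below, on made-up data with an arbitrarily chosen seed, checks that relationship and shows that rescaling a variable changes the covariance but not the correlation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
x = pd.Series(rng.normal(size=252))
y = 0.5 * x + pd.Series(rng.normal(size=252))  # y is loosely tied to x

# Pearson correlation = covariance / (std of x * std of y)
corr_manual = x.cov(y) / (x.std() * y.std())

# Rescaling x by 100 multiplies the covariance by 100
# but leaves the correlation unchanged.
x_scaled = 100 * x
cov_ratio = x_scaled.cov(y) / x.cov(y)
corr_scaled = x_scaled.corr(y)

print(corr_manual, x.corr(y))
print(cov_ratio, corr_scaled)
```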
So, portfolio managers try to reduce the covariance between two assets and keep the correlation coefficient negative to have enough diversification in the portfolio. This means that a decrease in one asset's return will not cause a decrease in the return of the second asset (that's why we need negative correlation).
Maybe you meant a correlation coefficient close to zero, not a covariance close to zero.
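To check this suggestion against the question's data, one could call `DataFrame.corr()` instead of `DataFrame.cov()`. This is only a sketch: the seed is an assumption added here for reproducibility (the question did not set one), and the variable names `off_diag` and `max_abs_off_diag` are made up for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed; the original question did not set one
df = pd.DataFrame(np.random.randint(0, 100, size=(252, 4)), columns=list('ABCD'))

corr = df.corr()
print(corr)

# Pick out the off-diagonal entries (correlations between different columns).
off_diag = corr.to_numpy()[~np.eye(4, dtype=bool)]
max_abs_off_diag = np.abs(off_diag).max()
print(max_abs_off_diag)
```

Because the columns are generated independently, the off-diagonal correlations should come out small, while the diagonal is exactly 1.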
Answered by AlketCecaj
The fact that you haven't provided a seed for your randomly generated numbers makes your experiment difficult to reproduce. However, I tried the code you provided, and the closest covariance matrix I got is this one:
To understand why the numbers in your cov_matrix are so huge, you should first understand what a covariance matrix is. The covariance matrix is a matrix whose element in the (i, j) position is the covariance between the i-th and j-th elements of a random vector.
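The definition can be sketched directly: build the matrix entry by entry from pairwise covariances and compare it with `DataFrame.cov()`. The seed and the names `manual`, `diag`, and `theoretical_var` are assumptions made for this example. The diagonal entries are the variances of each column, and for integers drawn uniformly from 0 to 99 the theoretical variance is (100^2 - 1) / 12 = 833.25, which is likely why the numbers in the question look large.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, not from the original post
df = pd.DataFrame(rng.integers(0, 100, size=(252, 4)), columns=list('ABCD'))

# Entry (i, j) is the covariance between column i and column j.
cols = df.columns
manual = pd.DataFrame(
    [[df[i].cov(df[j]) for j in cols] for i in cols],
    index=cols, columns=cols,
)

# The diagonal holds each column's variance.
diag = np.diag(df.cov().to_numpy())

# Variance of a discrete uniform distribution on 0..99.
theoretical_var = (100**2 - 1) / 12

print(manual)
print(diag, theoretical_var)
```

Covariance is expressed in the squared units of the data, so values in the hundreds are expected here even though the columns are unrelated.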
A good link you might check is https://en.wikipedia.org/wiki/Covariance_matrix. Understanding the correlation matrix might also help: https://en.wikipedia.org/wiki/Correlation_and_dependence#Correlation_matrices
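Returning to the reproducibility point at the start of this answer, here is a minimal sketch of how a seed makes the experiment repeatable. The helper `make_df` and the seed value 42 are hypothetical choices for illustration, and the sketch uses NumPy's `default_rng` rather than the `np.random.randint` call from the question.

```python
import numpy as np
import pandas as pd

def make_df(seed):
    """Build a DataFrame like the question's from a seeded generator (seed is an assumption)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.integers(0, 100, size=(252, 4)), columns=list('ABCD'))

cov_first = make_df(42).cov()
cov_second = make_df(42).cov()

# Same seed, same data, same covariance matrix.
print(cov_first.equals(cov_second))
```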