pd.corrwith on pandas dataframes with different column names
Original question: http://stackoverflow.com/questions/27079249/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors on StackOverflow and cite the original address.
Asked by themachinist
I would like to get the Pearson r between x1 and each of the three columns in y, in an efficient manner.
It appears that DataFrame.corrwith() only calculates this for columns that carry exactly the same column labels in both frames, as x and y do below.
This seems a bit impractical, as I presume computing correlations between different variables would be a common problem.
In [1]: import pandas as pd; import numpy as np
In [2]: x = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
In [3]: y = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
In [4]: x1 = pd.DataFrame(x.iloc[:, 0])
In [5]: x.corrwith(y)
Out[5]:
A -0.752631
B -0.525705
C 0.516071
dtype: float64
In [6]: x1.corrwith(y)
Out[6]:
A -0.752631
B NaN
C NaN
dtype: float64
Answered by seth-p
You can accomplish what you want using DataFrame.corrwith(Series) rather than DataFrame.corrwith(DataFrame):
In [203]: x1 = x['A']
In [204]: y.corrwith(x1)
Out[204]:
A 0.347629
B -0.480474
C -0.729303
dtype: float64
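For reference, the Series-based call above can be reproduced as a standalone script. This is only a sketch: the random generator and seed are assumptions added for reproducibility, so the numbers will not match the session output shown above. The NumPy cross-check is also an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed seed, used only so the script is repeatable.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

# Selecting a single column yields a Series, not a one-column DataFrame.
x1 = x["A"]

# With a Series argument, corrwith correlates x1 against every column of y
# instead of matching columns by label.
r = y.corrwith(x1)
print(r)

# Cross-check one entry against NumPy's Pearson correlation.
expected_b = np.corrcoef(x["A"], y["B"])[0, 1]
```

Because x1 is a Series, no column-label matching takes place, so none of the entries come back as NaN.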
Alternatively, you can form the matrix of correlations between each column of x and each column of y as follows:
In [214]: pd.expanding_corr(x, y, pairwise=True).iloc[-1, :, :]
Out[214]:
          A         B         C
A  0.347629 -0.480474 -0.729303
B -0.334814  0.778019  0.654583
C -0.453273  0.212057  0.149544
Alas, DataFrame.corrwith() doesn't have a pairwise=True option.
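The top-level pd.expanding_corr function is no longer available in recent pandas releases. One possible way to build the same kind of cross-correlation matrix is to apply the Series form of corrwith to each column of x. This is a sketch under assumptions (the seed and the name `cross` are illustrative), not the method from the original answer.

```python
import numpy as np
import pandas as pd

# Assumed seed, used only so the script is repeatable.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

# apply() hands each column of x to the lambda as a Series.
# y.corrwith(col) returns a Series indexed by y's columns, so the
# intermediate frame has y's columns as rows and x's columns as columns.
# Transposing puts x's columns on the rows and y's columns on the columns.
cross = x.apply(lambda col: y.corrwith(col)).T
print(cross)
```

Here cross.loc["A", "B"] is the Pearson r between column A of x and column B of y, so the row labelled A matches what y.corrwith(x["A"]) returns on the same data.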
Answered by Primer
You might do this (with np.random.seed(0)):
x1 = pd.DataFrame(np.repeat(x.iloc[:, 0].to_numpy(), x.shape[1]).reshape(x.shape), columns=x.columns)
x1.corrwith(y)
to get this result:
A -0.509
B 0.041
C -0.732

