pd.corrwith on Pandas DataFrames with different column names

Question

Asked by themachinist

I would like to get the Pearson r between x1 and each of the three columns in y, in an efficient manner.

It appears that DataFrame.corrwith() is only able to calculate this for columns that have exactly the same column labels, e.g. x and y.

This seems a bit impractical, as I presume computing correlations between differently named variables is a common problem.

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])

In [3]: y = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])

In [4]: x1 = pd.DataFrame(x.iloc[:,0])

In [5]: x.corrwith(y)
Out[5]:
A   -0.752631
B   -0.525705
C    0.516071
dtype: float64

In [6]: x1.corrwith(y)
Out[6]:
A   -0.752631
B         NaN
C         NaN
dtype: float64

Answer 1

Answered by seth-p

You can accomplish what you want using DataFrame.corrwith(Series) rather than DataFrame.corrwith(DataFrame):

In [203]: x1 = x['A']

In [204]: y.corrwith(x1)
Out[204]:
A    0.347629
B   -0.480474
C   -0.729303
dtype: float64
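
The difference between the two call forms can be reproduced in a standalone script. This is a minimal sketch, assuming a recent pandas release; the seed and the variable names by_label and x_a_vs_y are arbitrary choices for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

# DataFrame vs. DataFrame: columns are paired by label, so a one-column
# frame only finds a partner for the column with the same name.
by_label = x[["A"]].corrwith(y)

# DataFrame vs. Series: the Series is correlated with every column of y.
x_a_vs_y = y.corrwith(x["A"])

print(by_label)
print(x_a_vs_y)
```

Passing a Series sidesteps the column-label alignment entirely, which is why the NaN entries disappear.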

Alternatively, you can form the matrix of correlations between each column of x and each column of y as follows:

In [214]: pd.expanding_corr(x, y, pairwise=True).iloc[-1, :, :]
Out[214]:
          A         B         C
A  0.347629 -0.480474 -0.729303
B -0.334814  0.778019  0.654583
C -0.453273  0.212057  0.149544

Alas, DataFrame.corrwith() doesn't have a pairwise=True option.
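
Since pd.expanding_corr and the Panel object it returned are no longer available in recent pandas releases, the same cross-correlation matrix can be assembled from one corrwith(Series) call per column. This is a sketch of a swapped-in technique, not the answer's original method; the column names D, E, F and the name cross_corr are hypothetical, chosen to show that the labels do not need to match.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["D", "E", "F"])

# Each dict entry holds the correlation of one column of y with all
# columns of x, so rows are labelled by x and columns by y.
cross_corr = pd.DataFrame({col: x.corrwith(y[col]) for col in y.columns})

print(cross_corr)
```

This loops over the columns of y in Python, which is fine for a handful of columns; for very wide frames a vectorized approach would likely be faster.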

Answer 2

Answered by Primer

You might do this (with np.random.seed(0) set before creating x and y):

x1 = pd.DataFrame(np.repeat(x.iloc[:, [0]].to_numpy(), x.shape[1], axis=1), columns=x.columns)
x1.corrwith(y)

which gives this result:

A   -0.509
B    0.041
C   -0.732
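
As a cross-check, repeating the first column of x under every column label of y should give the same numbers as passing that column to corrwith as a Series. The sketch below assumes a recent pandas release; the names first, broadcast and direct are illustrative, and no expected values are shown because they depend on the random draw.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
x = pd.DataFrame(np.random.randn(5, 3), columns=["A", "B", "C"])
y = pd.DataFrame(np.random.randn(5, 3), columns=["A", "B", "C"])

# Copy the first column of x once per column of y and reuse y's labels,
# so that label-based alignment pairs it with every column of y.
first = x.iloc[:, [0]].to_numpy()
x1 = pd.DataFrame(np.repeat(first, y.shape[1], axis=1), columns=y.columns)
broadcast = x1.corrwith(y)

# The Series form computes the same correlations without the copies.
direct = y.corrwith(x.iloc[:, 0])

print(np.allclose(broadcast.to_numpy(), direct.to_numpy()))
```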
