Pandas corr() vs corrwith()

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/46041148/

Time: 2020-09-14 04:23:36  Source: igfitidea

Pandas corr() vs corrwith()

python pandas

Asked by BaluJr.

Why does pandas provide two different correlation functions?

DataFrame.corrwith(other, axis=0, drop=False): Compute pairwise correlation between rows or columns of two DataFrame objects

vs.

DataFrame.corr(method='pearson', min_periods=1): Compute pairwise correlation of columns, excluding NA/null values

(from pandas 0.20.3 documentation)

Accepted answer by ffeast

The first one, corrwith(), computes correlations with another DataFrame:

between rows or columns of two DataFrame objects

The second one, corr(), computes correlations among the columns of a single DataFrame:

Compute pairwise correlation of columns
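
As a rough illustration of the "rows or columns" wording, here is a minimal sketch using made-up random data. The names left, right, col_corr, row_corr and self_corr are hypothetical and chosen only for this example. It assumes both frames share the same row and column labels, since corrwith() aligns on labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two frames with identical row and column labels.
rng = np.random.default_rng(0)
left = pd.DataFrame(rng.normal(size=(4, 3)), columns=list("xyz"))
right = pd.DataFrame(rng.normal(size=(4, 3)), columns=list("xyz"))

col_corr = left.corrwith(right)          # Series with one value per column: x, y, z
row_corr = left.corrwith(right, axis=1)  # Series with one value per row label: 0..3
self_corr = left.corr()                  # 3x3 DataFrame, columns of left against each other

print(col_corr, row_corr, self_corr, sep="\n\n")
```

So corrwith() returns a Series of matched-pair correlations between two frames, while corr() returns a full square matrix for one frame.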

Answer by JohnE

Basic Answer:

Here's an example that might make it clearer:

import numpy as np
import pandas as pd

np.random.seed(123)  # fixed seed so the numbers below are reproducible
df1 = pd.DataFrame(np.random.randn(3, 2), columns=list('ab'))
df2 = pd.DataFrame(np.random.randn(3, 2), columns=list('ac'))

As noted by @ffeast, use corr to compare numerical columns within the same dataframe. In the pandas version quoted above (0.20.3), non-numerical columns are skipped automatically; since pandas 2.0, numeric_only defaults to False, so pass numeric_only=True to skip them.

df1.corr()

          a         b
a  1.000000 -0.840475
b -0.840475  1.000000
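
Whether corr() skips non-numerical columns on its own depends on the pandas version, so it may be safer to select the numeric columns explicitly. The following is a minimal sketch under that assumption. The frame mixed and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with one non-numerical column ("label").
mixed = pd.DataFrame({
    "a": [1.0, 2.0, 4.0],
    "b": [3.0, 1.0, 0.5],
    "label": ["x", "y", "z"],
})

# Keep only the numeric columns, then correlate them with each other.
numeric_corr = mixed.select_dtypes(include="number").corr()
print(numeric_corr)
```

The result only contains rows and columns for a and b; the label column never reaches corr().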

You can compare columns of df1 and df2 with corrwith. Note that only columns with the same names are compared; columns present in only one of the two frames (here b and c) come back as NaN:

df1.corrwith(df2)

a    0.993085
b         NaN
c         NaN
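
If the NaN entries for unmatched columns are unwanted, the drop parameter from the corrwith signature quoted in the question may help. This is a small sketch that rebuilds the same df1 and df2 as above; shared_only is a hypothetical name used only here.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df1 = pd.DataFrame(np.random.randn(3, 2), columns=list("ab"))
df2 = pd.DataFrame(np.random.randn(3, 2), columns=list("ac"))

# drop=True leaves out labels that exist in only one of the two frames.
shared_only = df1.corrwith(df2, drop=True)
print(shared_only)
```

With drop=True, only the shared column a should remain in the result.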

Additional options:

If you want pandas to ignore the column names and compare columns by position instead (the first column of df1 with the first column of df2, and so on), you can rename the columns of df2 to match those of df1 like this:

# In recent pandas, set_axis returns a relabeled copy; the inplace argument was removed in pandas 2.0
df1.corrwith(df2.set_axis(df1.columns, axis='columns'))

a    0.993085
b    0.969220

Note that df1 and df2 need to have the same number of columns in that case.

Finally, a kitchen sink approach: you could also simply horizontally concatenate the two datasets and then use corr(). The advantage is that this basically works regardless of the number of columns and how they are named, but the disadvantage is that you might get more output than you want or need:

pd.concat([df1,df2],axis=1).corr()

          a         b         a         c
a  1.000000 -0.840475  0.993085 -0.681203
b -0.840475  1.000000 -0.771050  0.969220
a  0.993085 -0.771050  1.000000 -0.590545
c -0.681203  0.969220 -0.590545  1.000000
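
One possible way to trim the extra output from the concatenation approach is to give the two frames distinct column names first and then slice out only the df1-versus-df2 block. This is a sketch, not part of the original answer. The suffixes "_1" and "_2" and the names left, right, full and cross are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df1 = pd.DataFrame(np.random.randn(3, 2), columns=list("ab"))
df2 = pd.DataFrame(np.random.randn(3, 2), columns=list("ac"))

# Make column names unique so the two "a" columns can be told apart.
left = df1.add_suffix("_1")
right = df2.add_suffix("_2")

# Full correlation matrix of all four columns.
full = pd.concat([left, right], axis=1).corr()

# Keep only rows from df1 and columns from df2.
cross = full.loc[left.columns, right.columns]
print(cross)
```

The cross block has one row per column of df1 and one column per column of df2, so it also works when the two frames have different numbers of columns.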