Correlation coefficient of two columns in a pandas DataFrame with .corr()

Note: This page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/49350445/

Date: 2020-09-14 05:20:44, source: igfitidea


Tags: python, pandas, correlation

Question by florence-y

I would like to calculate the correlation coefficient between two columns of a pandas DataFrame after converting one of the columns to boolean. The original table had two columns: a Group column holding one of two treatment groups (now boolean) and an Age column. Those are the two columns whose correlation coefficient I want to calculate.

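The question does not show how the Group column was made boolean. A minimal sketch of one way to do it, assuming the raw column held two hypothetical labels, "treatment" and "control" (the labels and the five rows below are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical raw data: the labels "treatment"/"control" are assumptions.
raw = pd.DataFrame({
    "Group": ["treatment", "treatment", "control", "treatment", "control"],
    "Age": [50, 59, 54, 22, 43],
})

table = raw.copy()
# Compare against one label to get True/False, then cast to 0/1 integers.
table["Group"] = (table["Group"] == "treatment").astype(int)
print(table)
```

Which label is coded as 1 is arbitrary: swapping the coding only flips the sign of the Pearson coefficient, not its magnitude.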

I tried the .corr() method with:


table.corr(method='pearson')

but got this back: [screenshot of the table.corr() output]


I have pasted the first 26 rows (index 0 to 25) of the boolean table below. I don't know whether I'm missing parameters or how to interpret this result. It's also strange that the output contains 1 as well. Thanks in advance!


    Group  Age
0      1   50
1      1   59
2      1   22
3      1   48
4      1   53
5      1   48
6      1   29
7      1   44
8      1   28
9      1   42
10     1   35
11     0   54
12     0   43
13     1   50
14     1   62
15     0   64
16     0   39
17     1   40
18     1   59
19     1   46
20     0   56
21     1   21
22     1   45
23     0   41
24     1   46
25     0   35
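
To reproduce the numbers in the answer, the 26 pasted rows can be loaded into a DataFrame directly. A minimal sketch; the values are copied from the table above and the variable names are illustrative:

```python
import pandas as pd

# The 26 rows pasted in the question (index 0 to 25).
table = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0],
    "Age": [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
            50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35],
})

# Pearson is the default method, so this equals table.corr().
matrix = table.corr(method="pearson")
print(matrix.round(4))
```

For these 26 rows the off-diagonal entry is about -0.1533, matching the matrix shown in the answer; passing method='pearson' explicitly changes nothing.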

Answer by Brad Solomon

Calling .corr() on the entire DataFrame gives you a full correlation matrix:


>>> table.corr()
        Group     Age
Group  1.0000 -0.1533
Age   -0.1533  1.0000
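
In the full matrix, the diagonal entries compare each column with itself, so they are 1 for any non-constant column; that is where the 1 in the question's output comes from. The matrix is also symmetric, so both off-diagonal cells hold the same coefficient. A short sketch using a small hypothetical frame (values made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"Group": [1, 0, 1, 1, 0, 1], "Age": [50, 54, 22, 48, 43, 59]})

m = df.corr()

# Diagonal: each column correlated with itself.
print(np.diag(m.to_numpy()))
# Off-diagonal: the Group/Age coefficient, identical in both positions.
print(m.loc["Group", "Age"], m.loc["Age", "Group"])
```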

To get a single coefficient, call .corr() on one Series and pass the other:


>>> table['Group'].corr(table['Age'])
-0.15330486289034567
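
Because Group takes only the values 0 and 1, the Pearson coefficient here is the point-biserial correlation: the difference between the mean Age of the two groups, divided by the population standard deviation of Age and multiplied by sqrt(p * q), where p is the share of rows with Group == 1 and q = 1 - p. A sketch that cross-checks Series.corr against numpy.corrcoef and against that formula, using the 26 rows from the question:

```python
import numpy as np
import pandas as pd

# The 26 rows pasted in the question.
group = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0])
age = pd.Series([50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
                 50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35])

# 1) pandas: Series.corr uses method="pearson" by default.
r_pandas = group.corr(age)

# 2) NumPy: off-diagonal entry of the 2x2 correlation matrix.
r_numpy = np.corrcoef(group, age)[0, 1]

# 3) Point-biserial formula: difference of group means, scaled by the
#    population standard deviation (ddof=0) and sqrt(p * q).
p = group.mean()  # share of rows with Group == 1
q = 1 - p
m1 = age[group == 1].mean()
m0 = age[group == 0].mean()
r_pb = (m1 - m0) / age.std(ddof=0) * np.sqrt(p * q)

print(r_pandas, r_numpy, r_pb)
```

The negative sign means the group coded as 1 is, on average, somewhat younger in these rows (mean Age about 43.5 versus about 47.4), and a magnitude around 0.15 indicates a fairly weak association.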

This should be faster than computing the full matrix and then indexing into it (with df.corr().at['Group', 'Age']). It should also work whether Group has bool or int dtype.

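A sketch contrasting the label-based .at lookup with the positional .iat lookup on the full matrix, and checking that the Series route gives the same coefficient whether Group is stored as int or bool. It reuses the question's 26 rows; DataFrame.corr and Series.corr may go through different numeric code paths, so the comparison uses np.isclose rather than exact equality:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0],
    "Age": [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
            50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35],
})

# Label-based lookup of one cell in the full matrix: .at takes row/column labels.
via_labels = table.corr().at["Group", "Age"]

# Positional lookup: .iat takes integer positions (row 0, column 1).
via_positions = table.corr().iat[0, 1]

# Direct Series computation, with Group as int and as bool.
via_series_int = table["Group"].corr(table["Age"])
via_series_bool = table["Group"].astype(bool).corr(table["Age"])

print(via_labels, via_positions, via_series_int, via_series_bool)
```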