Correlation coefficient between two columns of a pandas DataFrame with .corr()

Question

Asked by florence-y

I would like to calculate the correlation coefficient between two columns of a pandas DataFrame after making one of the columns boolean. The original table had two columns: a Group column holding one of two treatment groups (now boolean) and an Age column. Those are the two columns I want the correlation coefficient for.

I tried the .corr() method, with:

table.corr(method='pearson')

but got this back:

I have pasted the first rows (indices 0 to 25) of the boolean table below. I don't know whether I'm missing parameters or how to interpret this result. It's also strange that it's 1. Thanks in advance!

    Group  Age
0      1   50
1      1   59
2      1   22
3      1   48
4      1   53
5      1   48
6      1   29
7      1   44
8      1   28
9      1   42
10     1   35
11     0   54
12     0   43
13     1   50
14     1   62
15     0   64
16     0   39
17     1   40
18     1   59
19     1   46
20     0   56
21     1   21
22     1   45
23     0   41
24     1   46
25     0   35
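
The question's call can be reproduced by loading the pasted rows into a DataFrame. This is a minimal sketch assuming the frame is named `table`, with `Group` stored as bool and `Age` as integers, as the question describes; the helper names `group`, `age` and `matrix` are only for illustration. It shows that `table.corr()` returns a 2x2 matrix rather than a single number, and that its diagonal is 1, which is likely where the unexpected 1 comes from.

```python
import pandas as pd

# The 26 rows pasted in the question; Group is stored as bool, as described.
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": [bool(g) for g in group], "Age": age})

# Same call as in the question: a 2x2 correlation matrix, not a scalar.
matrix = table.corr(method="pearson")
print(matrix)

# Each diagonal entry is a column correlated with itself, so it equals 1
# (up to floating-point rounding).
print(matrix.loc["Group", "Group"], matrix.loc["Age", "Age"])
```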

Answer 1

Answered by Brad Solomon

Calling .corr() on the entire DataFrame gives you a full correlation matrix:

>>> table.corr()
        Group     Age
Group  1.0000 -0.1533
Age   -0.1533  1.0000
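
In that matrix the diagonal is each column against itself (always 1), and the off-diagonal cells hold the Group/Age coefficient twice, because the matrix is symmetric. A minimal sketch of pulling that one number out with label-based `.loc`, assuming the same `table` built from the rows in the question (the names `r_ga` and `r_ag` are illustrative):

```python
import pandas as pd

# Rows from the question; Group as bool.
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": [bool(g) for g in group], "Age": age})

matrix = table.corr()

# The matrix is symmetric, so both off-diagonal cells hold the same coefficient.
r_ga = matrix.loc["Group", "Age"]
r_ag = matrix.loc["Age", "Group"]
print(r_ga, r_ag)
```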

To get just the one coefficient, you can call .corr() on one Series and pass the other Series instead:

>>> table['Group'].corr(table['Age'])
-0.15330486289034567
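
To see what `Series.corr` computes with its default `method='pearson'`, the coefficient can be checked by hand: the sum of centered cross-products divided by the square root of the product of the centered sums of squares. This is a sketch assuming the `table` from the question's rows; `r_pandas` and `r_manual` are illustrative names, and no specific value is claimed for these 26 rows.

```python
import numpy as np
import pandas as pd

# Rows from the question; Group as bool.
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": [bool(g) for g in group], "Age": age})

# Correlation of one column with another (Pearson by default).
r_pandas = table["Group"].corr(table["Age"])

# Pearson's r by hand: centered cross-products over the product of norms.
x = table["Group"].to_numpy(dtype=float)
y = table["Age"].to_numpy(dtype=float)
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

print(r_pandas, r_manual)
```

Because Group takes only two values, this Pearson coefficient is the same quantity as the point-biserial correlation; a negative value means the group coded 1 tends to have lower ages in this data.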

This should be faster than computing the full matrix and indexing it (with df.corr().at['Group', 'Age']). Also, this should work whether Group is bool or int dtype.
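
Both claims can be checked on the question's rows. This sketch (frame names `as_bool` and `as_int` are illustrative) converts Group from bool to int64 and compares the results; it also shows that the matrix lookup uses `.at` with row/column labels, while `.iat` only accepts integer positions (here Group is position 0 and Age is position 1).

```python
import pandas as pd

# Rows from the question, once with Group as bool and once as int64.
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
as_bool = pd.DataFrame({"Group": [bool(g) for g in group], "Age": age})
as_int = as_bool.astype({"Group": "int64"})

# Series.corr gives the same coefficient for either dtype.
r_bool = as_bool["Group"].corr(as_bool["Age"])
r_int = as_int["Group"].corr(as_int["Age"])

# Full-matrix route: label-based .at versus position-based .iat.
matrix = as_int.corr()
r_label = matrix.at["Group", "Age"]
r_pos = matrix.iat[0, 1]

print(r_bool, r_int, r_label, r_pos)
```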
