Disclaimer: this page is a Chinese-English translation of a popular Stack Overflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/34556180/

Time: 2020-09-14 00:27:03  Source: igfitidea

How can I plot a correlation matrix as a set of ellipses, similar to the R openair package?

Tags: python, numpy, pandas, matplotlib, correlation

Asked by Han Zhengzu

The figure below was plotted using the openair R package:

[Figure: a correlation matrix showing the relationships between variables]

I know matplotlib has the plt.matshow function, but it can't clearly show the relationships between all the variables at the same time.

Here is my early work:

df is a pandas DataFrame with 7 variables, shown below:

[image]

I don't know how to attach a .csv file to Stack Overflow.

Using plt.matshow(df.corr(), cmap=plt.cm.Greens), the figure looks like this:

[image]

The second figure doesn't show the correlations between the variables as clearly as the first one.

Edit:

I have uploaded the CSV file to Google Docs here.

Answer by ali_m

I'm not aware of any existing Python library that does these "ellipse plots", but they are not particularly hard to implement using matplotlib.collections.EllipseCollection:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection

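def plot_corr_ellipses(data, ax=None, **kwargs):
    """Draw one ellipse per cell of a 2D correlation matrix.

    Extra keyword arguments (e.g. cmap, clim) are passed on to EllipseCollection.
    """

    M = np.array(data)
    if M.ndim != 2:
        raise ValueError('data must be a 2D array')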
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)

    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T

    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()

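    # matplotlib >= 3.6 names this parameter offset_transform;
    # older versions use transOffset instead
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(), **kwargs)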
    ax.add_collection(ec)

    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)

    return ec
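
The size and orientation rule in the comments above can be checked without drawing anything. A minimal sketch (the helper name ellipse_geometry is hypothetical, not part of matplotlib) that reproduces the same mapping: every ellipse has width 1 and height 1 - |r|, and is tilted by +45 degrees for positive r and -45 degrees for negative r, so a perfect correlation collapses to a line and r = 0 gives a circle.

```python
import numpy as np


def ellipse_geometry(corr):
    """Return (widths, heights, angles) for each cell of a correlation matrix.

    Hypothetical helper mirroring the mapping in plot_corr_ellipses:
    width is fixed at 1, height shrinks as |r| grows, and the sign of r
    picks a +45 or -45 degree tilt (0 degrees when r == 0).
    """
    r = np.asarray(corr, dtype=float).ravel()
    widths = np.ones_like(r)
    heights = 1 - np.abs(r)
    angles = 45 * np.sign(r)
    return widths, heights, angles


# Cells in row-major order, the same order as M.ravel()
w, h, a = ellipse_geometry([[1.0, -0.5], [0.0, 0.25]])
```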

For example, using your data:

data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)
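
Since the question's CSV is only linked, not included, here is a self-contained sketch on synthetic data (the random data are an assumption for illustration; only the column names come from the tables in this thread). It inlines the same EllipseCollection approach and additionally writes each coefficient x100 at its ellipse centre, similar to the annotated values in Stefan's answer. It assumes matplotlib 3.6 or newer for the offset_transform keyword.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection

# Synthetic stand-in for the question's df: a shared component makes the
# columns positively correlated
rng = np.random.default_rng(0)
columns = ["GX", "HG", "RM", "SJ", "XB", "XN", "ZG"]
shared = rng.normal(size=(200, 1))
df = pd.DataFrame(shared + rng.normal(size=(200, len(columns))), columns=columns)
corr = df.corr()

M = corr.to_numpy()
n = M.shape[0]
# (x, y) = (column index, row index) for every cell, row-major like M.ravel()
xy = np.indices(M.shape)[::-1].reshape(2, -1).T

fig, ax = plt.subplots(subplot_kw={"aspect": "equal"})
ec = EllipseCollection(
    widths=np.ones(M.size),
    heights=1 - np.abs(M).ravel(),     # stronger |r| -> thinner ellipse
    angles=45 * np.sign(M).ravel(),    # sign of r -> tilt direction
    units="x",
    offsets=xy,
    offset_transform=ax.transData,
    array=M.ravel(),                   # colour each ellipse by r
    cmap="Greens",
)
ax.add_collection(ec)

# Write each coefficient (x100) at the centre of its ellipse
for (x, y), r in zip(xy, M.ravel()):
    ax.text(x, y, f"{r * 100:.0f}", ha="center", va="center", fontsize=7)

ax.set_xlim(-0.5, n - 0.5)
ax.set_ylim(-0.5, n - 0.5)
ax.set_xticks(range(n))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(n))
ax.set_yticklabels(corr.index)
fig.colorbar(ec, ax=ax, label="Correlation coefficient")
fig.savefig("corr_ellipses.png")
```

With real data, replace the synthetic df with the one read from the CSV; everything after corr = df.corr() stays the same.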

[image]

Negative correlations can be plotted as ellipses with the opposite orientation:

fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
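
The clim=[-1, 1] argument matters with a diverging colormap such as 'seismic': it sets the vmin/vmax of the collection's Normalize so that r = 0 always lands on the neutral middle colour. A small sketch using matplotlib.colors.Normalize directly shows the difference versus the default autoscaled limits (the "positive-only matrix" is a hypothetical case for illustration).

```python
import numpy as np
from matplotlib import colormaps
from matplotlib.colors import Normalize

r = np.linspace(-1, 1, 9)  # coefficients from -1.0 to 1.0 in steps of 0.25

# Pinned limits (what clim=[-1, 1] sets): -1 -> 0.0, 0 -> 0.5, +1 -> 1.0
fixed = Normalize(vmin=-1, vmax=1)

# Autoscaled limits (the default): vmin/vmax follow the data, here a
# hypothetical matrix that only contains positive correlations
auto = Normalize()
auto.autoscale(r[r > 0])

mid_fixed = fixed(0.0)  # zero correlation maps to the neutral middle colour
low_auto = auto(0.25)   # weakest positive value is pushed to the blue end
rgba = colormaps["seismic"](fixed(r))  # one RGBA row per coefficient
```

Without pinned limits, a weak positive correlation can be drawn in the colour normally reserved for strong negative ones, which is why the example above passes clim.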

[image]

Answer by Stefan

Assuming you are interested in showing cluster relations, the seaborn package mentioned in the comments also has a clustermap. Using your correlation matrix (it looks like you want to show the correlation coefficients as int in the [-100, 100] range), you could do the following:

corr = df.corr().mul(100).astype(int)

     GX   HG   RM   SJ   XB   XN   ZG
GX  100   77   62   71   48   66   57
HG   77  100   69   74   61   61   58
RM   62   69  100   75   48   64   68
SJ   71   74   75  100   50   70   65
XB   48   61   48   50  100   46   51
XN   66   61   64   70   46  100   75
ZG   57   58   68   65   51   75  100
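
Note that astype(int) truncates toward zero rather than rounding, so a coefficient of 0.776 becomes 77, not 78. If the integers should be the rounded coefficients, call .round() before the cast. A minimal sketch with made-up example values:

```python
import pandas as pd

r = pd.Series([0.776, -0.776, 0.5])  # illustrative coefficients

truncated = r.mul(100).astype(int)        # drops the fraction (toward zero)
rounded = r.mul(100).round().astype(int)  # nearest integer instead
```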

and then use seaborn.clustermap() as follows:

import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
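
clustermap reorders rows and columns by hierarchical clustering before drawing the heatmap. A rough sketch of that reordering step with SciPy, assuming seaborn's documented defaults (method='average', metric='euclidean', applied to the rows of the matrix). seaborn may use fastcluster internally and also draws dendrograms, so treat this as an approximation of the idea, not its exact code. The values are Stefan's integer table from above.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

labels = ["GX", "HG", "RM", "SJ", "XB", "XN", "ZG"]
corr = pd.DataFrame(
    [[100, 77, 62, 71, 48, 66, 57],
     [77, 100, 69, 74, 61, 61, 58],
     [62, 69, 100, 75, 48, 64, 68],
     [71, 74, 75, 100, 50, 70, 65],
     [48, 61, 48, 50, 100, 46, 51],
     [66, 61, 64, 70, 46, 100, 75],
     [57, 58, 68, 65, 51, 75, 100]],
    index=labels, columns=labels,
)

# Average-linkage clustering of the rows, Euclidean distance between rows
Z = linkage(corr.to_numpy(), method="average", metric="euclidean")
order = leaves_list(Z)  # leaf order of the resulting tree

# Apply the same permutation to rows and columns (the matrix is symmetric)
reordered = corr.iloc[order, order]
print(reordered)
```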

[image]

Answer by Mengshan

I just discovered the Python package biokit today. It provides a very handy function for creating various kinds of correlation charts. For example:

In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt
   ...: from biokit.viz import corrplot

In [6]: corr
Out[6]: 
      GX    HG    RM    SJ    XB    XN    ZG
GX  1.00 -0.77  0.62  0.71  0.48  0.66  0.57
HG -0.77  1.00  0.69  0.74  0.61  0.61  0.58
RM  0.62  0.69  1.00  0.75  0.48  0.64  0.68
SJ  0.71  0.74  0.75  1.00  0.50  0.70  0.65
XB  0.48  0.61  0.48  0.50  1.00 -0.46  0.51
XN  0.66  0.61  0.64  0.70 -0.46  1.00  0.75
ZG  0.57  0.58  0.68  0.65  0.51  0.75  1.00

I took Stefan's data and modified it a little bit. Let's assume this is a correlation matrix. Now to create a correlation chart, you can simply do this:
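
Hand-editing entries, as done here, can produce a table that is not a valid correlation matrix: a real one is symmetric, has ones on the diagonal, has entries in [-1, 1], and is positive semidefinite. For a purely illustrative plot that doesn't matter, but if you need to check, here is a minimal sketch (the helper name is_valid_corr is hypothetical) using NumPy's eigvalsh for the semidefiniteness test:

```python
import numpy as np


def is_valid_corr(m, tol=1e-10):
    """Check the basic properties every correlation matrix must have."""
    m = np.asarray(m, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    if not np.allclose(m, m.T, atol=tol):         # symmetric
        return False
    if not np.allclose(np.diag(m), 1, atol=tol):  # unit diagonal
        return False
    if np.any(np.abs(m) > 1 + tol):               # entries in [-1, 1]
        return False
    # positive semidefinite: no eigenvalue meaningfully below zero
    return bool(np.linalg.eigvalsh(m).min() >= -tol)
```

For example, [[1, 0.9, -0.9], [0.9, 1, 0.9], [-0.9, 0.9, 1]] passes the first three checks but fails the last one, because its determinant is negative.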

In [7]: c = corrplot.Corrplot(corr)
   ...: c.plot()

[Figure: correlation chart with ellipses]

You can read more examples here.
