Python correlation heatmap

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/39409866/

Time: 2020-08-19 22:13:42  Source: igfitidea

Correlation heatmap

python correlation

Asked by Marko

I want to represent a correlation matrix using a heatmap. There is something called a correlogram in R, but I don't think there's such a thing in Python.

How can I do this? The values go from -1 to 1, for example:

[[ 1.          0.00279981  0.95173379  0.02486161 -0.00324926 -0.00432099]
 [ 0.00279981  1.          0.17728303  0.64425774  0.30735071  0.37379443]
 [ 0.95173379  0.17728303  1.          0.27072266  0.02549031  0.03324756]
 [ 0.02486161  0.64425774  0.27072266  1.          0.18336236  0.18913512]
 [-0.00324926  0.30735071  0.02549031  0.18336236  1.          0.77678274]
 [-0.00432099  0.37379443  0.03324756  0.18913512  0.77678274  1.        ]]

I was able to produce the following heatmap based on another question, but the problem is that my values get 'cut' at 0, so I would like to have a map which goes from blue(-1) to red(1), or something like that, but here values below 0 are not presented in an adequate way.

enter image description here

Here's the code for that:

import matplotlib.pyplot as plt

plt.imshow(correlation_matrix, cmap='hot', interpolation='nearest')

Answered by mrandrewandrade

Another alternative is to use the heatmap function in seaborn to plot the correlation matrix. This example uses the Auto data set from the ISLR package in R (the same as in the example you showed).

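import seaborn as sns
import statsmodels.api as sm
%matplotlib inline

# pandas.rpy has been removed from pandas; as a substitute (an assumption,
# not the original answer's method), fetch the R dataset through statsmodels.
# load the Auto dataset from the R package ISLR
auto_df = sm.datasets.get_rdataset("Auto", "ISLR").data

# calculate the correlation matrix (numeric columns only)
corr = auto_df.select_dtypes("number").corr()

# plot the heatmap
sns.heatmap(corr,
        xticklabels=corr.columns,
        yticklabels=corr.columns)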

enter image description here

If you want something fancier, you can use Pandas Style, for example:

cmap = sns.diverging_palette(5, 250, as_cmap=True)

def magnify():
    return [dict(selector="th",
                 props=[("font-size", "7pt")]),
            dict(selector="td",
                 props=[('padding', "0em 0em")]),
            dict(selector="th:hover",
                 props=[("font-size", "12pt")]),
            dict(selector="tr:hover td:hover",
                 props=[('max-width', '200px'),
                        ('font-size', '12pt')])
]

# Styler.set_precision was removed in pandas 2.0; format(precision=2) replaces it
corr.style.background_gradient(cmap, axis=1)\
    .set_properties(**{'max-width': '80px', 'font-size': '10pt'})\
    .set_caption("Hover to magnify")\
    .format(precision=2)\
    .set_table_styles(magnify())

enter image description here

Answered by FatiHe

If your data is in a Pandas DataFrame, you can use Seaborn's heatmap function to create your desired plot.

import seaborn as sns

Var_Corr = df.corr()
# plot the heatmap and annotation on it
sns.heatmap(Var_Corr, xticklabels=Var_Corr.columns, yticklabels=Var_Corr.columns, annot=True)

Correlation plot

From the question, it looks like the data is in a NumPy array. If that array is named numpy_data, you would first put it into a Pandas DataFrame before using the step above:

import pandas as pd
df = pd.DataFrame(numpy_data)
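
One caveat worth sketching: df.corr() is the right step when numpy_data holds raw observations (one row per observation, one column per variable). The array in the question is already a correlation matrix, so in that case the DataFrame could go straight to sns.heatmap without calling .corr(). The random data and the column names below are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: 200 observations of 4 variables
rng = np.random.default_rng(0)
numpy_data = rng.normal(size=(200, 4))
numpy_data[:, 1] += 0.8 * numpy_data[:, 0]  # make two columns correlated

# Naming the columns gives readable tick labels on the heatmap
df = pd.DataFrame(numpy_data, columns=["a", "b", "c", "d"])

# Pearson correlation between columns; should match np.corrcoef on the raw array
Var_Corr = df.corr()
same = np.allclose(Var_Corr.to_numpy(), np.corrcoef(numpy_data, rowvar=False))
print(Var_Corr.round(2))
```

Var_Corr can then be passed to sns.heatmap exactly as in the snippet above.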

Answered by vestland

The code below will produce this plot:

enter image description here

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# A list with your data slightly edited
l = [1.0,0.00279981,0.95173379,0.02486161,-0.00324926,-0.00432099,
0.00279981,1.0,0.17728303,0.64425774,0.30735071,0.37379443,
0.95173379,0.17728303,1.0,0.27072266,0.02549031,0.03324756,
0.02486161,0.64425774,0.27072266,1.0,0.18336236,0.18913512,
-0.00324926,0.30735071,0.02549031,0.18336236,1.0,0.77678274,
-0.00432099,0.37379443,0.03324756,0.18913512,0.77678274,1.00]

# Split list
n = 6
data = [l[i:i + n] for i in range(0, len(l), n)]

# A dataframe
df = pd.DataFrame(data)

def CorrMtx(df, dropDuplicates = True):

    # Your dataset is already a correlation matrix.
    # If you have a dataset where you need to include the calculation
    # of a correlation matrix, just uncomment the line below:
    # df = df.corr()

    # Exclude duplicate correlations by masking upper right values
    # (np.bool was removed from NumPy; the built-in bool works instead)
    mask = None
    if dropDuplicates:
        mask = np.zeros_like(df, dtype=bool)
        mask[np.triu_indices_from(mask)] = True

    # Set background color / chart style
    sns.set_style(style = 'white')

    # Set up matplotlib figure
    f, ax = plt.subplots(figsize=(11, 9))

    # Add diverging colormap from blue (low values) to red (high values)
    cmap = sns.diverging_palette(250, 10, as_cmap=True)

    # Draw correlation plot with or without duplicates
    # (mask=None draws the full matrix)
    sns.heatmap(df, mask=mask, cmap=cmap,
            square=True,
            linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)


CorrMtx(df, dropDuplicates = False)

I put this together after it was announced that the outstanding seaborn corrplot was to be deprecated. The snippet above makes a similar correlation plot based on seaborn heatmap. You can also specify the color range and select whether or not to drop duplicate correlations. Notice that I've used the same numbers as you, but that I've put them in a pandas dataframe. Regarding the choice of colors, you can have a look at the documentation for sns.diverging_palette. You asked for blue, but with your sample data no value falls in the blue part of the color scale. For both observations of 0.95173379, try changing to -0.95173379 and you'll get this:
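
As a rough sketch of "specifying the color range": sns.heatmap accepts vmin and vmax, which pin the ends of the color scale regardless of the values present. The figure size is arbitrary, and the matrix is only the top-left 3x3 block of the question's data, used here to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Top-left 3x3 block of the correlation matrix from the question
corr = np.array([[1.0, 0.00279981, 0.95173379],
                 [0.00279981, 1.0, 0.17728303],
                 [0.95173379, 0.17728303, 1.0]])

f, ax = plt.subplots(figsize=(6, 5))
cmap = sns.diverging_palette(250, 10, as_cmap=True)  # blue (low) to red (high)

# vmin/vmax fix the color limits to the full correlation range,
# so -1 is always the bluest color and 1 the reddest
sns.heatmap(corr, cmap=cmap, vmin=-1, vmax=1,
            square=True, linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)

# The drawn mesh reports the limits that were applied
clim = ax.collections[0].get_clim()
```

With the limits fixed this way, the color bar spans -1 to 1 even when the data contains no negative values.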

enter image description here

Answered by Bernhard

You can use matplotlib for this. There's a similar question which shows how you can achieve what you want: Plotting a 2D heatmap with Matplotlib
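
A minimal matplotlib-only sketch, assuming the question's matrix is available as correlation_matrix: it keeps the asker's imshow call but swaps the sequential 'hot' colormap for the diverging 'bwr' one and fixes the limits with vmin and vmax. The choice of 'bwr' is just one of matplotlib's built-in diverging colormaps, not something the linked question prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# The correlation matrix from the question
correlation_matrix = np.array([
    [1.0, 0.00279981, 0.95173379, 0.02486161, -0.00324926, -0.00432099],
    [0.00279981, 1.0, 0.17728303, 0.64425774, 0.30735071, 0.37379443],
    [0.95173379, 0.17728303, 1.0, 0.27072266, 0.02549031, 0.03324756],
    [0.02486161, 0.64425774, 0.27072266, 1.0, 0.18336236, 0.18913512],
    [-0.00324926, 0.30735071, 0.02549031, 0.18336236, 1.0, 0.77678274],
    [-0.00432099, 0.37379443, 0.03324756, 0.18913512, 0.77678274, 1.0],
])

fig, ax = plt.subplots()
# 'bwr' runs blue -> white -> red; vmin/vmax anchor -1 to blue and 1 to red
im = ax.imshow(correlation_matrix, cmap="bwr", vmin=-1, vmax=1,
               interpolation="nearest")
fig.colorbar(im, ax=ax, label="correlation")
```

Values near 0 then appear close to white instead of being 'cut' at one end of the scale.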

Answered by ypnos

  1. Use the 'jet' colormap for a transition between blue and red.
  2. Use pcolor() with the vmin and vmax parameters.

It is detailed in this answer: https://stackoverflow.com/a/3376734/21974
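
The two steps can be sketched roughly as follows. The matrix is only the top-left 3x3 block of the question's data, and inverting the y axis is an optional extra (not part of the answer) so that row 0 appears at the top.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Top-left 3x3 block of the question's matrix, to keep the example short
corr = np.array([[1.0, 0.00279981, 0.95173379],
                 [0.00279981, 1.0, 0.17728303],
                 [0.95173379, 0.17728303, 1.0]])

fig, ax = plt.subplots()
# Step 1: 'jet' goes from blue (low) to red (high)
# Step 2: vmin/vmax pin the color scale to the full correlation range
mesh = ax.pcolor(corr, cmap="jet", vmin=-1, vmax=1)
ax.invert_yaxis()  # pcolor puts row 0 at the bottom by default
fig.colorbar(mesh, ax=ax)
```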
