Original URL: http://stackoverflow.com/questions/34556180/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How can I plot a correlation matrix as a set of ellipses, similar to the R openair package?
Asked by Han Zhengzu
The figure below was plotted using the openair R package:
I know matplotlib has the plt.matshow function, but it can't clearly show the relationships between all the variables at the same time.
Here is my early work:
df is a pandas DataFrame with 7 variables, as shown below:
I don't know how to attach a .csv file to Stack Overflow.
Using plt.matshow(df.corr(), cmap=plt.cm.Greens), the figure looks like this:
The second figure can't represent the correlations between the variables as clearly as the first one.
Edit:
I uploaded the csv file to Google Docs here.
Answered by ali_m
I'm not aware of any existing Python library that does these "ellipse plots", but it's not particularly hard to implement using a matplotlib.collections.EllipseCollection:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection

def plot_corr_ellipses(data, ax=None, **kwargs):

    M = np.array(data)
    if M.ndim != 2:
        raise ValueError('data must be a 2D array')
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect': 'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)

    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T

    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()

    # note: offset_transform was named transOffset before Matplotlib 3.6
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(), **kwargs)
    ax.add_collection(ec)

    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)

    return ec
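To see how the function maps a correlation value to an ellipse, the geometry can be checked in isolation. The sketch below uses a hypothetical helper, ellipse_params, which is not part of the original answer; it only repeats the three lines that compute the widths, heights, and angles, applied to made-up values.

```python
import numpy as np

def ellipse_params(corr):
    """Return (widths, heights, angles) as computed in plot_corr_ellipses.

    Hypothetical helper, for illustration only.
    """
    M = np.asarray(corr, dtype=float)
    w = np.ones_like(M).ravel()    # major axis is always 1 data unit long
    h = 1 - np.abs(M).ravel()      # minor axis shrinks as |r| grows
    a = 45 * np.sign(M).ravel()    # +45 degrees for r > 0, -45 degrees for r < 0
    return w, h, a

# Made-up correlation values, chosen only to show the extremes
w, h, a = ellipse_params([[1.0, -0.5], [0.0, 0.25]])
print(h)  # minor-axis lengths
print(a)  # rotation angles in degrees
```

In other words, r = 0 gives a circle (width and height both 1), and the ellipse narrows toward a line segment as |r| approaches 1.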
For example, using your data:
data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)
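The csv file from the question is not reproduced on this page, so the snippet above cannot be run as-is. A minimal sketch of a stand-in, assuming only that df holds 7 numeric columns: the column names are taken from the correlation table shown further down, while the row count, the random seed, and the shared-signal construction are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (NOT the original csv)
rng = np.random.default_rng(0)
names = ['GX', 'HG', 'RM', 'SJ', 'XB', 'XN', 'ZG']

common = rng.normal(size=(200, 1))           # shared signal -> positive correlations
noise = rng.normal(size=(200, len(names)))   # independent per-column noise
df = pd.DataFrame(common + noise, columns=names)

data = df.corr()   # 7x7 symmetric matrix with ones on the diagonal
print(data.round(2))
```

With df defined this way, data can be passed to plot_corr_ellipses exactly as in the example above.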
Negative correlations can be plotted as ellipses with the opposite orientation:
fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
Answered by Stefan
Assuming you are interested in showing cluster relations, the seaborn package mentioned in the comments also has a clustermap. Using your correlation matrix (it looks like you want to show the correlation coefficients as int in the [-100, 100] range), you could do the following:
corr = df.corr().mul(100).astype(int)
     GX   HG   RM   SJ   XB   XN   ZG
GX  100   77   62   71   48   66   57
HG   77  100   69   74   61   61   58
RM   62   69  100   75   48   64   68
SJ   71   74   75  100   50   70   65
XB   48   61   48   50  100   46   51
XN   66   61   64   70   46  100   75
ZG   57   58   68   65   51   75  100
and then use seaborn.clustermap() as follows:
import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
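One detail of the integer conversion above is worth keeping in mind: astype(int) truncates toward zero instead of rounding, so a coefficient of 0.779 is displayed as 77. If rounded values are preferred, a round() call can be added before the cast. The small sketch below uses a made-up 2x2 matrix (not the question's data) to show the difference.

```python
import pandas as pd

# Made-up correlation matrix, for illustration only
corr = pd.DataFrame({'a': [1.0, 0.779], 'b': [0.779, 1.0]}, index=['a', 'b'])

truncated = corr.mul(100).astype(int)          # drops the fractional part
rounded = corr.mul(100).round().astype(int)    # rounds to the nearest integer first

print(truncated)
print(rounded)
```

Either result works with annot=True and fmt='d' in the clustermap call, since both are integer DataFrames.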
Answered by Mengshan
I just discovered the Python package biokit today. It provides a very handy function for creating various kinds of correlation charts. For example:
In [1]: import pandas as pd
In [2]: import matplotlib.pyplot as plt
...: from biokit.viz import corrplot
In [6]: corr
Out[6]:
      GX    HG    RM    SJ    XB    XN    ZG
GX  1.00 -0.77  0.62  0.71  0.48  0.66  0.57
HG -0.77  1.00  0.69  0.74  0.61  0.61  0.58
RM  0.62  0.69  1.00  0.75  0.48  0.64  0.68
SJ  0.71  0.74  0.75  1.00  0.50  0.70  0.65
XB  0.48  0.61  0.48  0.50  1.00 -0.46  0.51
XN  0.66  0.61  0.64  0.70 -0.46  1.00  0.75
ZG  0.57  0.58  0.68  0.65  0.51  0.75  1.00
I took Stefan's data and modified it a little bit. Let's assume this is a correlation matrix. Now to create a correlation chart, you can simply do this:
In [7]: c = corrplot.Corrplot(corr)
...: c.plot()
Correlation chart with ellipses
You can read more examples here.