Class labels in Pandas scatter matrix
Source: http://stackoverflow.com/questions/22943894/
Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by bgschiller
This question has been asked before ("Multiple data in scatter matrix"), but didn't receive an answer.
I'd like to make a scatter matrix, something like in the pandas docs, but with differently colored markers for different classes. For example, I'd like some points to appear in green and others in blue depending on the value of one of the columns (or a separate list).
Here's an example using the Iris dataset. The color of the points represents the species of Iris -- Setosa, Versicolor, or Virginica.


Does pandas (or matplotlib) have a way to make a chart like that?
Answered by bgschiller
Update: This functionality is now in the latest version of Seaborn. Here's an example.
The following was my stopgap measure:
    def factor_scatter_matrix(df, factor, palette=None):
        '''Create a scatter matrix of the variables in df, with differently colored
        points depending on the value of df[factor].
        inputs:
            df: pandas.DataFrame containing the columns to be plotted, as well
                as factor.
            factor: string or pandas.Series. The column indicating which group
                each row belongs to.
            palette: A list of hex codes, at least as long as the number of groups.
                If omitted, a predefined palette will be used, but it only includes
                9 groups.
        '''
        import numpy as np
        from pandas.plotting import scatter_matrix
        from scipy.stats import gaussian_kde

        if isinstance(factor, str):
            factor_name = factor  # save off the name
            factor = df[factor]  # extract column
            df = df.drop(factor_name, axis=1)  # remove from df, so it
            # doesn't get a row and col in the plot.

        classes = list(set(factor))

        if palette is None:
            palette = ['#e41a1c', '#377eb8', '#4eae4b',
                       '#994fa1', '#ff8101', '#fdfc33',
                       '#a8572c', '#f482be', '#999999']

        color_map = dict(zip(classes, palette))

        if len(classes) > len(palette):
            raise ValueError('''Too many groups for the number of colors provided.
    We only have {} colors in the palette, but you have {}
    groups.'''.format(len(palette), len(classes)))

        # One color per row, passed to the scatter plots as a plain list.
        colors = factor.apply(lambda group: color_map[group])
        axarr = scatter_matrix(df, figsize=(10, 10), marker='o',
                               c=colors.tolist(), diagonal=None)

        # Draw one density curve per group on each diagonal panel.
        for rc in range(len(df.columns)):
            for group in classes:
                y = df[factor == group].iloc[:, rc].values
                gkde = gaussian_kde(y)
                ind = np.linspace(y.min(), y.max(), 1000)
                axarr[rc][rc].plot(ind, gkde.evaluate(ind), c=color_map[group])

        return axarr, color_map
As an example, we'll use the same dataset as in the question, available here
    >>> import pandas as pd
    >>> iris = pd.read_csv('iris.csv')
    >>> axarr, color_map = factor_scatter_matrix(iris, 'Name')
    >>> color_map
    {'Iris-setosa': '#377eb8',
     'Iris-versicolor': '#4eae4b',
     'Iris-virginica': '#e41a1c'}


Hope this is helpful!
Answered by jrjc
You can also call scatter_matrix from pandas directly, as follows:
    pd.plotting.scatter_matrix(df, color=colors)
where colors is a list of length len(df) containing one color per row.
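One way to build such a `colors` list is to map each row's class label to a color. This is only a sketch: the DataFrame, its column names, and the class-to-color mapping are made up for illustration, and it calls `pandas.plotting.scatter_matrix`, which is where the function lives in current pandas releases.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: two numeric columns plus a class label per row.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=30),
    "y": rng.normal(size=30),
    "label": np.repeat(["a", "b", "c"], 10),
})

# Arbitrary class -> color mapping (the hex codes are just examples).
palette = {"a": "#e41a1c", "b": "#377eb8", "c": "#4eae4b"}
colors = df["label"].map(palette).tolist()  # one color per row

# Plot only the numeric columns; the label column is used for color alone.
axes = scatter_matrix(df[["x", "y"]], color=colors)
```

Keeping the label column out of the plotted frame avoids giving it its own row and column in the matrix, which is the same reason the longer function above drops the factor column.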

