Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22943894/

Date: 2020-09-13 21:54:32  Source: igfitidea

class labels in Pandas scattermatrix

Tags: python, matplotlib, pandas, scatter-plot

Asked by bgschiller

This question has been asked before ("Multiple data in scatter matrix"), but it didn't receive an answer.

I'd like to make a scatter matrix, something like in the pandas docs, but with differently colored markers for different classes. For example, I'd like some points to appear in green and others in blue depending on the value of one of the columns (or a separate list).

Here's an example using the Iris dataset. The color of the points represents the species of Iris -- Setosa, Versicolor, or Virginica.

[Image: iris scatter matrix with class labels]

Does pandas (or matplotlib) have a way to make a chart like that?

Answered by bgschiller

Update: This functionality is now in the latest version of Seaborn. Here's an example.

The following was my stopgap measure:

def factor_scatter_matrix(df, factor, palette=None):
    '''Create a scatter matrix of the variables in df, with differently colored
    points depending on the value of df[factor].
    inputs:
        df: pandas.DataFrame containing the columns to be plotted, as well 
            as factor.
        factor: string or pandas.Series. The column indicating which group 
            each row belongs to.
        palette: A list of hex codes, at least as long as the number of groups.
            If omitted, a predefined palette will be used, but it only includes
            9 groups.
    '''
    import matplotlib.colors
    import numpy as np
    # scatter_matrix moved from pandas.tools.plotting to pandas.plotting in pandas 0.20
    from pandas.plotting import scatter_matrix
    from scipy.stats import gaussian_kde

    if isinstance(factor, str):  # was basestring in Python 2
        factor_name = factor #save off the name
        factor = df[factor] #extract column
        df = df.drop(factor_name,axis=1) # remove from df, so it 
        # doesn't get a row and col in the plot.

    classes = list(set(factor))

    if palette is None:
        palette = ['#e41a1c', '#377eb8', '#4eae4b', 
                   '#994fa1', '#ff8101', '#fdfc33', 
                   '#a8572c', '#f482be', '#999999']

    # Check the palette size first; zip() would otherwise silently drop
    # any groups beyond the number of available colors.
    if len(classes) > len(palette):
        raise ValueError('''Too many groups for the number of colors provided.
We only have {} colors in the palette, but you have {}
groups.'''.format(len(palette), len(classes)))

    color_map = dict(zip(classes,palette))

    colors = factor.apply(lambda group: color_map[group])
    axarr = scatter_matrix(df,figsize=(10,10),marker='o',c=colors,diagonal=None)


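    # Draw one density curve per group on each diagonal panel.
    for rc in range(len(df.columns)):  # was xrange in Python 2
        for group in classes:
            # .icol() no longer exists in pandas; .iloc[:, rc] selects column rc by position
            y = df[factor == group].iloc[:, rc].values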
            gkde = gaussian_kde(y)
            ind = np.linspace(y.min(), y.max(), 1000)
            axarr[rc][rc].plot(ind, gkde.evaluate(ind),c=color_map[group])

    return axarr, color_map

As an example, we'll use the same Iris dataset as in the question (saved locally as iris.csv):


>>> import pandas as pd
>>> iris = pd.read_csv('iris.csv')
>>> axarr, color_map = factor_scatter_matrix(iris,'Name')
>>> color_map
{'Iris-setosa': '#377eb8',
 'Iris-versicolor': '#4eae4b',
 'Iris-virginica': '#e41a1c'}

[Image: iris_scatter_matrix]

Hope this is helpful!

Answered by jrjc

You can also call scatter_matrix from pandas directly, as follows:

# On pandas < 0.20 this was pd.scatter_matrix; it now lives in pd.plotting
pd.plotting.scatter_matrix(df,color=colors)

with colors being a list of length len(df) that holds one color per row.
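
One way to build that list is to map a class column to colors. This is a rough sketch, not necessarily what jrjc had in mind: the toy rows, the column names and the `palette` dict are my own assumptions, and the hex codes are borrowed from the palette in the other answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# Small stand-in for the Iris data; column names are assumptions.
df = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "SepalWidth":  [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "PetalLength": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "Name": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# Hypothetical class -> color mapping.
palette = {
    "Iris-setosa": "#e41a1c",
    "Iris-versicolor": "#377eb8",
    "Iris-virginica": "#4eae4b",
}

# One color per row, in the same order as the rows of df.
colors = df["Name"].map(palette).tolist()

# Drop the label column so only the numeric columns are plotted.
axarr = scatter_matrix(df.drop(columns="Name"), color=colors, diagonal="hist")
```

The extra `color` keyword is forwarded to matplotlib's scatter call, so the off-diagonal points pick up the per-row colors. The diagonal histograms are not split by class here.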