pandas: How to overlay a Seaborn jointplot with a "marginal" (distribution histogram) from a different dataset

Question

Asked by Nonchalant

I have plotted a Seaborn JointPlot from a set of "observed counts vs concentration" values, which are stored in a pandas DataFrame. I would like to overlay (on the same set of axes) a marginal (i.e. a univariate distribution) of the "expected counts" for each concentration on top of the existing marginal, so that the two can be easily compared.

This graph is very similar to what I want, although it will have different axes and only two datasets:

Here is an example of how my data is laid out and related:

df_observed

x axis--> log2(concentration): 1,1,1,2,3,3,3 (zero-counts have been omitted)

y axis--> log2(count): 4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4 (zero-counts have been omitted)

df_expected

x axis--> log2(concentration): 1,1,1,2,2,2,3,3,3

Overlaying the distribution of df_expected on top of that of df_observed would therefore show where counts are missing at each concentration.

What I currently have

[Figure: jointplot with the observed counts at each concentration]

[Figure: separate jointplot of the expected counts at each concentration. I want the marginal from this plot to be overlaid on top of the marginal from the above jointplot.]

PS: I am new to Stack Overflow so any suggestions about how to better ask questions will be met with gratitude. Also, I have searched extensively for an answer to my question but to no avail. In addition, a Plotly solution would be equally helpful. Thank you

Answer 1

Answered by ntg

I wrote a function to plot it, very loosely based on @blue_chip's idea. You might still need to tweak it a bit for your specific needs.

Here is an example usage:

Example data:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

n=1000
m1=-3
m2=3

df1 = pd.DataFrame((np.random.randn(n)+m1).reshape(-1,2), columns=['x','y'])
df2 = pd.DataFrame((np.random.randn(n)+m2).reshape(-1,2), columns=['x','y'])
df3 = pd.DataFrame(df1.values+df2.values, columns=['x','y'])
df1['kind'] = 'dist1'
df2['kind'] = 'dist2'
df3['kind'] = 'dist1+dist2'
df=pd.concat([df1,df2,df3])

Function definition:

def multivariateGrid(col_x, col_y, col_k, df, k_is_color=False, scatter_alpha=.5):
    def colored_scatter(x, y, c=None):
        def scatter(*args, **kwargs):
            args = (x, y)
            if c is not None:
                kwargs['c'] = c
            kwargs['alpha'] = scatter_alpha
            plt.scatter(*args, **kwargs)

        return scatter

    g = sns.JointGrid(
        x=col_x,
        y=col_y,
        data=df
    )
    color = None
    legends=[]
    for name, df_group in df.groupby(col_k):
        legends.append(name)
        if k_is_color:
            color=name
        g.plot_joint(
            colored_scatter(df_group[col_x],df_group[col_y],color),
        )
        sns.distplot(
            df_group[col_x].values,
            ax=g.ax_marg_x,
            color=color,
        )
        sns.distplot(
            df_group[col_y].values,
            ax=g.ax_marg_y,
            color=color,            
            vertical=True
        )
    # Also draw the overall distribution (all groups combined) in grey:
    sns.distplot(
        df[col_x].values,
        ax=g.ax_marg_x,
        color='grey'
    )
    sns.distplot(
        df[col_y].values.ravel(),
        ax=g.ax_marg_y,
        color='grey',
        vertical=True
    )
    plt.legend(legends)

Usage:

multivariateGrid('x', 'y', 'kind', df=df)

Answer 2

Answered by blue_chip

Whenever I try to modify a JointPlot beyond what it was intended for, I turn to a JointGrid instead. It allows you to change the parameters of the plots in the marginals.

Below is an example of a working JointGrid where I add another histogram for each marginal. These histograms represent the expected value that you wanted to add. Keep in mind that I generated random data so it probably doesn't look like yours.

Take a look at the code, where I altered the range of the second histogram on each marginal to match the range of the observed data.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(100,4), columns = ['x', 'y', 'z', 'w'])

plt.ion()
plt.show()
plt.pause(0.001)

p = sns.JointGrid(
    x = df['x'],
    y = df['y']
    )

p = p.plot_joint(
    plt.scatter
    )

p.ax_marg_x.hist(
    df['x'],
    alpha = 0.5
    )

p.ax_marg_y.hist(
    df['y'],
    orientation = 'horizontal',
    alpha = 0.5
    )

p.ax_marg_x.hist(
    df['z'],
    alpha = 0.5,
    range = (np.min(df['x']), np.max(df['x']))
    )

p.ax_marg_y.hist(
    df['w'],
    orientation = 'horizontal',
    alpha = 0.5,
    range = (np.min(df['y']), np.max(df['y'])),
    )

The part where I call plt.ion, plt.show and plt.pause is what I use to display the figure. Otherwise, no figure appears on my computer. You might not need this part.
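
As a variation on the range matching in the code above (not part of the original answer), the bin edges can be computed once from the observed column and passed to both hist calls, which makes the shared binning explicit. This is only a sketch with random data; the choice of 10 bins is arbitrary, and np.histogram_bin_edges assumes NumPy 1.15 or later.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=["x", "y", "z", "w"])

# Bin edges are derived from the observed columns only.
x_edges = np.histogram_bin_edges(df["x"], bins=10)
y_edges = np.histogram_bin_edges(df["y"], bins=10)

p = sns.JointGrid(x=df["x"], y=df["y"])
p.plot_joint(plt.scatter)

# Both histograms on each marginal reuse the same edges, so the bars line up.
# Values of z or w that fall outside the edges are simply not counted.
p.ax_marg_x.hist(df["x"], bins=x_edges, alpha=0.5, label="observed")
p.ax_marg_x.hist(df["z"], bins=x_edges, alpha=0.5, label="expected")
p.ax_marg_y.hist(df["y"], bins=y_edges, orientation="horizontal", alpha=0.5)
p.ax_marg_y.hist(df["w"], bins=y_edges, orientation="horizontal", alpha=0.5)
p.ax_marg_x.legend()
```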

Welcome to Stack Overflow!

Answer 3

Answered by mwaskom

You can plot directly onto the JointGrid.ax_marg_x and JointGrid.ax_marg_y attributes, which are the underlying matplotlib axes.
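
A minimal sketch of that idea, using the values from the question. The column names and the bin edges are assumptions chosen for illustration, and the y marginal shows only observed data because the question gives no expected counts for it.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Values copied from the question; the column names are illustrative.
df_observed = pd.DataFrame({
    "log2_concentration": [1, 1, 1, 2, 3, 3, 3],
    "log2_count": [4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4],
})
df_expected = pd.DataFrame({"log2_concentration": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

g = sns.JointGrid(data=df_observed, x="log2_concentration", y="log2_count")
g.plot_joint(plt.scatter)

# One bin per concentration level, shared by both datasets.
bins = [0.5, 1.5, 2.5, 3.5]
g.ax_marg_x.hist(df_observed["log2_concentration"], bins=bins,
                 alpha=0.5, label="observed")
g.ax_marg_x.hist(df_expected["log2_concentration"], bins=bins,
                 alpha=0.5, label="expected")
g.ax_marg_x.legend()

# Only observed data is available for the y marginal in this example.
g.ax_marg_y.hist(df_observed["log2_count"], orientation="horizontal", alpha=0.5)
```

With this data the top marginal shows one observed value against three expected values at log2(concentration) = 2, which is where the counts are missing.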
