How to overlay a Seaborn jointplot with a "marginal" (distribution histogram) from a different dataset
Original question: http://stackoverflow.com/questions/35920885/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Nonchalant
I have plotted a Seaborn JointPlot from a set of "observed counts vs concentration" which are stored in a pandas DataFrame. I would like to overlay (on the same set of axes) a marginal (i.e. a univariate distribution) of the "expected counts" for each concentration on top of the existing marginal, so that the difference can be easily compared.
This graph is very similar to what I want, although it will have different axes and only two datasets:
Here is an example of how my data is laid out and related:
df_observed
x axis--> log2(concentration): 1,1,1,2,3,3,3 (zero-counts have been omitted)
y axis--> log2(count): 4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4 (zero-counts have been omitted)
df_expected
x axis--> log2(concentration): 1,1,1,2,2,2,3,3,3
Overlaying the distribution of df_expected on top of that of df_observed would therefore indicate where counts are missing at each concentration.
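The same gap can be checked numerically before plotting. This is a minimal sketch, assuming the example values above live in a column hypothetically named log2_conc (the column name is not from the original post):

```python
import pandas as pd

# Example values from the question; the column name "log2_conc" is an assumption.
df_observed = pd.DataFrame({"log2_conc": [1, 1, 1, 2, 3, 3, 3]})
df_expected = pd.DataFrame({"log2_conc": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

# Count rows per concentration in each dataset.
observed_n = df_observed["log2_conc"].value_counts().sort_index()
expected_n = df_expected["log2_conc"].value_counts().sort_index()

# Concentrations absent from the observed data are treated as zero.
missing = expected_n.sub(observed_n, fill_value=0).astype(int)
print(missing.to_dict())  # {1: 0, 2: 2, 3: 0}
```

Here two counts are missing at log2(concentration) = 2, which is what the overlaid marginals should make visible.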
What I currently have
[Figure: jointplot with the observed counts at each concentration]
[Figure: separate jointplot of the expected counts at each concentration. I want the marginal from this plot to be overlaid on top of the marginal from the above jointplot]
PS: I am new to Stack Overflow so any suggestions about how to better ask questions will be met with gratitude. Also, I have searched extensively for an answer to my question but to no avail. In addition, a Plotly solution would be equally helpful. Thank you
Answered by ntg
I wrote a function to plot it, very loosely based on @blue_chip's idea. You might still need to tweak it a bit for your specific needs.
Here is an example usage:
Example data:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

n = 1000
m1 = -3
m2 = 3

# Two Gaussian clouds centred at m1 and m2, plus their element-wise sum
df1 = pd.DataFrame((np.random.randn(n) + m1).reshape(-1, 2), columns=['x', 'y'])
df2 = pd.DataFrame((np.random.randn(n) + m2).reshape(-1, 2), columns=['x', 'y'])
df3 = pd.DataFrame(df1.values + df2.values, columns=['x', 'y'])
df1['kind'] = 'dist1'
df2['kind'] = 'dist2'
df3['kind'] = 'dist1+dist2'
df = pd.concat([df1, df2, df3])
Function definition:
def multivariateGrid(col_x, col_y, col_k, df, k_is_color=False, scatter_alpha=.5):
    # Build a scatter function bound to one group's data and colour
    def colored_scatter(x, y, c=None):
        def scatter(*args, **kwargs):
            # Ignore the full-grid data passed by plot_joint; use the group's data
            args = (x, y)
            if c is not None:
                kwargs['c'] = c
            kwargs['alpha'] = scatter_alpha
            plt.scatter(*args, **kwargs)

        return scatter

    g = sns.JointGrid(
        x=col_x,
        y=col_y,
        data=df
    )
    color = None
    legends = []
    # One scatter layer and one pair of marginals per group in col_k
    for name, df_group in df.groupby(col_k):
        legends.append(name)
        if k_is_color:
            color = name
        g.plot_joint(
            colored_scatter(df_group[col_x], df_group[col_y], color),
        )
        # Note: sns.distplot is deprecated since seaborn 0.11;
        # histplot/kdeplot are its replacements.
        sns.distplot(
            df_group[col_x].values,
            ax=g.ax_marg_x,
            color=color,
        )
        sns.distplot(
            df_group[col_y].values,
            ax=g.ax_marg_y,
            color=color,
            vertical=True
        )
    # Do also global Hist:
    sns.distplot(
        df[col_x].values,
        ax=g.ax_marg_x,
        color='grey'
    )
    sns.distplot(
        df[col_y].values.ravel(),
        ax=g.ax_marg_y,
        color='grey',
        vertical=True
    )
    plt.legend(legends)
Usage:
multivariateGrid('x', 'y', 'kind', df=df)
Answered by blue_chip
Whenever I need to customize a JointPlot beyond what it was intended for, I turn to a JointGrid instead. It allows you to change the parameters of the plots in the marginals.
Below is an example of a working JointGrid where I add another histogram for each marginal. These histograms represent the expected value that you wanted to add. Keep in mind that I generated random data so it probably doesn't look like yours.
Take a look at the code: for the second histogram on each marginal, I set the range to match the range of the observed data.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(100, 4), columns=['x', 'y', 'z', 'w'])

plt.ion()
plt.show()
plt.pause(0.001)

p = sns.JointGrid(
    x=df['x'],
    y=df['y']
)

p = p.plot_joint(
    plt.scatter
)

p.ax_marg_x.hist(
    df['x'],
    alpha=0.5
)

p.ax_marg_y.hist(
    df['y'],
    orientation='horizontal',
    alpha=0.5
)

p.ax_marg_x.hist(
    df['z'],
    alpha=0.5,
    range=(np.min(df['x']), np.max(df['x']))
)

p.ax_marg_y.hist(
    df['w'],
    orientation='horizontal',
    alpha=0.5,
    range=(np.min(df['y']), np.max(df['y'])),
)
The part where I call plt.ion, plt.show and plt.pause is what I use to display the figure. Otherwise, no figure appears on my computer. You might not need this part.
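Matching only the range keeps both histograms on the same span. Sharing explicit bin edges additionally makes the bars line up. Here is a sketch using numpy.histogram_bin_edges, with hypothetical arrays observed and expected standing in for df['x'] and df['z']:

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(size=100)  # stands in for df['x'] (assumed data)
expected = rng.normal(size=100)  # stands in for df['z'] (assumed data)

# Ten equal-width bins spanning the observed data only
edges = np.histogram_bin_edges(observed, bins=10)

obs_counts, _ = np.histogram(observed, bins=edges)
# Expected values falling outside the observed range are dropped
exp_counts, _ = np.histogram(expected, bins=edges)

# The same edges can then be passed to both matplotlib calls, e.g.
#   p.ax_marg_x.hist(df['x'], bins=edges, alpha=0.5)
#   p.ax_marg_x.hist(df['z'], bins=edges, alpha=0.5)
```

Whether to clip the second dataset to the observed range, as here, or to compute the edges from both datasets together depends on which comparison matters for your data.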
Welcome to Stack Overflow!
Answered by mwaskom
You can plot directly onto the JointGrid.ax_marg_x and JointGrid.ax_marg_y attributes, which are the underlying matplotlib axes.
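Applied to the example values from the question, that could look roughly like the sketch below. The column names log2_conc and log2_count and the unit-wide bins are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import pandas as pd
import seaborn as sns

# Values from the question; the column names are assumptions.
df_observed = pd.DataFrame({
    "log2_conc": [1, 1, 1, 2, 3, 3, 3],
    "log2_count": [4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4],
})
df_expected = pd.DataFrame({"log2_conc": [1, 1, 1, 2, 2, 2, 3, 3, 3]})

g = sns.JointGrid(data=df_observed, x="log2_conc", y="log2_count")
g.ax_joint.scatter(df_observed["log2_conc"], df_observed["log2_count"])

# One bin per integer concentration, shared by both datasets
bins = [0.5, 1.5, 2.5, 3.5]
g.ax_marg_x.hist(df_expected["log2_conc"], bins=bins,
                 alpha=0.4, color="grey", label="expected")
g.ax_marg_x.hist(df_observed["log2_conc"], bins=bins,
                 alpha=0.6, label="observed")
g.ax_marg_x.legend()

# Only observed counts exist for the y marginal
g.ax_marg_y.hist(df_observed["log2_count"], orientation="horizontal", alpha=0.6)
```

Because both x histograms share the same bins, the shortfall at log2(concentration) = 2 shows up as the grey bar rising above the observed one.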