pandas: Plotting error bars from a dataframe using Seaborn FacetGrid

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24878095/

Time: 2020-09-13 22:16:42  Source: igfitidea

Plotting error bars from a dataframe using Seaborn FacetGrid

Tags: python, matplotlib, plot, pandas, seaborn

Asked by elfnor

I want to plot error bars from a column in a pandas dataframe on a Seaborn FacetGrid

import matplotlib.pyplot as plt
import numpy as np  # needed for np.random.randn below
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar']*2,
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})
df

Example dataframe

    A       B        C           D
0   foo     one      0.445827   -0.311863
1   bar     one      0.862154   -0.229065
2   foo     two      0.290981   -0.835301
3   bar     three    0.995732    0.356807
4   foo     two      0.029311    0.631812
5   bar     two      0.023164   -0.468248
6   foo     one     -1.568248    2.508461
7   bar     three   -0.407807    0.319404

This code works for fixed size error bars:

# `height` replaces the older `size` parameter (renamed in seaborn 0.9)
g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D",yerr=0.5, fmt='o');


But I can't get it to work using values from the dataframe:

df['E'] = abs(df['D']*0.5)
g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr=df['E']);

or

g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr='E');

Both produce screeds of errors.

EDIT:

After lots of matplotlib doc reading, and assorted StackOverflow answers, here is a pure matplotlib solution:

# define a color palette index based on column 'B'
df['cind'] = pd.Categorical(df['B']).codes  # `.labels` in very old pandas

# the sorted categories in column 'A'
cats = df['A'].unique()
cats.sort()

# get the seaborn colour palette and convert to array
cp = sns.color_palette()
cpa = np.array(cp)

# draw a subplot for each category in column 'A'
fig, axs = plt.subplots(nrows=1, ncols=len(cats), sharey=True)
for i, ax in enumerate(axs):
    df_sub = df[df['A'] == cats[i]]
    col = cpa[df_sub['cind'].to_numpy()]
    ax.scatter(df_sub['C'], df_sub['D'], c=col)
    # fmt='none' draws only the error bars (older matplotlib used fmt=None)
    eb = ax.errorbar(df_sub['C'], df_sub['D'], yerr=df_sub['E'], fmt='none')
    # eb.lines is (data_line, caplines, barlinecols); recolour the bars
    _, _, (bars,) = eb.lines
    bars.set_color(col)

Other than the labels and axis limits, it's OK. It plots a separate subplot for each category in column 'A', colored by the category in column 'B'. (Note that the random data differs from that shown above.)

I'd still like a pandas/seaborn solution, if anyone has any ideas.


Accepted answer by mwaskom

When using FacetGrid.map, anything that refers to the data DataFrame must be passed as a positional argument. This will work in your case because yerr is the third positional argument for plt.errorbar, though to demonstrate I'm going to use the tips dataset:

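from scipy import stats
tips_all = sns.load_dataset("tips")
tips_grouped = tips_all.groupby(["smoker", "size"])
# numeric_only=True skips the text columns, which recent pandas refuses to average
tips = tips_grouped.mean(numeric_only=True)
# approximate 95% interval: 1.96 standard errors of the mean
tips["CI"] = tips_grouped.total_bill.apply(stats.sem) * 1.96
tips.reset_index(inplace=True)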

I can then plot using FacetGrid and errorbar:

g = sns.FacetGrid(tips, col="smoker", height=5)
g.map(plt.errorbar, "size", "total_bill", "CI", marker="o")


However, keep in mind that there are seaborn plotting functions for going from a full dataset to plots with error bars (using bootstrapping), so for a lot of applications this may not be necessary. For example, you could use factorplot (renamed catplot in seaborn 0.9):

# factorplot was renamed catplot in seaborn 0.9; x and y are passed by keyword
sns.catplot(x="size", y="total_bill", col="smoker",
            data=tips_all, kind="point")


Or lmplot:

# recent seaborn requires x and y as keyword arguments; np is numpy
sns.lmplot(x="size", y="total_bill", col="smoker",
           data=tips_all, fit_reg=False, x_estimator=np.mean)


Answer by mwaskom

You aren't showing what df['E'] actually is, or whether it is a list of the same length as df['C'] and df['D'].

The yerr keyword argument (kwarg) takes either a single value, which is applied to every element of the lists for keys C and D from the dataframe, or a list of values the same length as those lists.

So, C, D, and E must all be associated with lists of the same length, or C and D must be lists of the same length and E must be associated with a single float or int. If that single float or int is inside a list, you must extract it, like df['E'][0].
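To make the two accepted forms of yerr concrete, here is a small plain-matplotlib sketch. The arrays `x`, `y` and `err` are made-up illustration values, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for columns C, D and E.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.4, 0.9, 0.3, 1.0])
err = np.array([0.1, 0.2, 0.3, 0.4])  # one error value per point

fig, (ax_fixed, ax_var) = plt.subplots(1, 2, sharey=True)

# A single number: every point gets the same +/- 0.5 bar.
eb_fixed = ax_fixed.errorbar(x, y, yerr=0.5, fmt="o")

# A sequence the same length as x and y: each point gets its own bar.
eb_var = ax_var.errorbar(x, y, yerr=err, fmt="o")

# The third entry of .lines holds the bar collections; each segment
# runs from y - err to y + err for one point.
segments = eb_var.lines[2][0].get_segments()
bar_lengths = np.array([seg[1, 1] - seg[0, 1] for seg in segments])
```

Here `bar_lengths` should come out as twice `err`, one entry per point, which is a quick way to check that the per-point values were actually used.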

Example matplotlib code with yerr: http://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html

Bar plot API documentation describing yerr: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar