Grouping boxplots in seaborn when input is a DataFrame

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25284859/

Date: 2020-09-08 15:36:31  Source: igfitidea

Tags: matplotlib, pandas, seaborn

Asked by Arman

I intend to plot multiple columns of a pandas DataFrame, all grouped by another column, using the groupby option of seaborn.boxplot. There is a nice answer for a similar problem in matplotlib (matplotlib: Group boxplots), but given that seaborn.boxplot comes with a groupby option, I thought it could be much easier to do this in seaborn.

Here we go with a reproducible example that fails:

import seaborn as sns
import pandas as pd

df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1]
    ], columns=['a1', 'a2', 'a3', 'a4', 'b'])

# Plotting with seaborn
sns.boxplot(df[['a1', 'a2', 'a3', 'a4']], groupby=df.b)

What I get is a plot that completely ignores the groupby option:

(Image: failed groupby)

Whereas if I do this with a single column, it works, thanks to another SO question (Seaborn groupby pandas Series):

sns.boxplot(df.a1, groupby=df.b)

(Image: seaborn plot that does not fail)

So I would like to get all my columns in one plot (all columns are on a similar scale).

EDIT:

The SO question linked above has since been edited and now includes a 'not clean' answer to this problem, but it would be nice if someone had a better idea.

Accepted answer by MrT77

You can use boxplot directly (I imagine that was not possible when the question was asked, but it is with seaborn versions > 0.6).

As explained by @mwaskom, you have to "melt" the sample DataFrame into "long form", where each column is a variable and each row is an observation:

df_long = pd.melt(df, "b", var_name="a", value_name="c")
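
To make the reshaping step concrete, here is a minimal sketch that rebuilds the question's DataFrame and inspects the melted result. It only assumes pandas is installed; the names df_long, a and c follow the answer, and id_vars is written out as a keyword for readability.

```python
import pandas as pd

# The sample data from the question: four measurement columns plus a grouping column "b".
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Keep "b" as the identifier and stack a1..a4 into (a, c) pairs:
# "a" holds the original column name, "c" holds the value.
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

print(df_long.shape)   # 6 rows x 4 melted columns give 24 rows and 3 columns
print(df_long.head())
```

Each original row now contributes four rows, one per measurement column, which is the layout seaborn's x/y/hue interface expects.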

Then you just plot it:

sns.boxplot(x="a", hue="b", y="c", data=df_long)

(Image: plot obtained with boxplot)
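
Putting the two steps of this answer together, a self-contained sketch might look like the following. It assumes seaborn > 0.6 and matplotlib are installed; the Agg backend is an assumption made here so the script also runs without a display, and is not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed

import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Wide -> long: one row per (b, original column, value).
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

# One group of boxes per original column on the x axis,
# split within each group by the value of "b" via hue.
ax = sns.boxplot(x="a", y="c", hue="b", data=df_long)
```

In an interactive session you would typically drop the backend line and call matplotlib.pyplot.show() to display the figure.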

Answer by mwaskom

As the other answers note, the boxplot function is limited to plotting a single "layer" of boxplots, and the groupby parameter only has an effect when the input is a Series and you have a second variable you want to use to bin the observations into each box.

However, you can accomplish what I think you're hoping for with the factorplot function, using kind="box". First, though, you'll have to "melt" the sample DataFrame into what is called long-form or "tidy" format, where each column is a variable and each row is an observation:

df_long = pd.melt(df, "b", var_name="a", value_name="c")

Then it's very simple to plot:

sns.factorplot("a", hue="b", y="c", data=df_long, kind="box")
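
For readers on a recent seaborn: factorplot was renamed to catplot in seaborn 0.9, and newer releases expect x to be passed by keyword. The sketch below is an adaptation under that assumption, not the answer's original call; the Agg backend line is likewise an assumption so that it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed

import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

# catplot is the current name of factorplot (seaborn >= 0.9).
# It returns a FacetGrid rather than a bare Axes.
g = sns.catplot(x="a", y="c", hue="b", data=df_long, kind="box")
```

Because catplot returns a FacetGrid, adding col="..." or row="..." later splits the same plot into panels without further reshaping.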

Answer by chrisb

It isn't really any better than the answer you linked, but I think the way to achieve this in seaborn is to use the FacetGrid feature, as the groupby parameter is only defined for a Series passed to the boxplot function.

Here's some code. The pd.melt is necessary because (as best I can tell) the facet mapping can only take individual columns as parameters, so the data need to be turned into a 'long' format.

g = sns.FacetGrid(pd.melt(df, id_vars='b'), col='b')
g.map(sns.boxplot, 'value', 'variable')

(Image: faceted seaborn boxplot)

Answer by jrjc

Seaborn's groupby parameter takes a Series, not a DataFrame, which is why it's not working.

As a workaround, you can do this:

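import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, sharey=True)
# One subplot per value of b, each showing boxplots of the a* columns
for i, (name, grp) in enumerate(df.filter(regex="a").groupby(by=df.b)):
    sns.boxplot(data=grp, ax=ax[i])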

It gives:

(Image: sns)

Note that df.filter(regex="a") is equivalent to df[['a1','a2', 'a3', 'a4']]:

   a1  a2  a3  a4
0   2   4   5   6
1   4   5   6   7
2   5   4   5   5
3  10   4   7   8
4   9   3   4   6
5   3   3   4   4
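
The equivalence can be checked directly. This is a small sketch on the question's data; the stricter pattern at the end is an illustrative assumption, not something the answer uses.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# regex="a" keeps every column whose name contains the letter "a".
by_regex = df.filter(regex="a")
by_name = df[["a1", "a2", "a3", "a4"]]

print(by_regex.equals(by_name))  # True

# Illustrative assumption: anchor the pattern if other column names might contain "a".
strict = df.filter(regex=r"^a\d+$")
```

The bare "a" pattern works here only because the grouping column is named "b"; a column such as "area" would be selected too.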

Hope this helps
