Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/25284859/
Grouping boxplots in seaborn when input is a DataFrame
Asked by Arman
I intend to plot multiple columns of a pandas DataFrame, all grouped by another column, using the groupby option of seaborn.boxplot. There is a nice answer for a similar problem in matplotlib (matplotlib: Group boxplots), but given that seaborn.boxplot comes with a groupby option, I thought this could be much easier to do in seaborn.
Here we go with a reproducible example that fails:
import seaborn as sns
import pandas as pd

df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1]
    ], columns=['a1', 'a2', 'a3', 'a4', 'b'])

# Plotting with seaborn
sns.boxplot(df[['a1', 'a2', 'a3', 'a4']], groupby=df.b)
What I get is a plot that completely ignores the groupby option:
Whereas if I do this with a single column, it works, thanks to another SO question (Seaborn groupby pandas Series):
sns.boxplot(df.a1, groupby=df.b)
So I would like to get all my columns in one plot (all columns come in a similar scale).
EDIT:
The above SO question was edited and now includes a 'not clean' answer to this problem, but it would be nice if someone has a better idea for this problem.
Accepted answer by MrT77
You can use boxplot directly (I imagine this was not possible when the question was asked, but it is with seaborn version > 0.6).
As explained by @mwaskom, you have to "melt" the sample dataframe into its "long-form" where each column is a variable and each row is an observation:
df_long = pd.melt(df, "b", var_name="a", value_name="c")
Then you just plot it:
sns.boxplot(x="a", hue="b", y="c", data=df_long)
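Putting the two steps together, here is a minimal self-contained sketch using the question's sample data. It assumes seaborn newer than 0.6 (as the answer states) plus pandas and matplotlib; the Agg backend line is my own addition so the script also runs without a display and is not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use an interactive backend
import pandas as pd
import seaborn as sns

# Sample data from the question: four value columns and a grouping column "b"
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Wide to long: one row per (b, original column name, value)
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

# One box per original column on the x axis, split by the value of "b"
ax = sns.boxplot(x="a", y="c", hue="b", data=df_long)
```

After melting, the original column names (a1 to a4) live in column "a" and their values in column "c", which is what lets x and hue refer to plain column names.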
Answer by mwaskom
As the other answers note, the boxplot function is limited to plotting a single "layer" of boxplots, and the groupby parameter only has an effect when the input is a Series and you have a second variable you want to use to bin the observations into each box.
However, you can accomplish what I think you're hoping for with the factorplot function, using kind="box". But you'll first have to "melt" the sample dataframe into what is called long-form or "tidy" format, where each column is a variable and each row is an observation:
df_long = pd.melt(df, "b", var_name="a", value_name="c")
Then it's very simple to plot:
sns.factorplot("a", hue="b", y="c", data=df_long, kind="box")
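For readers on a newer seaborn: factorplot was renamed to catplot in seaborn 0.9, so the call above may not be available under that name. The sketch below assumes seaborn 0.9 or later and swaps in catplot with the same arguments passed by keyword; the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use an interactive backend
import pandas as pd
import seaborn as sns

# Sample data from the question
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Melt to long-form: "a" holds the original column name, "c" the value
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

# catplot is the figure-level successor of factorplot (seaborn >= 0.9)
g = sns.catplot(x="a", y="c", hue="b", data=df_long, kind="box")
```

catplot returns a FacetGrid rather than a single Axes, so further tweaks go through g (for example g.set_axis_labels) instead of an ax object.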
Answer by chrisb
It isn't really any better than the answer you linked, but I think the way to achieve this in seaborn is to use the FacetGrid feature, as the groupby parameter is only defined for a Series passed to the boxplot function.
Here's some code - the pd.melt is necessary because (as best I can tell) the facet mapping can only take individual columns as parameters, so the data need to be turned into a 'long' format.
g = sns.FacetGrid(pd.melt(df, id_vars='b'), col='b')
g.map(sns.boxplot, 'value', 'variable')
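A self-contained variant of the same idea is sketched below. Two things differ from the snippet above and are my own choices, not the answer's: it uses map_dataframe with keyword arguments instead of map, and it passes an explicit order so every facet shows the columns in the same sequence. With no var_name or value_name, pd.melt names the new columns "variable" and "value".

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to use an interactive backend
import pandas as pd
import seaborn as sns

# Sample data from the question
df = pd.DataFrame(
    [
        [2, 4, 5, 6, 1],
        [4, 5, 6, 7, 2],
        [5, 4, 5, 5, 1],
        [10, 4, 7, 8, 2],
        [9, 3, 4, 6, 2],
        [3, 3, 4, 4, 1],
    ],
    columns=["a1", "a2", "a3", "a4", "b"],
)

# Default melt output columns: "b", "variable", "value"
long_df = pd.melt(df, id_vars="b")

# One facet (subplot column) per distinct value of "b"
g = sns.FacetGrid(long_df, col="b")

# Draw a boxplot in each facet; a fixed order keeps the facets comparable
g.map_dataframe(
    sns.boxplot,
    x="variable",
    y="value",
    order=["a1", "a2", "a3", "a4"],
)
```

Unlike the hue-based plots, this puts each group in its own panel, which reads more easily when there are many groups but makes side-by-side comparison of a single column harder.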
Answer by jrjc
Seaborn's groupby parameter takes a Series, not a DataFrame; that's why it's not working.
As a workaround, you can do this:
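import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 2, sharey=True)
for i, grp in enumerate(df.filter(regex="a").groupby(by=df.b)):
    # grp is a (group key, sub-DataFrame) tuple; plot the sub-DataFrame
    sns.boxplot(grp[1], ax=ax[i])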
It gives:
Note that df.filter(regex="a") is equivalent to df[['a1','a2', 'a3', 'a4']]:
   a1  a2  a3  a4
0   2   4   5   6
1   4   5   6   7
2   5   4   5   5
3  10   4   7   8
4   9   3   4   6
5   3   3   4   4
Hope this helps