Notice: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/23232989/
Time: 2020-09-13 21:57:34  Source: igfitidea

Boxplot stratified by column in python pandas

Tags: python, matplotlib, pandas, boxplot

Asked by user308827

I would like to draw a boxplot for the following pandas dataframe:

> p1.head(10)

   N0_YLDF    MAT
0     1.29  13.67
1     2.32  10.67
2     6.24  11.29
3     5.34  21.29
4     6.35  41.67
5     5.35  91.67
6     9.32  21.52
7     6.32  31.52
8     3.33  13.52
9     4.56  44.52

I want the boxplots to be of the column 'N0_YLDF', but they should be stratified by 'MAT'. When I use the following command:

p1.boxplot(column='N0_YLDF',by='MAT')

It uses all the unique MAT values, which in the full p1 dataframe number around 15,000. This results in an incomprehensible boxplot.

Is there any way I can stratify the MAT values, so that I get a separate boxplot of N0_YLDF for the first quartile of MAT values, another for the second quartile, and so on?

thanks!

Accepted answer by CT Zhu

pandas.qcut will give you the quantiles, but a histogram-like operation requires some numpy trickery, which comes in handy here. In the code, df is the dataframe the question calls p1:

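import numpy as np
_, breaks = np.histogram(df.MAT, bins=5)
# Class = number of bin edges strictly below each MAT value
df['Class'] = (df.MAT.values > breaks[..., np.newaxis]).sum(0)
ax = df.boxplot(column='N0_YLDF', by='Class')
ax.xaxis.set_ticklabels(['%s'%val for i, val in enumerate(breaks) if i in df.Class.values])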

The dataframe now looks like this:

   N0_YLDF    MAT  Class
0     1.29  13.67      1
1     2.32  10.67      0
2     6.24  11.29      1
3     5.34  21.29      1
4     6.35  41.67      2
5     5.35  91.67      5
6     9.32  21.52      1
7     6.32  31.52      2
8     3.33  13.52      1
9     4.56  44.52      3

[10 rows x 3 columns]
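
To check the Class column shown above, here is a minimal self-contained sketch. It assumes the dataframe is just the ten sample rows from the question, rebuilt by hand, and applies the same broadcasting comparison against the np.histogram edges. The np.digitize variant is an alternative added here for comparison; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# The ten sample rows from the question (p1.head(10)), rebuilt by hand.
df = pd.DataFrame({
    "N0_YLDF": [1.29, 2.32, 6.24, 5.34, 6.35, 5.35, 9.32, 6.32, 3.33, 4.56],
    "MAT": [13.67, 10.67, 11.29, 21.29, 41.67, 91.67, 21.52, 31.52, 13.52, 44.52],
})

# Six edges of five equal-width bins spanning min(MAT)..max(MAT).
_, breaks = np.histogram(df["MAT"], bins=5)

# Compare every MAT value with every edge (shape: 6 edges x 10 rows) and
# count, per row, how many edges lie strictly below the value.
df["Class"] = (df["MAT"].to_numpy() > breaks[:, np.newaxis]).sum(axis=0)

# Assumed alternative: np.digitize with right=True also counts the edges
# that are strictly below each value, so it yields the same labels.
alt = np.digitize(df["MAT"].to_numpy(), breaks, right=True)

print(df)
```

Because the comparison is strict, only the minimum MAT value lands in class 0, and classes 1 to 5 correspond to the five histogram bins.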

It can also be used to get the quartile plot:

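import numpy as np
# Upper edges of the four quartiles of MAT
breaks = np.percentile(df.MAT, [25, 50, 75, 100])
# Class = number of edges strictly below each MAT value
df['Class'] = (df.MAT.values > breaks[..., np.newaxis]).sum(0)
ax = df.boxplot(column='N0_YLDF', by='Class')
# Label each box with the upper edge of its quartile
ax.xaxis.set_ticklabels(['%s'%val for val in breaks])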
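
As a rough illustration of what the quartile classes contain, the sketch below applies the same percentile comparison to the ten sample rows from the question (rebuilt by hand, which is an assumption; the full dataframe is not shown in the post). It then prints a per-class summary of N0_YLDF, which is roughly the information each box would display. The plotting call is left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# The ten sample rows from the question (p1.head(10)), rebuilt by hand.
df = pd.DataFrame({
    "N0_YLDF": [1.29, 2.32, 6.24, 5.34, 6.35, 5.35, 9.32, 6.32, 3.33, 4.56],
    "MAT": [13.67, 10.67, 11.29, 21.29, 41.67, 91.67, 21.52, 31.52, 13.52, 44.52],
})

# Upper edges of the four quartiles of MAT (the last edge is the maximum).
breaks = np.percentile(df["MAT"], [25, 50, 75, 100])

# Count, per row, how many edges lie strictly below the MAT value: 0..3.
df["Class"] = (df["MAT"].to_numpy() > breaks[:, np.newaxis]).sum(axis=0)

# Per-class summary of N0_YLDF, i.e. what each box would summarise.
summary = df.groupby("Class")["N0_YLDF"].describe()
print(summary[["count", "min", "50%", "max"]])
```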

Answer by Marius

Pandas has the cut and qcut functions to make stratifying variables like this easy:

import pandas as pd
# Just asking for split into 4 equal groups (i.e. quartiles) here,
# but you can split on custom quantiles by passing in an array
p1['MAT_quartiles'] = pd.qcut(p1['MAT'], 4, labels=['0-25%', '25-50%', '50-75%', '75-100%'])
p1.boxplot(column='N0_YLDF', by='MAT_quartiles')
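
The comment above mentions custom quantiles, and the answer also names cut. A small sketch of both follows, assuming p1 is just the ten sample rows from the question rebuilt by hand. The quantile list, the label strings and the new column names are illustrative choices, not taken from the original answer.

```python
import pandas as pd

# The ten sample rows from the question (p1.head(10)), rebuilt by hand.
p1 = pd.DataFrame({
    "N0_YLDF": [1.29, 2.32, 6.24, 5.34, 6.35, 5.35, 9.32, 6.32, 3.33, 4.56],
    "MAT": [13.67, 10.67, 11.29, 21.29, 41.67, 91.67, 21.52, 31.52, 13.52, 44.52],
})

# qcut with an explicit list of quantile edges (hypothetical choice):
# bottom 10%, 10-50% and top 50% of the MAT values.
p1["MAT_custom"] = pd.qcut(
    p1["MAT"],
    [0, 0.1, 0.5, 1.0],
    labels=["0-10%", "10-50%", "50-100%"],
)

# cut splits the value range into equal-width bins instead, so the groups
# can have very different sizes (some bins may even be empty).
p1["MAT_equal_width"] = pd.cut(p1["MAT"], bins=5)

print(p1)
```

Either new column could then be passed as the by argument of p1.boxplot, in the same way as MAT_quartiles above.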
