Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/44975337/

Time: 2020-09-14 03:56:40  Source: igfitidea

Side-by-side boxplots with Pandas

python pandas boxplot

Asked by Arnold Klein

I need to plot a comparison of five variables stored in a pandas DataFrame. I used an example from here, and it worked, but now I need to change the axes and titles, and I'm struggling to do so.

Here is my data:

df1.groupby('cls').head()
Out[171]: 
   sensitivity  specificity  accuracy       ppv       auc       cls
0     0.772091     0.824487  0.802966  0.799290  0.863700       sig
1     0.748931     0.817238  0.776366  0.785910  0.859041       sig
2     0.774016     0.805909  0.801975  0.789840  0.853132       sig
3     0.826670     0.730071  0.795715  0.784150  0.850024       sig
4     0.781112     0.803839  0.824709  0.791530  0.863411       sig
0     0.619048     0.748290  0.694969  0.686138  0.713899  baseline
1     0.642348     0.702076  0.646216  0.674683  0.712632  baseline
2     0.567344     0.765410  0.710650  0.665614  0.682502  baseline
3     0.644046     0.733645  0.754621  0.683485  0.734299  baseline
4     0.710077     0.653871  0.707933  0.684313  0.732997  baseline

Here is my code:

fig, axes = plt.subplots(ncols=5, figsize=(12, 5), sharey=True)
df1.query("cls in ['sig', 'baseline']").boxplot(by='cls', return_type='axes', ax=axes)

And the resulting pictures are:

[Image: pictures of results]

How to:

  • change the title ('Boxplot grouped by cls')
  • get rid of the annoying [cls] plotted along the horizontal axis
  • reorder the plotted categories so they appear as in df1 (first sensitivity, followed by speci...)

Accepted answer by Ian Thompson

I suggest using seaborn

Here is an example that might help you:

Imports

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

Make data

data = {'sensitivity' : np.random.normal(loc = 0, size = 10),
        'specificity' : np.random.normal(loc = 0, size = 10),
        'accuracy' : np.random.normal(loc = 0, size = 10),
        'ppv' : np.random.normal(loc = 0, size = 10),
        'auc' : np.random.normal(loc = 0, size = 10),
        'cls' : ['sig', 'sig', 'sig', 'sig', 'sig', 'baseline', 'baseline', 'baseline', 'baseline', 'baseline']}

df = pd.DataFrame(data)
df

Seaborn has a nifty tool called factorplot (renamed catplot in seaborn 0.9) that creates a grid of subplots where the rows/cols are built from your data. To be able to do this, we need to "melt" the df into a more usable shape.

df_melt = df.melt(id_vars = 'cls',
                  value_vars = ['accuracy',
                                'auc',
                                'ppv',
                                'sensitivity',
                                'specificity'],
                  var_name = 'columns')
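
To make the reshaping step concrete, here is a minimal sketch of what melt does to a wide frame. The two-row frame and its values are made up for illustration and are not the asker's data; the name long_df is likewise just a placeholder.

```python
import pandas as pd

# Tiny illustrative frame (made-up values, not the asker's data)
wide = pd.DataFrame({
    'sensitivity': [0.77, 0.62],
    'specificity': [0.82, 0.75],
    'cls': ['sig', 'baseline'],
})

# After melting there is one row per (cls, metric) pair:
# 'columns' holds the metric name and 'value' holds the number
long_df = wide.melt(id_vars='cls',
                    value_vars=['sensitivity', 'specificity'],
                    var_name='columns')

print(long_df)
```

Each original row turns into one row per metric, which is the layout seaborn expects when a single column is used to split the data into subplots.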

Now we can create the factorplot using the column "columns".

a = sns.factorplot(data = df_melt,
                   x = 'cls',
                   y = 'value',
                   kind = 'box', # type of plot
                   col = 'columns',
                   col_order = ['sensitivity', # custom order of boxplots
                                'specificity',
                                'accuracy',
                                'ppv',
                                'auc']).set_titles('{col_name}') # remove 'column = ' part of title

plt.show()

[Image: factorplot]
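
On recent seaborn releases the same grid can be drawn with catplot, the name factorplot was given in seaborn 0.9. The following is a sketch under that assumption, using random data shaped like the example above; the seed, the metrics list, and the figure title are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data with the same columns as the example above
rng = np.random.default_rng(0)
metrics = ['sensitivity', 'specificity', 'accuracy', 'ppv', 'auc']
df = pd.DataFrame({m: rng.normal(size=10) for m in metrics})
df['cls'] = ['sig'] * 5 + ['baseline'] * 5

df_melt = df.melt(id_vars='cls', value_vars=metrics, var_name='columns')

# One boxplot panel per metric, in the order given by col_order
g = sns.catplot(data=df_melt, x='cls', y='value', kind='box',
                col='columns', col_order=metrics)
g.set_titles('{col_name}')        # panel title is just the metric name
g.set_axis_labels('', 'value')    # drop the 'cls' label under each panel
g.fig.suptitle('Boxplot grouped by cls', y=1.02)  # figure-level title
```

Call plt.show() (after importing matplotlib.pyplot as plt) to display the figure. Setting the title on the figure rather than on one panel keeps it centered above the whole grid.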

You can also just use Seaborn's boxplot.

b = sns.boxplot(data = df_melt,
                hue = 'cls', # different colors for different 'cls'
                x = 'columns',
                y = 'value',
                order = ['sensitivity', # custom order of boxplots
                         'specificity',
                         'accuracy',
                         'ppv',
                         'auc'])

plt.title('Boxplot grouped by cls') # You can change the title here
plt.show()

[Image: boxplot]

This will give you the same plot but all in one figure instead of subplots. It also allows you to change the title of the figure with one line. Unfortunately I can't find a way to remove the 'columns' subtitle but hopefully this will get you what you need.
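
One possible way to drop the 'columns' label is to work with the matplotlib Axes that sns.boxplot returns and blank its x-axis label. This is a sketch on random stand-in data, not something the original answer confirms; the seed and the metrics list are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data with the same columns as the example above
rng = np.random.default_rng(0)
metrics = ['sensitivity', 'specificity', 'accuracy', 'ppv', 'auc']
df = pd.DataFrame({m: rng.normal(size=10) for m in metrics})
df['cls'] = ['sig'] * 5 + ['baseline'] * 5

df_melt = df.melt(id_vars='cls', value_vars=metrics, var_name='columns')

# sns.boxplot returns the matplotlib Axes it drew on
ax = sns.boxplot(data=df_melt, x='columns', y='value',
                 hue='cls', order=metrics)
ax.set_xlabel('')                        # hides the 'columns' label under the axis
ax.set_title('Boxplot grouped by cls')   # title set on the same Axes
```

Because the label is an ordinary Axes property, the same call also works for the y-axis label in the horizontal variant (ax.set_ylabel('')).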

EDIT

To view the plots sideways:

Factorplot: swap your x and y values, change col = 'columns' to row = 'columns', change col_order = [...] to row_order = [...], and change '{col_name}' to '{row_name}', like so:

a1 = sns.factorplot(data = df_melt,
                    x = 'value',
                    y = 'cls',
                    kind = 'box', # type of plot
                    row = 'columns',
                    row_order = ['sensitivity', # custom order of boxplots
                                 'specificity',
                                 'accuracy',
                                 'ppv',
                                 'auc']).set_titles('{row_name}') # remove 'column = ' part of title

plt.show()

[Image: h factorplot]

Boxplot: swap your x and y values, then add the parameter orient = 'h', like so:

b1 = sns.boxplot(data = df_melt,
                 hue = 'cls',
                 x = 'value',
                 y = 'columns',
                 order = ['sensitivity', # custom order of boxplots
                          'specificity',
                          'accuracy',
                          'ppv',
                          'auc'],
                 orient = 'h')

plt.title('Boxplot grouped by cls')
plt.show()

[Image: h boxplot]

Answer by pceccon

Maybe this helps you:

from matplotlib import pyplot

fig, axes = pyplot.subplots(ncols=4, figsize=(12, 5), sharey=True)
df.query("E in [1, 2]").boxplot(by='E', return_type='axes', ax=axes, column=list('bcda')) # Keeping original columns order
pyplot.suptitle('Boxplot') # Changing title
for ax in axes:
    ax.set_xlabel('') # Removing the x-axis label from every plot
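
The same approach can be adapted to the columns from the question. This is a sketch only: df1 here is filled with random stand-in values rather than the asker's data, and the title text 'Sig vs. baseline' is a made-up example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the asker's df1 (same column names, made-up values)
rng = np.random.default_rng(0)
metrics = ['sensitivity', 'specificity', 'accuracy', 'ppv', 'auc']
df1 = pd.DataFrame({m: rng.uniform(0.6, 0.9, size=10) for m in metrics})
df1['cls'] = ['sig'] * 5 + ['baseline'] * 5

fig, axes = plt.subplots(ncols=5, figsize=(12, 5), sharey=True)

# column= fixes the panel order; by= splits each panel into sig / baseline
df1.boxplot(column=metrics, by='cls', ax=axes)

fig.suptitle('Sig vs. baseline')   # replaces 'Boxplot grouped by cls'
for ax in axes:
    ax.set_xlabel('')              # removes the x-axis label pandas adds
```

This covers the three requests from the question using pandas alone: panel order comes from the column list, the title from suptitle, and the group label is cleared per Axes.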