Obtaining values used in boxplot, using python and matplotlib

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23461713/

Date: 2020-08-19 03:01:12. Source: igfitidea

Tags: python, numpy, matplotlib, scipy

Asked by Yuxiang Wang

I can draw a boxplot from data:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(100)
plt.boxplot(data)

Then the box ranges from the 25th percentile to the 75th percentile, and the whiskers extend to the smallest and largest data values that lie within (25th-percentile - 1.5*IQR, 75th-percentile + 1.5*IQR), where IQR denotes the interquartile range. (Of course, the value 1.5 is customizable.)

Now I want to know the values used in the boxplot, i.e. the median, the upper and lower quartiles, and the upper and lower whisker end points. While the first three are easy to obtain with np.median() and np.percentile(), the whisker end points require some more verbose code:

median = np.median(data)
upper_quartile = np.percentile(data, 75)
lower_quartile = np.percentile(data, 25)

iqr = upper_quartile - lower_quartile
upper_whisker = data[data<=upper_quartile+1.5*iqr].max()
lower_whisker = data[data>=lower_quartile-1.5*iqr].min()
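Since the 1.5 multiplier is customizable, the manual computation above can be wrapped into a small helper that takes it as a parameter. This is only a minimal sketch: the function name `box_values` and its `whis` argument are hypothetical names chosen for illustration, and the fixed random seed is an assumption made for reproducibility.

```python
import numpy as np

def box_values(data, whis=1.5):
    """Return the five values a standard boxplot is built from (illustrative helper)."""
    data = np.asarray(data, dtype=float)
    lower_quartile, median, upper_quartile = np.percentile(data, [25, 50, 75])
    iqr = upper_quartile - lower_quartile
    # The whiskers end at the most extreme data points that still lie inside the fences.
    upper_whisker = data[data <= upper_quartile + whis * iqr].max()
    lower_whisker = data[data >= lower_quartile - whis * iqr].min()
    return {
        "lower_whisker": lower_whisker,
        "lower_quartile": lower_quartile,
        "median": median,
        "upper_quartile": upper_quartile,
        "upper_whisker": upper_whisker,
    }

rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
sample = rng.random(100)
values = box_values(sample)
for name, value in values.items():
    print(name, value)
```

A smaller `whis` pulls the whisker ends toward the box, so more points fall outside the whiskers and would be drawn as outliers.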

While this is acceptable, I was wondering whether there is a neater way to do it. It seems the values should be available to pull out of the boxplot, since it has already been drawn.

Thank you!

Accepted answer by CT Zhu

Why do you want to do so? What you are doing is already pretty direct.

Yeah, if you want to fetch them from the plot once it has already been drawn, simply use the get_ydata() method.

B = plt.boxplot(data)
[item.get_ydata() for item in B['whiskers']]

It returns an array of shape (2,) for each whisker; the second element is the value we want:

[item.get_ydata()[1] for item in B['whiskers']]
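As a quick sanity check, the values read back with get_ydata() can be compared against the manual computation from the question. The sketch below assumes a single dataset and the default vertical orientation; the `Agg` backend and the fixed seed are assumptions made so that it runs without a display and gives repeatable data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable data
data = rng.random(100)

fig, ax = plt.subplots()
B = ax.boxplot(data)

# For one dataset, B['whiskers'] holds two lines: the lower whisker, then the upper one.
lower_whisker, upper_whisker = [item.get_ydata()[1] for item in B['whiskers']]

# Manual computation from the question, for comparison.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
manual_upper = data[data <= q3 + 1.5 * iqr].max()
manual_lower = data[data >= q1 - 1.5 * iqr].min()

print(np.isclose(lower_whisker, manual_lower))
print(np.isclose(upper_whisker, manual_upper))
plt.close(fig)
```

For a horizontal boxplot the values sit on the x axis instead, so get_xdata() would be the method to call.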

Answer by t_warsop

I ran into this recently and wrote a function that extracts the boxplot values from the boxplot as a pandas DataFrame.

The function is:

def get_box_plot_data(labels, bp):
    rows_list = []

    for i in range(len(labels)):
        dict1 = {}
        dict1['label'] = labels[i]
        dict1['lower_whisker'] = bp['whiskers'][i*2].get_ydata()[1]
        dict1['lower_quartile'] = bp['boxes'][i].get_ydata()[1]
        dict1['median'] = bp['medians'][i].get_ydata()[1]
        dict1['upper_quartile'] = bp['boxes'][i].get_ydata()[2]
        dict1['upper_whisker'] = bp['whiskers'][(i*2)+1].get_ydata()[1]
        rows_list.append(dict1)

    return pd.DataFrame(rows_list)

It is called by passing a list of labels (the ones that you would pass to the boxplot plotting function) and the dictionary returned by the boxplot function itself.

For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def get_box_plot_data(labels, bp):
    rows_list = []

    for i in range(len(labels)):
        dict1 = {}
        dict1['label'] = labels[i]
        dict1['lower_whisker'] = bp['whiskers'][i*2].get_ydata()[1]
        dict1['lower_quartile'] = bp['boxes'][i].get_ydata()[1]
        dict1['median'] = bp['medians'][i].get_ydata()[1]
        dict1['upper_quartile'] = bp['boxes'][i].get_ydata()[2]
        dict1['upper_whisker'] = bp['whiskers'][(i*2)+1].get_ydata()[1]
        rows_list.append(dict1)

    return pd.DataFrame(rows_list)

data1 = np.random.normal(loc = 0, scale = 1, size = 1000)
data2 = np.random.normal(loc = 5, scale = 1, size = 1000)
data3 = np.random.normal(loc = 10, scale = 1, size = 1000)

labels = ['data1', 'data2', 'data3']
bp = plt.boxplot([data1, data2, data3], labels=labels)
print(get_box_plot_data(labels, bp))
plt.show()

This prints the following from get_box_plot_data (the data are random, so the exact numbers differ between runs):

   label  lower_whisker  lower_quartile    median  upper_quartile  upper_whisker
0  data1      -2.491652       -0.587869  0.047543        0.696750       2.559301
1  data2       2.351567        4.310068  4.984103        5.665910       7.489808
2  data3       7.227794        9.278931  9.947674       10.661581      12.733275

It also produces a plot with one box for each of data1, data2, and data3.
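If only the numbers are needed and no figure, one alternative is to call matplotlib.cbook.boxplot_stats directly, which returns one dictionary of statistics per dataset. This is a minimal sketch rather than a replacement for the function above; the fixed seed and the choice of printed keys are assumptions for illustration.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable data
data = rng.normal(loc=0, scale=1, size=1000)

# boxplot_stats returns a list with one dict per dataset; here there is only one.
stats = cbook.boxplot_stats(data, whis=1.5)[0]

for key in ("whislo", "q1", "med", "q3", "whishi"):
    print(key, stats[key])
```

The keys `whislo` and `whishi` are the whisker end points, and `q1`, `med`, and `q3` are the quartiles and the median.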