Styling of Pandas groupby boxplots

Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19453994/

Tags: python, matplotlib, pandas

Asked by Walton Jones

The normal matplotlib boxplot command in Python returns a dictionary with keys for the boxes, median, whiskers, fliers, and caps. This makes styling really easy.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe and subset it for a boxplot
df1 = pd.DataFrame(np.random.rand(10), columns=['Col1'])
df1['X'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])
boxes= [df1[df1['X'] == 'A'].Col1, df1[df1['X'] == 'B'].Col1]

# Call the standard matplotlib boxplot function,
# which returns a dictionary including the parts of the graph
mbp = plt.boxplot(boxes)
print(type(mbp))

# This dictionary output makes styling the boxplot easy
plt.setp(mbp['boxes'], color='blue')
plt.setp(mbp['medians'], color='red')
plt.setp(mbp['whiskers'], color='blue')
plt.setp(mbp['fliers'], color='blue')

The Pandas library has an "optimized" boxplot function for its grouped (hierarchically indexed) dataframes. However, instead of returning a dictionary of plot elements for each group, it returns a matplotlib.axes.AxesSubplot object. This makes styling very difficult.

# Pandas has a built-in boxplot function that returns
# a matplotlib.axes.AxesSubplot object
pbp = df1.boxplot(by='X')
print(type(pbp))

# Similar attempts at styling raise a TypeError, since the returned object is not a dictionary
plt.setp(pbp['boxes'], color='blue')
plt.setp(pbp['medians'], color='red')
plt.setp(pbp['whiskers'], color='blue')
plt.setp(pbp['fliers'], color='blue')

Is this AxesSubplot object produced by the pandas df.boxplot(by='X') function accessible?

Accepted answer by CT Zhu

I am afraid you have to hard-code it. Take the pandas example: http://pandas.pydata.org/pandas-docs/stable/visualization.html#box-plotting

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot(by='X')
cl=bp[0].get_children()
cl=[item for item in cl if isinstance(item, matplotlib.lines.Line2D)]

Now let's identify which items are the boxes, medians, etc.:

for i, item in enumerate(cl):
    if item.get_xdata().mean()>0:
        bp[0].text(item.get_xdata().mean(), item.get_ydata().mean(), str(i), va='center', ha='center')

And the plot looks like this:

[Image: the grouped boxplots, with each line labeled by its index in cl]

Each box consists of 8 items. For example, the item labeled 5 is the median (the labels are the zero-based indices into cl). The last two items are probably the fliers, of which we have none here.

Knowing this, modifying part of a box is easy. For example, to set the median to a linewidth of 2:

n_classes = 2  # number of groups in column 'X' (2 in this case)
for i in range(n_classes):
    cl[5+i*8].set_linewidth(2.)  # index 5 within each block of 8 lines is the median
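
The offsets used above (8 lines per box, median at index 5) follow from the order in which matplotlib adds the lines to the axes, and that order may differ between matplotlib versions. As a rough sketch, assuming a reasonably recent pandas and matplotlib, the loop below prints each line's index and coordinates so the layout can be checked before hard-coding any offsets. The Agg backend and the names ax and lines are choices made for this illustration only; they are not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

axes = df.boxplot(by="X")   # one Axes per plotted column
ax = np.ravel(axes)[0]      # first subplot (Col1)

# Keep only the Line2D children, as in the answer above
lines = [c for c in ax.get_children() if isinstance(c, mlines.Line2D)]

# Print index and coordinates of every line to see which one is which
for i, line in enumerate(lines):
    print(i, np.round(line.get_xdata(), 2), np.round(line.get_ydata(), 2))
```

A horizontal line whose y value lies inside the box is the median; if the printed count per box is not 8, the stride in the loop above needs adjusting accordingly.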

Answer by vishakad

You could also specify return_type as 'dict'. This returns the boxplot properties directly in a dictionary, indexed by each column that was plotted in the boxplot.

To use the example above (in IPython):

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot( by='X', return_type='dict' )

>>> bp.keys()
['Col1', 'Col2']

>>> bp['Col1'].keys()
['boxes', 'fliers', 'medians', 'means', 'whiskers', 'caps']

Now, changing linewidths is a matter of a list comprehension:

>>> [ [item.set_linewidth( 2 ) for item in bp[key]['medians']] for key in bp.keys() ]
[[None, None], [None, None]]
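
The same idea can be written as a plain loop with plt.setp, which mirrors the styling calls from the question. This is only a sketch: it assumes the object returned for return_type='dict' supports .items() (a dict does, and so does a pandas Series, which newer pandas versions may return), and the colors and the Agg backend are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

# One dictionary of plot elements per plotted column, keyed by column name
bp = df.boxplot(by="X", return_type="dict")

for column, parts in bp.items():
    plt.setp(parts["boxes"], color="blue")
    plt.setp(parts["medians"], color="red", linewidth=2)
    plt.setp(parts["whiskers"], color="blue")
    plt.setp(parts["fliers"], markeredgecolor="blue")
```

Using a loop instead of a list comprehension avoids building a throwaway list of None values just for the side effect.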