pandas: How to apply custom column order (on Categorical) to pandas boxplot?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/15541440/

Time: 2020-09-13 20:43:38  Source: igfitidea

How to apply custom column order (on Categorical) to pandas boxplot?

Tags: python, pandas, boxplot, categorical-data

Asked by smci

EDIT: this question arose with pandas ~0.13 and was made obsolete by direct support added somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer)



I can get a boxplot of a salary column in a pandas DataFrame...

train.boxplot(column='Salary', by='Category', sym='')

...however I can't figure out how to define the index-order used on column 'Category' - I want to supply my own custom order, according to another criterion:

category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()
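
For readers on current pandas: `Series.order()` no longer exists, and `sort_values()` is its replacement. A minimal sketch of the same ordering, assuming a small hypothetical `train` DataFrame in place of the real one:

```python
import pandas as pd

# Hypothetical stand-in for the question's `train` DataFrame.
train = pd.DataFrame({
    "Category": ["Admin Jobs", "Travel Jobs", "Admin Jobs",
                 "Accounting & Finance Jobs", "Travel Jobs"],
    "Salary": [20000, 30000, 22000, 45000, 28000],
})

# sort_values() replaces the old Series.order(); the index holds the categories.
category_order_by_mean_salary = (
    train.groupby("Category")["Salary"].mean().sort_values().index.tolist()
)
print(category_order_by_mean_salary)
```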

How can I apply my custom column order to the boxplot columns? (other than the ugly kludge of prefixing the column names to force ordering)

'Category' is a string (really, should be a categorical, but this was back in 0.13, where categorical was a third-class citizen) column taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs']. So it can be easily factorized with pd.Categorical.from_array()

On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:

I suppose I could either hack up a custom version of pandas boxplot(), or reach into the internals of the object. And also file an enhancement request.

Accepted answer by Paul H

Hard to say how to do this without a working example. My first guess would be to just add an integer column with the orders that you want.

A simple, brute-force way would be to add each boxplot one at a time.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))
columns_my_order = ['C', 'A', 'D', 'B']
fig, ax = plt.subplots()
for position, column in enumerate(columns_my_order):
    ax.boxplot(df[column], positions=[position])

ax.set_xticks(range(position+1))
ax.set_xticklabels(columns_my_order)
ax.set_xlim(xmin=-0.5)
plt.show()

Answer by Zhenyu

Actually, I got stuck on the same question. I solved it by making a map and resetting the xticklabels, with code as follows:

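import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["d", "c", "d", "c", 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'c']})
df['val'] = np.random.rand(12)
# Map each label to a key that sorts in the desired order: d, c, a
df['B'] = df['A'].replace({'d': '0', 'c': '1', 'a': '2'})
ax = df.boxplot(column='val', by='B')
# Put the original labels back on the re-ordered boxes
ax.set_xticklabels(list('dca'))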

Answer by Cireo

Note that pandas can now create categorical columns. If you don't mind having all the columns present in your graph, or trimming them appropriately, you can do something like the below:

http://pandas.pydata.org/pandas-docs/stable/categorical.html

df['Category'] = df['Category'].astype('category', ordered=True)
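
The `astype('category', ordered=True)` signature above belongs to older pandas; in current releases the order is, as far as I know, passed through `pd.CategoricalDtype` instead. A minimal sketch, assuming hypothetical salary data and ordering the categories by mean salary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
df = pd.DataFrame({
    "Category": ["Admin Jobs", "Travel Jobs", "Admin Jobs",
                 "Accounting & Finance Jobs", "Travel Jobs",
                 "Accounting & Finance Jobs"],
    "Salary": [20000, 30000, 22000, 45000, 28000, 47000],
})

# Custom order: categories sorted by mean salary.
order = df.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# An ordered categorical dtype carries the custom order with the column.
df["Category"] = df["Category"].astype(
    pd.CategoricalDtype(categories=order, ordered=True)
)

# Grouping by a categorical follows the category order, not alphabetical order.
plot_order = [key for key, _ in df.groupby("Category", observed=True)]
print(plot_order)

# The boxplot groups by the same column, so the boxes should follow that order.
ax = df.boxplot(column="Salary", by="Category")
```

Alphabetical order would put 'Accounting & Finance Jobs' first, so the sorted-by-mean order here is visibly different from the default.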

Recent pandas also appears to allow positions to pass all the way through from frame to axes.

Answer by Cireo

EDIT: this is the right answer after direct support was added somewhere between version 0.15-0.18



Adding a separate answer, which perhaps could be another question - feedback appreciated.

I wanted to add a custom column order within a groupby, which posed many problems for me. In the end, I had to avoid trying to use boxplot from a groupby object, and instead go through each subplot myself to provide explicit positions.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame()
df['GroupBy'] = ['g1', 'g2', 'g3', 'g4'] * 6
df['PlotBy'] = [chr(ord('A') + i) for i in range(24)]
df['SortBy'] = list(reversed(range(24)))
df['Data'] = [i * 10 for i in range(24)]

# Note that this has no effect on the boxplot
df = df.sort_values(['GroupBy', 'SortBy'])
for group, info in df.groupby('GroupBy'):
    print('Group: %r\n%s\n' % (group, info))

# With the below, cannot use
#  - sort data beforehand (not preserved, can't access in groupby)
#  - categorical (not all present in every chart)
#  - positional (different lengths and sort orders per group)
# df.groupby('GroupBy').boxplot(layout=(1, 5), column=['Data'], by=['PlotBy'])

fig, axes = plt.subplots(1, df.GroupBy.nunique(), sharey=True)
for ax, (g, d) in zip(axes, df.groupby('GroupBy')):
    d.boxplot(column=['Data'], by=['PlotBy'], ax=ax, positions=d.index.values)
plt.show()

Within my final code, it was even slightly more involved to determine positions because I had multiple data points for each sortby value, and I ended up having to do the below:

to_plot = data.sort_values([sort_col]).groupby(group_col)
for ax, (group, group_data) in zip(axes, to_plot):
    # Use existing sorting
    ordering = enumerate(group_data[sort_col].unique())
    positions = [ind for val, ind in sorted((v, i) for (i, v) in ordering)]
    ax = group_data.boxplot(column=[col], by=[plot_by], ax=ax, positions=positions)
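
When each group needs its own box order, one alternative to computing `positions` for pandas is to draw every subplot with matplotlib directly. A minimal sketch, assuming hypothetical data and ordering each group's boxes by median (the column names mirror the example above; the ordering criterion is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two groups, three categories, two points per category.
df = pd.DataFrame({
    "GroupBy": ["g1"] * 6 + ["g2"] * 6,
    "PlotBy": ["A", "A", "B", "B", "C", "C"] * 2,
    "Data": [1, 2, 9, 10, 5, 6, 7, 8, 1, 2, 4, 5],
})

groups = list(df.groupby("GroupBy"))
fig, axes = plt.subplots(1, len(groups), sharey=True, squeeze=False)

orders = {}
for ax, (name, sub) in zip(axes[0], groups):
    # Order this group's categories by their median value.
    order = sub.groupby("PlotBy")["Data"].median().sort_values().index.tolist()
    orders[name] = order
    # One array per box, already in the desired order.
    values = [sub.loc[sub["PlotBy"] == key, "Data"].to_numpy() for key in order]
    slots = list(range(len(order)))
    ax.boxplot(values, positions=slots)
    ax.set_xticks(slots)
    ax.set_xticklabels(order)
    ax.set_title(name)

print(orders)
```

Because the arrays are passed in the desired order, no mapping between sorted keys and positions is needed; the trade-off is losing the pandas `by=` convenience.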

Answer by eric R

If you're not happy with the default column order in your boxplot, you can change it to a specific order by setting the column parameter in the boxplot function.

Check the two examples below:

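import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))

# Default column order
plt.figure()
df.boxplot()
plt.title("default column order")

# Column order set explicitly through the column parameter
plt.figure()
df.boxplot(column=['C','A', 'D', 'B'])
plt.title("Specified column order")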

Answer by Thomas G.

As Cireo pointed out:

Use the new positions= argument:

df.boxplot(column=['Data'], by=['PlotBy'], positions=df.index.values)

I know this was already mentioned above, but it was not clear or summarized enough for newbies like me.
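
Passing `df.index.values` only lines up when every category has exactly one row and the index already encodes the slot each box should occupy. A minimal sketch of the more general case, assuming hypothetical data and a hypothetical `desired_order` list, and relying on pandas drawing the `by` groups in sorted key order:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: three categories with ten points each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "PlotBy": np.repeat(["A", "B", "C"], 10),
    "Data": rng.random(30),
})

# Hypothetical custom order for the boxes, left to right.
desired_order = ["C", "A", "B"]

# The groups are produced in sorted key order (A, B, C), so give each
# sorted key the slot it should occupy in the desired order.
sorted_keys = sorted(df["PlotBy"].unique())
positions = [desired_order.index(key) for key in sorted_keys]
print(positions)

ax = df.boxplot(column="Data", by="PlotBy", positions=positions)
```

Here 'A' lands in slot 1, 'B' in slot 2 and 'C' in slot 0, so the boxes read C, A, B from left to right.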

Answer by VH2020

This can be resolved by applying a categorical order. You can decide on the ranking yourself. I'll give an example with days of week.

  • Provide categorical order to weekday

    # List the categories in the desired order
    weekday = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
    # Build an ordered categorical dtype from that list
    wDays = pd.api.types.CategoricalDtype(ordered=True, categories=weekday)
    # Apply it to the specific column in the DataFrame
    df['Weekday'] = df['Weekday'].astype(wDays)
    # Then generate your plot (colour is assumed to be defined elsewhere)
    plt.figure(figsize=[15, 10])
    sns.boxplot(data=df, x='Weekday', y='Y Axis Variable', color=colour)
    

Answer by Fernanda

It might sound kind of silly, but many of the plotting functions allow you to set the order directly. For example:

Library & dataset

import seaborn as sns
df = sns.load_dataset('iris')

Specific order

import matplotlib.pyplot as plt  # sns.plt is not available in newer seaborn releases
p1 = sns.boxplot(x='species', y='sepal_length', data=df, order=["virginica", "versicolor", "setosa"])
plt.show()