Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21912634/

Time: 2020-09-13 21:42:56  Source: igfitidea

How can I sort a boxplot in pandas by the median values?

python pandas boxplot

Asked by Fred S

I want to draw a boxplot of column Z in dataframe df by the categories X and Y. How can I sort the boxplot by the median, in descending order?

import pandas as pd
import random
n = 100
# this is probably a strange way to generate random data; please feel free to correct it
df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)], 
                   "Y": [random.choice(["a","b","c"]) for i in range(n)],
                   "Z": [random.gauss(0,1) for i in range(n)]})
df.boxplot(column="Z", by=["X", "Y"])

Note that this question is very similar, but it uses a different data structure. I'm relatively new to pandas (and have only done some tutorials on Python in general), so I couldn't figure out how to make my data work with the answer posted there. This may well be more of a reshaping than a plotting question. Maybe there is a solution using groupby?

Answered by Alvaro Fuentes

You can use the answer in "How to sort a boxplot by the median values in pandas", but first you need to group your data and create a new data frame:

import pandas as pd
import random
import matplotlib.pyplot as plt

n = 100
# this is probably a strange way to generate random data; please feel free to correct it
df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)], 
                   "Y": [random.choice(["a","b","c"]) for i in range(n)],
                   "Z": [random.gauss(0,1) for i in range(n)]})
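# Split the rows into one group per (X, Y) combination
grouped = df.groupby(["X", "Y"])

# Build a wide frame with one column of Z values per group
df2 = pd.DataFrame({col:vals['Z'] for col,vals in grouped})

# Order the columns by their median, largest first
meds = df2.median()
meds.sort_values(ascending=False, inplace=True)
df2 = df2[meds.index]
df2.boxplot()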

plt.show()
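
As a quick check on the ordering, the group medians can also be computed directly with groupby, without building the wide data frame first. The sketch below is only an illustration: the fixed seed and the names `meds` and `order` are assumptions added here, not part of the answer.

```python
import random

import pandas as pd

random.seed(0)  # assumption: fixed seed, only so the sketch is reproducible
n = 100
df = pd.DataFrame({"X": [random.choice(["A", "B", "C"]) for i in range(n)],
                   "Y": [random.choice(["a", "b", "c"]) for i in range(n)],
                   "Z": [random.gauss(0, 1) for i in range(n)]})

# Median of Z for each (X, Y) combination, largest first
meds = df.groupby(["X", "Y"])["Z"].median().sort_values(ascending=False)
print(meds)

# The index holds the (X, Y) pairs in the order the boxes should be drawn
order = list(meds.index)
```

The tuples in `order` correspond to the column labels of the wide frame built in the answer, so they can be compared against the boxes from left to right.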

Answered by J Wang

A similar answer to Alvaro Fuentes', in function form for more portability:

import pandas as pd

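def boxplot_sorted(df, by, column):
  # One column of `column` values per group defined by `by`
  df2 = pd.DataFrame({col:vals[column] for col, vals in df.groupby(by)})
  # sort_values() is ascending by default; pass ascending=False for descending medians
  meds = df2.median().sort_values()
  df2[meds.index].boxplot(rot=90)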

boxplot_sorted(df, by=["X", "Y"], column="Z")
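
Since the question asks for descending order, one possible variant of the function above exposes the sort direction as a parameter and returns what it drew. This is a sketch under stated assumptions: the name `boxplot_sorted_by_median`, the `ascending` and `rot` parameters, and the returned tuple are hypothetical additions, not part of the original answer.

```python
import random

import pandas as pd


def boxplot_sorted_by_median(df, by, column, ascending=False, rot=90):
    """Hypothetical variant: draw one box per group, ordered by group median."""
    # One column per group; rows that are not in a group become NaN in that column
    wide = pd.DataFrame({key: vals[column] for key, vals in df.groupby(by)})
    # median() skips NaN by default, so each value is the median of one group
    meds = wide.median().sort_values(ascending=ascending)
    # Reorder the columns to match the sorted medians, then plot
    ax = wide.reindex(columns=meds.index).boxplot(rot=rot)
    return ax, list(meds.index)


random.seed(0)  # assumption: fixed seed, only so the sketch is reproducible
n = 100
df = pd.DataFrame({"X": [random.choice(["A", "B", "C"]) for i in range(n)],
                   "Y": [random.choice(["a", "b", "c"]) for i in range(n)],
                   "Z": [random.gauss(0, 1) for i in range(n)]})

ax, order = boxplot_sorted_by_median(df, by=["X", "Y"], column="Z")
print(order)
```

In a script, call matplotlib's `plt.show()` afterwards to display the figure, as in the first answer.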

Answered by rocksNwaves

To answer the question in the title, without addressing the extra detail of plotting all combinations of two categorical variables:

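import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

n = 100
df = pd.DataFrame({"X": [np.random.choice(["A","B","C","D"]) for i in range(n)],
                   "Z": [np.random.normal(0, 10) for i in range(n)]})

# Median of Z per category of X, sorted by that median
# (ascending by default; add ascending=False for descending order)
grouped = df.loc[:,['X', 'Z']] \
    .groupby(['X']) \
    .median() \
    .sort_values(by='Z')

# Pass the sorted category labels as the box order
sns.boxplot(x=df.X, y=df.Z, order=grouped.index)
plt.show()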

I've added this solution because it is hard to reduce the accepted answer to a single variable, and I'm sure people are looking for a way to do that. I myself came to this question multiple times looking for such an answer.
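
For readers who would rather stay within pandas for the single-variable case, the same idea can be sketched without seaborn. This is an illustrative sketch, not part of the answer: the seeded generator, the use of `pivot` to reshape, and the names `order` and `wide` are assumptions made here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: seeded generator for reproducibility
n = 100
df = pd.DataFrame({"X": rng.choice(["A", "B", "C", "D"], size=n),
                   "Z": rng.normal(0, 10, size=n)})

# Category labels ordered by the median of Z, largest first
order = df.groupby("X")["Z"].median().sort_values(ascending=False).index

# Reshape to one column per category (NaN where a row belongs to another category),
# then put the columns in median order before plotting
wide = df.pivot(columns="X", values="Z")[list(order)]
ax = wide.boxplot()
```

Dropping `ascending=False` gives the boxes in ascending order of the median instead.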