Boxplot with pandas groupby multiindex, for specified sublevels from multiindex
Original question: http://stackoverflow.com/questions/18498690/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by pbreach
OK, so I have a DataFrame containing time series data whose columns have a multi-level index (MultiIndex). The data is in CSV format, and loading it is not the issue here.
What I want to do is create a boxplot with this data grouped according to the different categories in a specific level of the MultiIndex. For example, if I group by 'SPECIES' I would have the groups 'aq', 'gr', 'mix', 'sed', and a box for each group at a specific time in the time series.
I've tried this:
grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()
but it gives me a boxplot (a flat line) for each point in the group rather than one for the grouped set. Is there an easy way to do this? I have no problem with the grouping itself, as I can aggregate the groups any way I want, but I can't get them into a boxplot.
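For readers without the original CSV, the layout described in the question can be approximated with synthetic data. The sketch below is an assumption about the shape only: the SITE level, the dates, and the random values are hypothetical, and only the SPECIES level name and its four categories come from the question. It also shows one way to do the kind of per-group aggregation the question says already works.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: rows are dates, columns are a
# two-level MultiIndex. Only 'SPECIES' and its categories come from the question.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["site1", "site2", "site3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(
    rng.normal(size=(len(index), len(columns))), index=index, columns=columns
)

# Aggregate over a column level: transpose, group on the level, transpose back.
species_mean = data.T.groupby(level="SPECIES").mean().T
print(species_mean.loc[pd.Timestamp("2013-08-17")])
```

Each column of `species_mean` is the mean over the three hypothetical sites for one species, so the result has one value per species and date.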
Accepted answer by pbreach
I think I figured it out; maybe this will be helpful to someone:
grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES').T
grouped.boxplot()
Basically, the groupby output needed to be transposed so that the boxplot showed the right grouping.
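As far as I know, `groupby(axis=1)` is deprecated in recent pandas releases, so the same idea may be easier to express by reshaping. This is a minimal sketch, not the answerer's code, and it assumes synthetic data with hypothetical SITE and SPECIES column levels (only SPECIES comes from the question). It takes the row for one timestamp, turns its column MultiIndex into ordinary columns, and lets `boxplot` group by SPECIES.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use your default
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 5 dates, 3 sites x 4 species as MultiIndex columns.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["site1", "site2", "site3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# One timestamp gives a Series indexed by (SITE, SPECIES).
row = data.loc[pd.Timestamp("2013-08-17")]

# Move the index levels into columns: one observation per line.
tidy = row.rename("value").reset_index()

# One box per SPECIES, each built from the values of all sites.
fig, ax = plt.subplots()
tidy.boxplot(column="value", by="SPECIES", ax=ax)
fig.suptitle("")  # clear the automatic figure title added when `by` is used
# Call plt.show() or fig.savefig(...) to view the result.
```

Each box is drawn from three values here (one per hypothetical site), which is what avoids the flat-line boxes described in the question.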
Answer by rafaelvalle
This should work in version 0.16:
data['2013-08-17'].boxplot(by='SPECIES')
Answer by schlump
This code:
data['2013-08-17'].boxplot(by='SPECIES')
will not work, as boxplot is a method of DataFrame, not of Series.
In pandas > 0.18.1, however, the boxplot function has the argument column, which defines which column the data is taken from.
So
data.boxplot(column='2013-08-17',by='SPECIES')
should return the desired result.
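The Series-versus-DataFrame distinction can be checked on a small frame. This is a sketch with made-up values and hypothetical column names (`value`, `SPECIES`). It relies only on the fact that selecting a single column yields a Series, which has no `boxplot` method, while the DataFrame method accepts `column` and `by`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use your default
import pandas as pd

# Made-up data: a numeric column and a grouping column.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "SPECIES": ["aq", "aq", "gr", "gr", "mix", "mix"],
})

# Selecting one column returns a Series, which lacks a boxplot method.
column = df["value"]
print(type(column).__name__)
print(hasattr(column, "boxplot"))

# The DataFrame method takes the column to plot and the column to group by.
df.boxplot(column="value", by="SPECIES")
```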
An example with the Iris dataset:
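import pandas as pd
import matplotlib.pyplot as plt
# Load the Iris dataset (columns include SepalLength, SepalWidth and Name)
data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv')
fig, ax = plt.subplots(figsize=(10,8))
# One box of SepalLength per value of Name, drawn on the axes created above
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# Clear the automatic suptitle; this must run after boxplot, which sets it
plt.suptitle('')
plt.show()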
creates a boxplot of SepalLength with one box per Name.
plt.suptitle('') turns off the annoying automatic figure title. And of course the column argument accepts a list of columns, so
data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)
also works.