Python: Boxplot with Pandas groupby multiindex, for specified sublevels from multiindex

Question

Asked by pbreach

OK, so I have a dataframe containing time series data, with a multi-line index (MultiIndex) on the columns. Here is a sample of what the data looks like; it is in CSV format. Loading the data is not an issue here.

[image: sample of the data]

What I want to do is create a boxplot with this data grouped according to the different categories in a specific line of the multiindex. For example, if I were to group by 'SPECIES' I would have the groups 'aq', 'gr', 'mix', 'sed' and a box for each group at a specific time in the timeseries.

I've tried this:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES')
grouped.boxplot()

but it gives me a boxplot (flat line) for each point in the group rather than for the grouped set. Is there an easy way to do this? I don't have any problems grouping as I can aggregate the groups any which way I want, but I can't get them to boxplot.
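
The flat lines can be reproduced on made-up data. The sketch below is only an illustration: the frame is a hypothetical stand-in for the CSV (the SITE level, the site labels and the random values are assumptions; only the SPECIES level and its four categories come from the question). It shows that after selecting one day, each SPECIES group is still a wide block of one row by several columns, so a per-column boxplot has a single point to draw from.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: daily rows, two column levels.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],  # SITE is an assumed level name
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# Selecting one day keeps the wide layout: one row, one column per series.
day = data.loc["2013-08-17":"2013-08-17"]

# Shape of each SPECIES block for that day.
shapes = {
    species: day.xs(species, axis=1, level="SPECIES").shape
    for species in columns.unique(level="SPECIES")
}
print(shapes)
```

Each block has one row, so a boxplot drawn column by column collapses every box to a line.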

Answer 1

Accepted answer by pbreach

I think I figured it out, maybe this will be helpful to someone:

grouped = data['2013-08-17'].groupby(axis=1, level='SPECIES').T
grouped.boxplot()

Basically groupby output needed to be transposed so that the boxplot showed the right grouping:

[image: boxplot showing the grouping]
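
For readers on recent pandas releases, where groupby(axis=1) is deprecated, the same idea (one set of values per group instead of one per column) might be sketched as follows. This is not the code from the answer above, just one possible equivalent on made-up data; the SITE level and the values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: daily rows, two column levels.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],  # SITE is an assumed level name
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# One timestamp gives a Series indexed by the column MultiIndex.
row = data.loc[pd.Timestamp("2013-08-17")]

# Build one column per SPECIES; groups of unequal size are padded with NaN.
boxes = pd.DataFrame(
    {name: grp.reset_index(drop=True) for name, grp in row.groupby(level="SPECIES")}
)

# A plain boxplot now draws one box per SPECIES.
ax = boxes.boxplot()
```

Grouping the single-row Series by level avoids the axis argument entirely, which is why it was chosen here.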

Answer 2

Answer by rafaelvalle

This should work in version 0.16:

data['2013-08-17'].boxplot(by='SPECIES')

Answer 3

Answer by schlump

This code:

data['2013-08-17'].boxplot(by='SPECIES')

will not work, as boxplot is a method of DataFrame, not of Series.

In pandas > 0.18.1, however, the boxplot function has the argument column, which defines which column the data is taken from.

So

data.boxplot(column='2013-08-17',by='SPECIES')

should return the desired result.
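
Note that this call expects the date to be a column label and SPECIES to be an ordinary column. If the frame has dates as rows and a MultiIndex on the columns, as in the question, one way to get there is to transpose first. The following is a rough sketch on made-up data (the SITE level, the site labels and the values are assumptions, and the date index is converted to strings purely so that the column can be named as above):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: daily rows, two column levels.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["s1", "s2", "s3"], ["aq", "gr", "mix", "sed"]],
    names=["SITE", "SPECIES"],  # SITE is an assumed level name
)
index = pd.date_range("2013-08-15", periods=5, freq="D")
data = pd.DataFrame(rng.normal(size=(5, 12)), index=index, columns=columns)

# Use plain string labels for the dates so they can serve as column names.
data.index = data.index.strftime("%Y-%m-%d")

# Transpose: dates become columns, and the former column levels
# (SITE, SPECIES) become ordinary columns after reset_index().
wide = data.T.reset_index()

# Now the call from the answer applies directly.
ax = wide.boxplot(column="2013-08-17", by="SPECIES")
```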

An example with the Iris dataset:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv')
fig, ax = plt.subplots(figsize=(10,8))
data.boxplot(column=['SepalLength'], by='Name', ax=ax)
# Clear the automatic "Boxplot grouped by Name" title; this has to run
# after boxplot(), which sets that title itself.
plt.suptitle('')

creates a boxplot of SepalLength with one box per Name.

plt.suptitle('')

turns off the annoying automatic figure title. And of course the column argument accepts a list of columns, so

data.boxplot(column=['SepalLength', 'SepalWidth'], by='Name', ax=ax)

also works.
