Side-by-side boxplot of multiple columns of a pandas DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/42760965/

Date: 2020-09-14 03:10:31  Source: igfitidea

Tags: python, pandas, plot, seaborn, boxplot

Asked by Fred S

One year of sample data:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

I want to draw box plots of these data side by side, grouped by month (i.e., two boxes per month, one for A and one for B).

For a single column, sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').

(A solution doesn't necessarily need to use seaborn, if it is easier using plain matplotlib.)

Edit

I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant/direct way to do it, please share!

df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
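
If the DataFrame holds more columns than should be plotted, one way to keep this workaround manageable is to select the wanted columns before stacking. A minimal sketch, assuming a hypothetical extra column C and a seeded default_rng in place of the original rnd.randn calls:

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
df = pd.DataFrame({"A": rng.standard_normal(n),
                   "B": rng.standard_normal(n) + 1,
                   "C": rng.standard_normal(n) + 10},  # hypothetical extra column
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

plot_cols = ["A", "B"]  # only these columns are stacked

# Select first, then stack: C never enters the long-format table
df_stacked = df[plot_cols].stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]

# Store the month as a regular column so it can be referenced by name
df_stacked["month"] = df_stacked["date"].dt.month
```

With month stored as a column, the plotting call can become sns.boxplot(x="month", y="vals", hue="vars", data=df_stacked).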

Produces: Side-by-side boxplot of A and B, grouped by month.

Answered by Timothy Sweetser

Here's a solution using pandas melt and seaborn:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
                          "B": rnd.randn(n)+1,
                          "C": rnd.randn(n) + 10, # will not be plotted
                         },
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
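
To see what melt hands to seaborn, it can help to inspect the long-format table directly. The sketch below rebuilds the same frame (with a seeded default_rng, which is an assumption made here in place of rnd.randn) and summarizes it per month and variable:

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
df = pd.DataFrame({"A": rng.standard_normal(n),
                   "B": rng.standard_normal(n) + 1,
                   "C": rng.standard_normal(n) + 10},  # will not be melted
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df["month"] = df.index.month

# One row per (day, variable); the columns are month, variable, value
df_plot = df.melt(id_vars="month", value_vars=["A", "B"])

# Median per month and variable, i.e. the center line of each box
summary = df_plot.groupby(["month", "variable"])["value"].median().unstack()

print(df_plot.head())
print(summary)
```

Because value_vars lists only A and B, column C is dropped during the melt and the original df stays untouched.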

Answered by benSooraj

I do not understand your question completely, but you might want to take a look at this approach using matplotlib. It is not the best solution, though.

1) Break df into 12 DataFrames by month, all collected in a list

DFList = []
# groupby yields (month, sub-DataFrame) pairs; keep only the sub-DataFrames
for month, month_df in df.groupby(df.index.month):
    DFList.append(month_df)
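
As a variation on step 1, the monthly pieces can also be kept in a dict keyed by month number, which makes it easier to look up a particular month later. A small sketch, assuming the two-column df from the question rebuilt with a seeded generator; the name frames_by_month is made up for illustration:

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
df = pd.DataFrame({"A": rng.standard_normal(n),
                   "B": rng.standard_normal(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

# groupby yields (key, sub-DataFrame) pairs, so they can be unpacked directly
frames_by_month = {month: month_df
                   for month, month_df in df.groupby(df.index.month)}

january = frames_by_month[1]  # all rows whose index falls in January
```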

2) Plot them one after the other in a loop:

import matplotlib.pyplot as plt

# Draw one figure per month, with one box-plot panel per column
for month_df in DFList:
    month_df.plot(kind='box', subplots=True, layout=(2,2), sharex=True, sharey=True, figsize=(7,7))

plt.show()

3) Here's a snapshot of the 1st three rows:

[image]

You might also want to check out matplotlib's add_subplot method.

Answered by Grégory Belhumeur

import matplotlib.pyplot as plt

month_dfs = []
for group in df.groupby(df.index.month):
    month_dfs.append(group[1])

plt.figure(figsize=(30,5))
for i,month_df in enumerate(month_dfs):
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax = axi)
    plt.title(i+1)
    plt.ylim([-4, 4])

plt.show()

This gives a row of 12 panels, one per month, each with one box per column.

Not exactly what you're looking for but you get to keep a readable DataFrame if you add more variables.
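
If the goal is a single axes with two boxes per month while keeping the wide DataFrame, plain matplotlib can do it by offsetting the box positions. This is a rough sketch rather than part of the answer above: the width, colors, and figure size are arbitrary choices, the data use a seeded generator, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
df = pd.DataFrame({"A": rng.standard_normal(n),
                   "B": rng.standard_normal(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

columns = ["A", "B"]    # columns to plot; any others in df are ignored
colors = ["C0", "C1"]   # arbitrary colors from the default cycle
width = 0.35            # box width, chosen arbitrarily
months = np.arange(1, 13)

fig, ax = plt.subplots(figsize=(12, 4))
all_boxes = []
for k, (col, color) in enumerate(zip(columns, colors)):
    # One array of values per month for this column
    data = [df.loc[df.index.month == m, col].to_numpy() for m in months]
    # Shift this column's boxes to the left or right of the month tick
    offset = (k - (len(columns) - 1) / 2) * width
    bp = ax.boxplot(data, positions=months + offset, widths=width * 0.9,
                    patch_artist=True, manage_ticks=False)
    for box in bp["boxes"]:
        box.set_facecolor(color)
    all_boxes.extend(bp["boxes"])
    # Empty line that only serves as a legend handle for this column
    ax.plot([], [], color=color, linewidth=6, label=col)

ax.set_xticks(months)
ax.set_xlabel("month")
ax.legend()
```

Adding a third column only requires extending columns and colors; the offsets are recomputed so the group stays centered on each month tick.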

You can also easily hide the y-axis on every panel except the first by adding

    if i > 0:
        y_axis = axi.axes.get_yaxis()
        y_axis.set_visible(False)

inside the loop, before plt.show().

Answered by foglerit

This is quite straightforward using Altair:

import altair as alt

alt.Chart(
    df.reset_index().melt(id_vars = ["index"], value_vars=["A", "B"]).assign(month = lambda x: x["index"].dt.month)
).mark_boxplot(
    extent='min-max'
).encode(
    alt.X('variable:N', title=''),
    alt.Y('value:Q'),
    column='month:N',
    color='variable:N'
)

[image]

The code above melts the DataFrame and adds a month column. Altair then creates box plots for each variable, broken down by month as the plot columns.
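
The intermediate table passed to alt.Chart can be checked with pandas alone. A small sketch, assuming the two-column df from the question rebuilt with a seeded generator; long_df is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
df = pd.DataFrame({"A": rng.standard_normal(n),
                   "B": rng.standard_normal(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

long_df = (
    df.reset_index()                                   # unnamed index becomes the column "index"
      .melt(id_vars=["index"], value_vars=["A", "B"])  # one row per (date, variable)
      .assign(month=lambda x: x["index"].dt.month)     # month number taken from the date
)

print(long_df.head())
```

Each row carries a date, a variable name, a value, and a month, which are the fields the encode() call refers to.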
