Pandas histogram df.hist() group by

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45883598/

Date: 2020-09-08 15:48:42  Source: igfitidea

Tags: pandas, matplotlib, histogram

Asked by Hangon

How can I plot a histogram with pandas DataFrame.hist() using group by? I have a data frame with 5 columns: "A", "B", "C", "D" and "Group".

The "Group" column has two classes: "yes" and "no".

Using:

df.hist() 

I get a histogram for each of the 4 columns.

Now I would like to get the same 4 graphs, but with blue bars for group = "yes" and red bars for group = "no".

I tried this without success:

df.hist(by = "group")

[Image: pandas hist went wrong]

Accepted answer by Brad Solomon

This is not the most flexible workaround, but it works for your specific question.

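import matplotlib.pyplot as plt

def sephist(col):
    # Split one column of the existing DataFrame df into its 'yes' and 'no' values.
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

# Assumes df has columns 'a'..'d' plus 'group'; adjust the names to match your data.
# plt.subplot indices start at 1, so enumerate from 1 rather than 0.
for num, alpha in enumerate('abcd', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    plt.hist(yes, bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(no, bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)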


You could make this more generic by:

  • adding df and by parameters to sephist: def sephist(df, by, col)
  • making the subplots loop more flexible: for num, alpha in enumerate(df.columns)
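The two suggestions above could be sketched roughly as follows. This is only one possible reading of them, not the answer author's code: `grouped_hists` is a hypothetical helper name, `sephist` here returns a dict keyed by group value rather than a `(yes, no)` tuple, the demo data is made up, and every column other than the grouping column is assumed to be numeric. Colors come from matplotlib's default cycle rather than the fixed blue and red.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sephist(df, by, col):
    """Map each distinct value of df[by] to that group's values of col."""
    return {name: grp[col] for name, grp in df.groupby(by)}


def grouped_hists(df, by, bins=25, ncols=2):
    """Hypothetical helper: one subplot per non-grouping column, one overlaid histogram per group."""
    cols = [c for c in df.columns if c != by]
    nrows = -(-len(cols) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    for ax, col in zip(axes.ravel(), cols):
        for name, values in sephist(df, by, col).items():
            ax.hist(values, bins=bins, alpha=0.5, label=str(name))
        ax.legend(loc="upper right")
        ax.set_title(col)
    # Hide any axes left over when the grid has more cells than columns.
    for ax in axes.ravel()[len(cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Made-up demo data, for illustration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("ABCD"))
demo["group"] = rng.choice(["yes", "no"], size=200)
fig, axes = grouped_hists(demo, by="group")
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` afterwards to display the figure.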

Because the first argument to matplotlib.pyplot.hist can take "either a single array or a sequence of arrays which are not required to be of the same length", an alternative would be:

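# plt.subplot indices start at 1, so enumerate from 1 rather than 0.
for num, alpha in enumerate('abcd', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    # Passing both arrays in one call draws the two groups side by side in each bin.
    plt.hist((yes, no), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)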
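To see the multi-array form in isolation, here is a small self-contained sketch. The two arrays are made-up data of deliberately different lengths (80 and 220 samples), standing in for the "yes" and "no" groups. It shows that a single hist call returns one row of counts per input array, computed over a shared set of bin edges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: two groups of unequal size.
rng = np.random.default_rng(1)
yes = rng.normal(size=80)
no = rng.normal(size=220)

fig, ax = plt.subplots()
# A sequence of arrays is accepted; the arrays need not have the same length.
counts, edges, patches = ax.hist(
    [yes, no], bins=25, alpha=0.5, label=["yes", "no"], color=["b", "r"]
)
ax.legend(loc="upper right")

# counts holds one row of per-bin counts for each input array; edges is shared.
print(len(counts), len(edges))
```

With the default histtype, the bars for the two arrays are placed next to each other within each bin instead of being overlaid, which is the visible difference from calling hist twice.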


Answer by ImportanceOfBeingErnest

Using Seaborn

If you are open to using Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made with seaborn.FacetGrid.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

np.random.seed(1)

# Example data: four numeric columns plus a yes/no group column.
df = pd.DataFrame(np.random.randn(300, 4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32, 0.68], size=300)

# Reshape to long format: one row per (group, variable, value).
df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')

# Shared bin edges so every subplot and group uses the same bins.
bins = np.linspace(df2["value"].min(), df2["value"].max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")

g.axes[-1].legend()
plt.show()
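The pd.melt step is what lets FacetGrid use one column for the subplot and another for the hue. As a minimal illustration on a tiny made-up frame (the names `wide` and `long_df` are only for this sketch), the reshaping looks like this:

```python
import pandas as pd

# Tiny made-up wide frame: one column per variable plus the group label.
wide = pd.DataFrame({
    "A": [0.1, 0.2],
    "B": [1.1, 1.2],
    "group": ["yes", "no"],
})

# Long format: each row holds the group, the original column name, and one value.
long_df = pd.melt(wide, id_vars="group", value_vars=["A", "B"], value_name="value")
print(long_df)
```

Each original column name ends up in the "variable" column, which is the name passed as col="variable" to FacetGrid in the answer's code.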
