Pandas histogram df.hist() group by

Question

Asked by Hangon

How to plot a histogram with pandas DataFrame.hist() using group by? I have a data frame with 5 columns: "A", "B", "C", "D" and "Group"

There are two group classes: "yes" and "no".

Using:

df.hist()

I get a histogram for each of the 4 columns.

Now I would like to get the same 4 graphs but with blue bars (group="yes") and red bars (group = "no").

I tried this without success:

df.hist(by = "group")

Answer 1

Accepted answer by Brad Solomon

This is not the most flexible workaround, but it will work for your specific question.

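import matplotlib.pyplot as plt

def sephist(col):
    # Split one column of df into its 'yes' and 'no' groups
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

# 'abcd' are the column names to plot; adjust them to match your frame.
# plt.subplot positions are 1-based, so start counting at 1.
for num, alpha in enumerate('abcd', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    plt.hist(yes, bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(no, bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)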

You could make this more generic by:

adding a df and a by parameter to sephist: def sephist(df, by, col)
making the subplots loop more flexible: for num, alpha in enumerate(df.columns)
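
The two suggestions above could be combined roughly as follows. This is only a sketch under some assumptions: hist_by_group is a hypothetical helper name (not a pandas or matplotlib function), every column other than the grouping column is assumed to be numeric, and colors come from matplotlib's default cycle rather than the fixed blue/red of the original snippet.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sephist(df, by, col):
    """Return {group label: values of `col`} for each distinct value of df[by]."""
    return {label: part[col] for label, part in df.groupby(by)}


def hist_by_group(df, by, bins=25, ncols=2):
    """Hypothetical helper: one subplot per non-grouping column,
    with one overlaid histogram per group."""
    cols = [c for c in df.columns if c != by]
    nrows = -(-len(cols) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, cols):
        for label, values in sephist(df, by, col).items():
            ax.hist(values, bins=bins, alpha=0.5, label=str(label))
        ax.set_title(col)
        ax.legend(loc='upper right')
    # Hide leftover axes when the grid has more cells than columns
    for ax in flat_axes[len(cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Demo data (made up for illustration): 4 numeric columns plus a group column
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("ABCD"))
demo["group"] = rng.choice(["yes", "no"], size=200)
fig, axes = hist_by_group(demo, by="group")
```

Call plt.show() afterwards to display the figure. Because the groups are discovered with groupby, the same helper works for more than two group labels.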

Because the first argument to matplotlib.pyplot.hist can take

either a single array or a sequence of arrays which are not required to be of the same length

...an alternative would be:

# plt.subplot positions are 1-based, so start counting at 1
for num, alpha in enumerate('abcd', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    # Colors follow the question: blue for 'yes', red for 'no'
    plt.hist((yes, no), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
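
The point about arrays of different lengths can be checked in isolation. The sketch below uses made-up random data (30 "yes" samples and 70 "no" samples) purely for illustration; when given a sequence of arrays, plt.hist bins them on shared edges and returns one row of counts per array.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
yes = rng.normal(size=30)  # 30 samples
no = rng.normal(size=70)   # 70 samples, a different length

# A sequence of arrays is drawn as side-by-side bars over shared bin edges
counts, edges, _ = plt.hist([yes, no], bins=10, label=['yes', 'no'], color=['b', 'r'])
plt.legend(loc='upper right')
```

Note that in this form the bars of the two groups sit next to each other within each bin, whereas two separate plt.hist calls overlay them.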

Answer 2

Answer by ImportanceOfBeingErnest

Using Seaborn

If you are open to using Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made using seaborn.FacetGrid.

import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)

df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')

bins=np.linspace(df2.value.min(), df2.value.max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")

g.axes[-1].legend()
plt.show()
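
The pd.melt step is what makes the FacetGrid call possible: it reshapes the wide frame into long form, so that the former column names end up in a single "variable" column that col="variable" can facet on. A tiny illustration with made-up values (the frame name wide is only for this example):

```python
import pandas as pd

# Small wide-format frame: one column per measured variable
wide = pd.DataFrame({
    "A": [0.1, 0.2, 0.3],
    "B": [1.1, 1.2, 1.3],
    "group": ["yes", "no", "yes"],
})

# Long format: one row per (original row, variable) pair
long_form = pd.melt(wide, id_vars="group", value_vars=["A", "B"], value_name="value")
print(long_form)
```

Each of the 3 original rows contributes one row per melted column, so the result has 6 rows and the columns group, variable and value.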
