Pandas histogram df.hist() group by
Source: http://stackoverflow.com/questions/45883598/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Hangon
How can I plot histograms with pandas DataFrame.hist() grouped by a column? I have a data frame with 5 columns: "A", "B", "C", "D" and "Group".
The "Group" column has two classes: "yes" and "no".
Using:
df.hist()
I get a histogram for each of the 4 columns.
Now I would like to get the same 4 graphs, but with blue bars (group = "yes") and red bars (group = "no").
I tried this without success:
df.hist(by = "group")
Accepted answer by Brad Solomon
This is not the most flexible workaround, but it will work for your question specifically.
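import matplotlib.pyplot as plt

# Assumes df has value columns 'a'..'d' and a 'group' column; adjust the names to match your frame.
def sephist(col):
    # Split one column into its 'yes' and 'no' subsets
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

for num, alpha in enumerate('abcd'):
    plt.subplot(2, 2, num + 1)  # subplot indices start at 1
    yes, no = sephist(alpha)
    plt.hist(yes, bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(no, bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)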
You could make this more generic by:
- adding a df and by parameter to sephist: def sephist(df, by, col)
- making the subplots loop more flexible: for num, alpha in enumerate(df.columns)
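The two suggestions above might be combined roughly as in the following sketch. The helper name grouped_hists, its cols, colors, and ncols parameters, and the made-up example data are assumptions for illustration and are not part of the original answer. The Agg backend is selected only so the sketch runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sephist(df, by, col):
    """Return {group label: values of `col`} for each distinct value of `by`."""
    return {label: sub[col] for label, sub in df.groupby(by)}


def grouped_hists(df, by, cols=None, colors=None, bins=25, ncols=2):
    """Hypothetical helper: one subplot per column, one histogram per group."""
    if cols is None:
        # Plot every column except the grouping column itself
        cols = [c for c in df.columns if c != by]
    nrows = math.ceil(len(cols) / ncols)
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    for ax, col in zip(axes.ravel(), cols):
        for label, values in sephist(df, by, col).items():
            color = colors.get(label) if colors else None
            ax.hist(values, bins=bins, alpha=0.5, label=str(label), color=color)
        ax.legend(loc="upper right")
        ax.set_title(col)
    # Hide any unused axes in the grid
    for ax in axes.ravel()[len(cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Example data (made up for this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
df["group"] = rng.choice(["yes", "no"], size=200)
fig, axes = grouped_hists(df, by="group", colors={"yes": "b", "no": "r"})
```

Passing the colors mapping keeps "yes" blue and "no" red as asked in the question; without it, matplotlib's default color cycle is used.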
Because the first argument to matplotlib.pyplot.hist can take "either a single array or a sequence of arrays which are not required to be of the same length", an alternative would be:
for num, alpha in enumerate('abcd'):
    plt.subplot(2, 2, num + 1)  # subplot indices start at 1
    yes, no = sephist(alpha)
    # Colors follow the label order: 'yes' in blue, 'no' in red
    plt.hist((yes, no), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
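As a small self-contained check of the point about unequal lengths, the sketch below passes two arrays of different sizes to a single hist call. The sample sizes and random data are made up for illustration, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
yes = rng.normal(size=80)    # 80 samples
no = rng.normal(size=120)    # 120 samples, deliberately a different length

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(
    [yes, no], bins=10, label=["yes", "no"], color=["b", "r"]
)
ax.legend(loc="upper right")

# One row of counts per input array; both rows share the same bin edges
counts = np.asarray(counts)
print(counts.shape, edges.shape)
print(counts.sum(axis=1))
```

With the default histtype, multiple datasets passed in one call are drawn as side-by-side bars within each bin rather than as overlapping bars, so the result looks different from two separate hist calls with alpha=0.5.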
Answer by ImportanceOfBeingErnest
Using Seaborn
If you are open to using Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made using seaborn.FacetGrid.
import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Example data: four numeric columns plus a yes/no group column
df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)
# Reshape to long format: one row per (group, variable, value)
df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')
# Shared bin edges so all facets and groups are comparable
bins=np.linspace(df2.value.min(), df2.value.max(), 10)
# One facet per original column, one color per group
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
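The reshaping step that FacetGrid relies on can be inspected on its own. The sketch below uses a tiny made-up frame (6 rows, an assumption for illustration) to show what pd.melt produces: a long table with one row per original cell, where the default column name "variable" holds the original column label.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(6, 4)), columns=list("ABCD"))
df["group"] = ["yes", "no", "yes", "no", "yes", "no"]

# Wide -> long: one row per (original row, value column) pair
df2 = pd.melt(df, id_vars="group", value_vars=list("ABCD"), value_name="value")

print(df2.columns.tolist())  # ['group', 'variable', 'value']
print(len(df2))              # 24, i.e. 6 rows x 4 value columns

# Number of values available for each facet/hue combination
sizes = df2.groupby(["variable", "group"]).size()
print(sizes)
```

Each ("variable", "group") pair in the long table corresponds to one histogram in the grid: "variable" selects the facet and "group" selects the color.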