Original question: http://stackoverflow.com/questions/47494557/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Pandas side-by-side stacked bar plot
Asked by PyRsquared
I want to create a stacked bar plot of the Titanic dataset. The plot needs to group by "Pclass", "Sex" and "Survived". I have managed to do this with a lot of tedious numpy manipulation to produce the normalized plot below (where "M" is male and "F" is female).
Is there a way to do this using pandas' built-in plotting functionality?
I have tried this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('train.csv')
df_grouped = df.groupby(['Survived','Sex','Pclass'])['Survived'].count()
df_grouped.unstack().plot(kind='bar',stacked=True, colormap='Blues', grid=True, figsize=(13,5));
This is not what I want. Is there any way to produce the first plot using pandas plotting? Thanks in advance.
Answered by fuglede
The resulting bars will not neighbour each other as in your first figure, but apart from that, pandas lets you do what you want as follows:
df_g = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
df_g.columns = ['Survived', 'Died']
df_g.plot.bar(stacked=True)
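To see what that aggregation hands to the plotting call, here is a minimal sketch on a tiny made-up frame. The `toy` data and the names `rates` and `toy_g` are assumptions for illustration only, not the Titanic file. Because "Survived" is a 0/1 column, its group mean is the survival fraction, and one minus that is the fraction that died, so each row sums to 1 and the stacked bars come out normalized.

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv (values are invented).
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 3, 3],
    "Sex":      ["female", "female", "male", "male",
                 "female", "male", "female", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the fraction of ones in each group.
rates = toy.groupby(["Pclass", "Sex"])["Survived"].mean()

# One row per (Pclass, Sex) pair, one column per stacked segment.
toy_g = pd.DataFrame({"Survived": rates, "Died": 1 - rates})
print(toy_g)
```

Each row of `toy_g` becomes one bar and each column becomes one segment of that bar when `plot.bar(stacked=True)` is called on it.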
Here, the horizontal grouping of patches is complicated by the requirement of stacking. If, for instance, we only cared about the value of "Survived", pandas could take care of it out-of-the-box.
df.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack().plot.bar()
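The reason this works out-of-the-box is the shape `unstack()` produces: the last index level ("Sex") moves into the columns, and pandas draws one cluster of bars per index row with one bar per column. A small sketch on invented data (the `toy` frame and the name `wide` are assumptions, not part of the original answer):

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv (values are invented).
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 3, 3],
    "Sex":      ["female", "female", "male", "male",
                 "female", "male", "female", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# unstack() pivots the innermost index level ("Sex") into columns,
# leaving "Pclass" as the row index.
wide = toy.groupby(["Pclass", "Sex"])["Survived"].mean().unstack()
print(wide)
```

With this layout the columns are already used up for the side-by-side grouping, which is why adding a second quantity to stack on top needs extra work.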
If an ad hoc solution suffices for post-processing the plot, doing so is also not terribly complicated:
import numpy as np
from matplotlib import ticker
df_g = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
df_g.columns = ['Survived', 'Died']
ax = df_g.plot.bar(stacked=True)
# Move back every second patch (patches 0-5 are the "Survived" bars, 6-11 the "Died" bars)
for i in range(6):
    new_x = ax.patches[i].get_x() - (i % 2) / 2
    ax.patches[i].set_x(new_x)
    ax.patches[i + 6].set_x(new_x)
# Update tick locations correspondingly
minor_tick_locs = [x.get_x()+1/4 for x in ax.patches[:6]]
major_tick_locs = np.array([x.get_x()+1/4 for x in ax.patches[:6]]).reshape(3, 2).mean(axis=1)
ax.set_xticks(minor_tick_locs, minor=True)
ax.set_xticks(major_tick_locs)
# Use indices from dataframe as tick labels
minor_tick_labels = df_g.index.get_level_values(1).values
major_tick_labels = df_g.index.levels[0].values
ax.xaxis.set_ticklabels(minor_tick_labels, minor=True)
ax.xaxis.set_ticklabels(major_tick_labels)
# Remove ticks and organize tick labels to avoid overlap
ax.tick_params(axis='x', which='both', bottom=False)
ax.tick_params(axis='x', which='minor', rotation=45)
ax.tick_params(axis='x', which='major', pad=35, rotation=0)
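A different route, not the one used in the answer above, is to skip the patch shifting and draw one stacked plot per sex on the same axes, using the `position` and `width` arguments of pandas' bar plots to push each set to one side of the tick. The sketch below assumes invented toy data and hypothetical names (`toy`, `toy_g`, `part`), so treat it as a starting point rather than a drop-in replacement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for train.csv (values are invented).
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 3, 3],
    "Sex":      ["female", "female", "male", "male",
                 "female", "male", "female", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})
rates = toy.groupby(["Pclass", "Sex"])["Survived"].mean()
toy_g = pd.DataFrame({"Survived": rates, "Died": 1 - rates})

fig, ax = plt.subplots()
# position=1 puts the bars to the left of each tick, position=0 to the right.
for sex, position in [("female", 1), ("male", 0)]:
    part = toy_g.xs(sex, level="Sex")  # rows indexed by Pclass only
    part.plot.bar(stacked=True, ax=ax, width=0.4, position=position,
                  legend=False, rot=0)

n_classes = toy_g.index.get_level_values("Pclass").nunique()
# Each plot call resets the x limits, so set them once both sets are drawn.
ax.set_xlim(-0.5, n_classes - 0.5)

# Label each half of a pair with a minor tick; push class labels below them.
minor_locs = [t + d for t in range(n_classes) for d in (-0.2, 0.2)]
ax.set_xticks(minor_locs, minor=True)
ax.set_xticklabels(["F", "M"] * n_classes, minor=True)
ax.tick_params(axis="x", which="major", pad=18)

# Both calls add the same two labels, so keep only the first pair.
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[:2], labels[:2])
fig.savefig("stacked_side_by_side.png")
```

The trade-off is that the bar width and the minor tick offsets (0.4 and 0.2 here) have to be kept in sync by hand, whereas the patch-shifting version reads positions from the drawn bars.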