Pandas side-by-side stacked bar plot

Question

Asked by PyRsquared

I want to create a stacked bar plot of the Titanic dataset. The plot needs to group by "Pclass", "Sex" and "Survived". I have managed to do this with a lot of tedious NumPy manipulation to produce the normalized plot below (where "M" is male and "F" is female).

Is there a way to do this using pandas' built-in plotting functionality?

I have tried this:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('train.csv')
df_grouped = df.groupby(['Survived','Sex','Pclass'])['Survived'].count()
df_grouped.unstack().plot(kind='bar', stacked=True, colormap='Blues', grid=True, figsize=(13, 5))

This is not what I want. Is there any way to produce the first plot using pandas plotting? Thanks in advance.

Answer 1

Answered by fuglede

The resulting bars will not neighbour each other as in your first figure, but outside of that, pandas lets you do what you want as follows:

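# Per (Pclass, Sex) group: fraction that survived and its complement.
# The string 'mean' avoids needing numpy here and the warning newer pandas raises for np.mean in agg.
df_g = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
df_g.columns = ['Survived', 'Died']
df_g.plot.bar(stacked=True)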
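
To see what this aggregation produces without the real file, here is a minimal sketch on a small made-up frame. The `toy` data is hypothetical and only stands in for train.csv; it contains just the three columns used above.

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv (not the real Titanic rows).
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 2, 2],
    "Sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# One row per (Pclass, Sex) pair; the two columns are the stacked segments.
rates = toy.groupby(["Pclass", "Sex"])["Survived"].mean().to_frame("Survived")
rates["Died"] = 1 - rates["Survived"]
print(rates)
```

Each row sums to 1, which is why every stacked bar in the normalized plot has the same total height.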

Here, the horizontal grouping of patches is complicated by the requirement of stacking. If, for instance, we only cared about the value of "Survived", pandas could take care of it out of the box:

df.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack().plot.bar()
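
The grouping works here because `unstack()` moves the inner index level ("Sex") into the columns, and pandas draws one bar per column within each row. A small sketch of the reshaped frame, again on hypothetical toy data rather than the real train.csv:

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 2, 2],
    "Sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rows: Pclass. Columns: Sex. Values: survival rate.
wide = toy.groupby(["Pclass", "Sex"])["Survived"].mean().unstack()
print(wide)
```

Calling `wide.plot.bar()` would then place the "female" and "male" bars next to each other for each class, but with only one value per bar there is nothing left to stack.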

If an ad hoc solution suffices for post-processing the plot, doing so is also not terribly complicated:

import numpy as np
from matplotlib import ticker

df_g = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
df_g.columns = ['Survived', 'Died']
ax = df_g.plot.bar(stacked=True)

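# Move every second bar left so the two bars of each class touch.
# The 6 is hard-coded: 3 classes x 2 sexes; patches 0-5 are "Survived", 6-11 are "Died".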
for i in range(6):
    new_x = ax.patches[i].get_x() - (i%2)/2
    ax.patches[i].set_x(new_x)
    ax.patches[i+6].set_x(new_x)

# Update tick locations correspondingly
minor_tick_locs = [x.get_x()+1/4 for x in ax.patches[:6]]
major_tick_locs = np.array([x.get_x()+1/4 for x in ax.patches[:6]]).reshape(3, 2).mean(axis=1)
ax.set_xticks(minor_tick_locs, minor=True)
ax.set_xticks(major_tick_locs)

# Use indices from dataframe as tick labels
minor_tick_labels = df_g.index.get_level_values(1).values  # MultiIndex.labels was removed in pandas 1.0
major_tick_labels = df_g.index.levels[0].values
ax.xaxis.set_ticklabels(minor_tick_labels, minor=True)
ax.xaxis.set_ticklabels(major_tick_labels)

# Remove ticks and organize tick labels to avoid overlap
ax.tick_params(axis='x', which='both', bottom=False)
ax.tick_params(axis='x', which='minor', rotation=45)
ax.tick_params(axis='x', which='major', pad=35, rotation=0)
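
If moving patches after the fact feels fragile, another option is to place the bars by hand with matplotlib. This is only a sketch of that alternative, not the approach used in the answer above; the `toy` frame is hypothetical, and the `Agg` backend is selected just so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train.csv.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 2, 2, 2, 2],
    "Sex":      ["female", "female", "male", "male",
                 "female", "female", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})
rates = toy.groupby(["Pclass", "Sex"])["Survived"].mean().unstack()

fig, ax = plt.subplots()
x = np.arange(len(rates.index))   # one slot per class
n = len(rates.columns)            # bars per slot (one per sex)
width = 0.8 / n

for j, sex in enumerate(rates.columns):
    pos = x + (j - (n - 1) / 2) * width  # bars of one class sit side by side
    survived = rates[sex].to_numpy()
    # A label starting with "_" keeps repeated entries out of the legend.
    ax.bar(pos, survived, width, color="C0",
           label="Survived" if j == 0 else "_nolegend_")
    ax.bar(pos, 1 - survived, width, bottom=survived, color="C1",
           label="Died" if j == 0 else "_nolegend_")

ax.set_xticks(x)
ax.set_xticklabels(rates.index)
ax.set_xlabel("Pclass")
ax.legend()
fig.savefig("side_by_side_stacked.png")
```

The trade-off is that the per-sex tick labels from the pandas version are not reproduced here; they would need to be added as minor ticks at the `pos` values.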
