pandas: Histogram with stacked components
Note: this page reproduces a popular Stack Overflow question and its accepted answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/22226375/
Histogram with stacked components
Asked by 8one6
Let's say that I have a value that I've measured every day for the past 90 days. I would like to plot a histogram of the values, but I want to make it easy for the viewer to see where the measurements have accumulated over certain non-overlapping subsets of the past 90 days. I want to do this by "subdividing" each bar of the histogram into chunks. One chunk for the earliest observations, one for more recent, one for the most recent.
This sounds like a job for df.plot(kind='bar', stacked=True), but I'm having trouble getting the details right.
Here's what I have so far:
import numpy as np
import pandas as pd
import seaborn as sbn
np.random.seed(0)
data = pd.DataFrame({'values': np.random.randn(90)})
data['bin'] = pd.cut(data['values'], 15, labels=False)
forhist = pd.DataFrame({'first70': data[:70].groupby('bin').count()['bin'],
                         'next15': data[70:85].groupby('bin').count()['bin'],
                         'last5': data[85:].groupby('bin').count()['bin']})
forhist.plot(kind='bar', stacked=True)
And that gives me:


This graph has some shortcomings:
- The bars are stacked in the wrong order. last5 should be on top and next15 in the middle, i.e. they should be stacked in the order of the columns in forhist.
- There is horizontal space between the bars.
- The x-axis is labeled with integers rather than something indicative of the values the bins represent. My "first choice" would be to have the x-axis labelled exactly as it would be if I just ran data['values'].hist(). My "second choice" would be to have the x-axis labelled with the "bin names" that I would get if I did pd.cut(data['values'], 15). In my code, I used labels=False because otherwise it would have used the bin edge labels (as strings) as the bar labels and put them in alphabetical order, making the graph basically useless.
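To make the third point concrete, here is a minimal sketch of the two labelling modes of pd.cut on the same data. The names values, codes and intervals are illustrative and not part of the original code. The sketch also assumes a reasonably recent pandas, where the default labels are ordered interval categories rather than plain strings.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
values = pd.Series(np.random.randn(90), name='values')

# labels=False returns the integer index of the bin (0..14) for each observation.
codes = pd.cut(values, 15, labels=False)

# The default returns a categorical whose categories are the 15 bin intervals,
# kept in numeric order rather than alphabetical order.
intervals = pd.cut(values, 15)

print(codes.head())
print(intervals.cat.categories[:3])
```

The interval categories carry the bin edges, so they can serve as the "bin names" mentioned above.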
What's the best way to approach this? I feel like I'm using very clumsy functions so far.
Accepted answer by 8one6
OK, here's one way to attack it, using features of the matplotlib hist function itself:
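import matplotlib.pyplot as plt

# Split the 90 observations into three non-overlapping positional slices.
# .iloc is end-exclusive, so the groups hold 70, 15 and 5 rows.
groups = [data['values'].iloc[low:high] for low, high in [(0, 70), (70, 85), (85, 90)]]

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
ax.hist(groups,
        bins=15,
        stacked=True,
        rwidth=1.0,
        label=['first70', 'next15', 'last5'])
ax.legend()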
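Which gives a stacked histogram with first70 at the bottom, next15 in the middle, and last5 on top, with no gaps between the bars and the x-axis in the units of the data.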
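As a possible follow-up on the x-axis labelling point from the question, here is a minimal sketch that reuses the bin edges returned by ax.hist as tick positions. The Agg backend, the names groups, counts and edges, and the two-decimal tick format are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

np.random.seed(0)
data = pd.DataFrame({'values': np.random.randn(90)})

# Three non-overlapping positional slices: 70, 15 and 5 observations.
groups = [data['values'].iloc[lo:hi].to_numpy()
          for lo, hi in [(0, 70), (70, 85), (85, 90)]]

fig, ax = plt.subplots(figsize=(9, 5))
counts, edges, _ = ax.hist(groups, bins=15, stacked=True, rwidth=1.0,
                           label=['first70', 'next15', 'last5'])

# Put a tick at every bin edge so the axis shows the bin boundaries.
ax.set_xticks(edges)
ax.set_xticklabels(['{:.2f}'.format(e) for e in edges], rotation=90)
ax.legend()
fig.tight_layout()
```

With stacked=True the returned counts are cumulative across the groups, so the last row holds the total height of each bar.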



