pandas: histogram with stacked components

Question

Asked by 8one6

Let's say that I have a value that I've measured every day for the past 90 days. I would like to plot a histogram of the values, but I want to make it easy for the viewer to see where the measurements have accumulated over certain non-overlapping subsets of the past 90 days. I want to do this by "subdividing" each bar of the histogram into chunks. One chunk for the earliest observations, one for more recent, one for the most recent.

This sounds like a job for df.plot(kind='bar', stacked=True), but I'm having trouble getting the details right.

Here's what I have so far:

import numpy as np
import pandas as pd
import seaborn as sbn

np.random.seed(0)

data = pd.DataFrame({'values': np.random.randn(90)})
data['bin'] = pd.cut(data['values'], 15, labels=False)
forhist = pd.DataFrame({'first70': data[:70].groupby('bin').count()['values'],
                         'next15': data[70:85].groupby('bin').count()['values'],
                         'last5': data[85:].groupby('bin').count()['values']})

forhist.plot(kind='bar', stacked=True)

And that gives me:

[Figure: poor result]

This graph has some shortcomings:

The bars are stacked in the wrong order. last5 should be on top and next15 in the middle, i.e. they should be stacked in the order of the columns in forhist.
There is horizontal space between the bars.
The x-axis is labeled with integers rather than something indicative of the values the bins represent. My "first choice" would be to have the x-axis labeled exactly as it would be if I just ran data['values'].hist(). My "second choice" would be to have the x-axis labeled with the "bin names" that I would get if I did pd.cut(data['values'], 15). In my code, I used labels=False because otherwise it would have used the bin edge labels (as strings) as the bar labels and put them in alphabetical order, making the graph basically useless.

What's the best way to approach this? I feel like I'm using very clumsy functions so far.

Answer 1

Accepted answer by 8one6

OK, here's one way to attack it, using features of the matplotlib hist function itself:

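import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
# .iloc slices by position (end-exclusive), so the three chunks do not overlap
ax.hist([data['values'].iloc[low:high] for low, high in [(0, 70), (70, 85), (85, 90)]],
         bins=15,
         stacked=True,
         rwidth=1.0,
         label=['first70', 'next15', 'last5'])
ax.legend()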

Which gives:

[Figure: better]
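
For anyone who would rather stay with df.plot(kind='bar', stacked=True) as in the question, here is a minimal sketch of one possible variant. It is not part of the accepted answer. It assumes the same 90 seeded values, counts each chunk against a shared set of bin edges with np.histogram, and relies on dict insertion order (Python 3.7+) to fix the stacking order. The names edges, chunks and counts are made up for illustration, and labeling the bars by the rounded left bin edge is just one choice among several.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
values = pd.Series(np.random.randn(90), name="values")

# Shared bin edges computed from the full sample, so every chunk uses the same bins.
edges = np.histogram_bin_edges(values, bins=15)

# Non-overlapping positional slices, listed in the desired stacking order (bottom to top).
chunks = {
    "first70": values.iloc[:70],
    "next15": values.iloc[70:85],
    "last5": values.iloc[85:],
}

# One column of counts per chunk; row labels are the rounded left bin edges.
counts = pd.DataFrame(
    {name: np.histogram(chunk, bins=edges)[0] for name, chunk in chunks.items()},
    index=np.round(edges[:-1], 2),
)

# width=1.0 removes the horizontal gaps between neighboring bars.
ax = counts.plot(kind="bar", stacked=True, width=1.0, figsize=(9, 5))
ax.set_xlabel("left bin edge")
```

Because the counts live in a regular DataFrame, the bin labels can be swapped for anything else (for example interval strings) by reassigning counts.index before plotting.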