原文地址: http://stackoverflow.com/questions/27157522/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
pandas plot histogram data frame index
Asked by DigitalMusicology
I have the following data frame (df) in pandas:
       NetPrice  Units  Royalty
Price
3.65       9.13    171    57.60
3.69       9.23     13     4.54
3.70       9.25    129    43.95
3.80       9.49    122    42.76
3.90       9.74    105    38.30
3.94       9.86    158    57.35
3.98       9.95     37    13.45
4.17      10.42     69    27.32
4.82      12.04    176    77.93
4.84      24.22    132    59.02
5.16      12.91    128    60.81
5.22      13.05    129    62.00
I am trying to create a histogram on the index ("Price") with a y-axis of "Units". I started with the following:
plt.hist(df.index)
This gives me a histogram plotting the price. How can I add the Units to the y-axis? Right now it is just a "scale".
Thank you!
Answered by JD Long
Because your data is already partially aggregated, you can't use the hist() methods directly. As @snorthway said in the comments, you can do this with a bar chart; you just need to put your data in buckets first. My favorite way to put data in buckets is with the pandas cut() method.
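To make the "partially aggregated" point concrete, here is a minimal sketch using the numbers from the question. It is an aside rather than part of the original answer: a plain histogram counts rows per bin, whereas the asker wants the Units summed per bin. NumPy's histogram() accepts a weights argument that does this; the choice of 4 bins is an arbitrary assumption for illustration.

```python
import numpy as np

# Prices (the index) and Units copied from the question's data frame.
prices = np.array([3.65, 3.69, 3.70, 3.80, 3.90, 3.94,
                   3.98, 4.17, 4.82, 4.84, 5.16, 5.22])
units = np.array([171, 13, 129, 122, 105, 158,
                  37, 69, 176, 132, 128, 129])

# Plain histogram: counts how many rows fall into each bin.
# bins=4 is an arbitrary choice made only for this illustration.
counts, edges = np.histogram(prices, bins=4)

# Weighted histogram: each row contributes its Units instead of 1.
unit_totals, _ = np.histogram(prices, bins=4, weights=units)

print(counts)       # rows per bin
print(unit_totals)  # Units per bin
```

The two arrays differ because the first one ignores the Units column entirely. The bucket-and-sum approach described in this answer arrives at the same kind of per-bin totals with pandas.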
Let's set up some example data, since you didn't provide any in a form that's easy to use:
import numpy as np
import pandas as pd

np.random.seed(1)
n = 1000
df = pd.DataFrame({'Price': np.random.normal(5, 2, size=n),
                   'Units': np.random.randint(100, size=n)})
Let's put the prices into 10 evenly spaced buckets:
df['bucket'] = pd.cut(df.Price, 10)
print(df.head())
      Price  Units           bucket
0  8.248691     98    (7.307, 8.71]
1  3.776487      8  (3.0999, 4.502]
2  3.943656     89  (3.0999, 4.502]
3  2.854063     27  (1.697, 3.0999]
4  6.730815     29   (5.905, 7.307]
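The interval labels above come from cut() splitting the observed price range into 10 equal-width buckets. As a small sketch for inspecting where those boundaries fall, cut() can also return the edges when called with retbins=True. The variable names here are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Same synthetic prices as in the setup above (same seed, same first draw).
np.random.seed(1)
price = pd.Series(np.random.normal(5, 2, size=1000), name="Price")

# retbins=True makes cut() return the bucket edges alongside the buckets.
buckets, edges = pd.cut(price, 10, retbins=True)

# 10 buckets need 11 edges.
print(len(edges))

# Widths of the buckets; the interior ones are equal, while pandas widens
# the range slightly at the extremes so the end values are included.
print(np.diff(edges).round(3))
```

Passing an explicit list of edges instead of the integer 10 gives full control over the boundaries if equal-width buckets are not what you want.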
So now we have a field that contains the bucket range. If you want to give those buckets other names, you can read about that in the excellent pandas documentation. Now we can use the pandas groupby() method and sum() to add up the units in each bucket:
newdf = df[['bucket','Units']].groupby('bucket').sum()
print(newdf)
                  Units
bucket
(-1.122, 0.295]     492
(0.295, 1.697]     1663
(1.697, 3.0999]    5003
(3.0999, 4.502]   11084
(4.502, 5.905]    15144
(5.905, 7.307]    11053
(7.307, 8.71]      4424
(8.71, 10.112]     1008
(10.112, 11.515]     77
(11.515, 12.917]    122
That looks like a winner... now let's plot it:
newdf.plot(kind='bar')
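For completeness, here is a sketch of the same cut(), groupby(), sum() steps applied to the data from the question rather than to random data. The bucket edges and the bucket names passed through the labels parameter are hypothetical choices made for illustration; neither the question nor the answer specifies them.

```python
import pandas as pd

# Data copied from the question; Price is turned into a regular column here.
df = pd.DataFrame({
    "Price": [3.65, 3.69, 3.70, 3.80, 3.90, 3.94,
              3.98, 4.17, 4.82, 4.84, 5.16, 5.22],
    "Units": [171, 13, 129, 122, 105, 158,
              37, 69, 176, 132, 128, 129],
})

# Hypothetical bucket edges and names, chosen only for this example.
edges = [3.5, 4.0, 4.5, 5.0, 5.5]
names = ["3.50-4.00", "4.00-4.50", "4.50-5.00", "5.00-5.50"]

# labels= replaces the default interval notation with custom bucket names.
df["bucket"] = pd.cut(df["Price"], bins=edges, labels=names)

# Sum the units in each bucket; observed=False keeps empty buckets as well.
units_per_bucket = df.groupby("bucket", observed=False)["Units"].sum()
print(units_per_bucket)
```

Calling units_per_bucket.plot(kind='bar') on the result would then draw the bar chart in the same way as the plot call above.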



