Original question: http://stackoverflow.com/questions/17874063/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Is there a parameter in matplotlib/pandas to have the Y axis of a histogram as percentage?
Asked by d1337
I would like to compare two histograms by having the Y axis show the percentage of each column from the overall dataset size instead of an absolute value. Is that possible? I am using Pandas and matplotlib. Thanks
Accepted answer by Rutger Kassies
Passing density=True (normed=True for matplotlib < 2.2.0) returns a histogram for which np.sum(pdf * np.diff(bins)) equals 1, that is, the total area of the bars is 1. If you want the sum of the bar heights to be 1 instead, you can use NumPy's histogram() and normalize the results yourself.
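import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(30)
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
# Left: density=True scales the bars so that their total area is 1.
ax[0].hist(x, density=True, color='grey')
# Right: divide the counts by their sum so that the bar heights add up to 1.
hist, bins = np.histogram(x)
ax[1].bar(bins[:-1], hist / hist.sum(), width=np.diff(bins), align='edge', color='grey')
ax[0].set_title('density=True')
ax[1].set_title('hist = hist / hist.sum()')
plt.show()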
Btw: Strange plotting glitch at the first bin of the left plot.
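The difference between the two normalizations in this answer can also be checked numerically, without plotting. The following is a minimal sketch that uses np.histogram in place of the plotting call; the seed and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.standard_normal(30)

# density=True: heights are scaled so that the total bar area is 1.
pdf, bins = np.histogram(x, density=True)
area = np.sum(pdf * np.diff(bins))

# Manual normalization: heights are fractions of the sample, so they sum to 1.
counts, _ = np.histogram(x)
fractions = counts / counts.sum()

print(area)             # total area under the density histogram
print(fractions.sum())  # sum of the bar heights
```

Multiplying fractions by 100 gives percentages of the dataset, which is what the question asks for.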
Answer by rshield
Pandas plotting can accept any extra keyword arguments from the respective matplotlib function. So for completeness from the comments of others here, this is how one would do it:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,2), columns=list('AB'))
df.hist(density=True)
Also, for direct comparison this may be a good way as well:
df.plot(kind='hist', density=True, bins=20, stacked=False, alpha=.5)
Answer by hobs
Looks like @CarstenKönig found the right way:
df.hist(bins=20, weights=np.ones_like(df[df.columns[0]]) * 100. / len(df))
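As a rough sketch of the same weighting idea for a single column, the tick labels can additionally be formatted as percentages with matplotlib.ticker.PercentFormatter. The column name A, the seed, the axis label, and the Agg backend are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=list("AB"))

# Each row contributes 100 / len(df), so the bar heights are percentages of the rows.
weights = np.ones(len(df)) * 100.0 / len(df)

fig, ax = plt.subplots()
heights, bins, _ = ax.hist(df["A"], bins=20, weights=weights)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=100))  # render ticks with a % sign
ax.set_ylabel("share of rows")
```

Because the weights already carry the factor of 100, PercentFormatter is given xmax=100 so that a bar of height 10 is labelled as 10%.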
Answer by Christoph Schranz
You can simplify the weighting using np.ones_like():
df["ColumnName"].plot.hist(weights = np.ones_like(df.index) / len(df.index))
- np.ones_like() is okay with the df.index structure
- len(df.index) is faster for large DataFrames
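The 1/N weighting can also be applied to two samples of different sizes, which is the comparison the question asks about. This is a minimal sketch using np.histogram only; fraction_weights is a hypothetical helper, and the sample sizes, seed, and shared bin edges are assumptions.

```python
import numpy as np

def fraction_weights(values):
    """Hypothetical helper: one weight of 1/N per value, so bar heights sum to 1."""
    values = np.asarray(values)
    return np.ones(len(values)) / len(values)

rng = np.random.default_rng(0)
small = rng.standard_normal(50)    # two samples of different sizes
large = rng.standard_normal(5000)

# Shared bin edges, computed from both samples, keep the histograms comparable.
edges = np.histogram_bin_edges(np.concatenate([small, large]), bins=20)
small_frac, _ = np.histogram(small, bins=edges, weights=fraction_weights(small))
large_frac, _ = np.histogram(large, bins=edges, weights=fraction_weights(large))

print(small_frac.sum(), large_frac.sum())  # each sample's fractions add up to 1
```

The same weights arrays can be passed to a plotting call through its weights argument, as in the answer above.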
Answer by anon
I know this answer comes 6 years late, but for anyone using density=True (the replacement for normed=True): it may not do what you want. It normalizes the whole distribution so that the total area of the bins is 1. So if your bins have a width < 1, you can expect heights > 1 on the y-axis. If you want to bound your histogram to [0, 1], you will have to calculate it yourself.
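The effect is easy to reproduce with data confined to a narrow interval. The sketch below assumes uniform data on [0, 0.5) and ten bins of width 0.05, both chosen only to make the point visible.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 0.5, size=1000)  # all values lie in an interval of width 0.5

edges = np.linspace(0.0, 0.5, 11)     # ten bins, each 0.05 wide
density, _ = np.histogram(x, bins=edges, density=True)
counts, _ = np.histogram(x, bins=edges)
fractions = counts / counts.sum()

print(density.max())                     # above 1, because the bins are narrower than 1
print(fractions.max())                   # never above 1
print(np.sum(density * np.diff(edges)))  # the total area is still 1
```

Here the density heights average 2, since the total area of 1 is spread over a width of only 0.5, while the manually normalized fractions stay within [0, 1].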