Is there a parameter in matplotlib/pandas to have the Y axis of a histogram as a percentage?

Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/17874063/

Time: 2020-08-19 09:23:06  Source: igfitidea

Tags: python, pandas, matplotlib

Asked by d1337

I would like to compare two histograms by having the Y axis show the percentage of each column from the overall dataset size instead of an absolute value. Is that possible? I am using Pandas and matplotlib. Thanks

Accepted answer by Rutger Kassies

Passing density=True (normed=True for matplotlib < 2.2.0) returns a histogram for which np.sum(pdf * np.diff(bins)) equals 1, that is, the total area of the bars is 1. If you want the sum of the bar heights to be 1 instead, you can use NumPy's histogram() and normalize the results yourself.

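import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(30)

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Left: density=True scales the bars so that their total area is 1
ax[0].hist(x, density=True, color='grey')

# Right: divide the counts by their sum so that the bar heights add up to 1
hist, bins = np.histogram(x)
# align='edge' places each bar on its bin instead of centering it on the left edge
ax[1].bar(bins[:-1], hist.astype(np.float32) / hist.sum(), width=(bins[1]-bins[0]), align='edge', color='grey')

ax[0].set_title('density=True')
ax[1].set_title('hist = hist / hist.sum()')

plt.show()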

[Image: the two histograms side by side, density-normalized on the left and sum-normalized on the right]

Btw: Strange plotting glitch at the first bin of the left plot.
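To make the difference between the two normalizations concrete, here is a minimal numeric sketch that uses only np.histogram. The variable names (area, fractions, percent) are illustrative and not part of the original answer.

```python
import numpy as np

x = np.random.randn(30)

# density=True: the heights form a probability density,
# so height * bin width summed over all bins is 1
pdf, bins = np.histogram(x, density=True)
area = np.sum(pdf * np.diff(bins))

# Manual normalization: the heights are fractions of the sample,
# so the heights themselves sum to 1 (or to 100 as percentages)
counts, _ = np.histogram(x)
fractions = counts / counts.sum()
percent = 100 * fractions

print(area, fractions.sum(), percent.sum())
```

Only the second form matches "percentage of the overall dataset size" from the question; the first depends on the bin widths.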

Answer by rshield

Pandas plotting can accept any extra keyword arguments from the respective matplotlib function. So for completeness from the comments of others here, this is how one would do it:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100,2), columns=list('AB'))

df.hist(density=True)

Also, for direct comparison this may be a good way as well:

df.plot(kind='hist', density=True, bins=20, stacked=False, alpha=.5)
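If the two datasets being compared have different sizes, one possible approach is to give both histograms the same bin edges and weight each sample by 100 / N, so that the bars read as percentages of each sample. The sketch below is an assumption-laden illustration: the samples a and b, their sizes, and the labels are made up, and it calls matplotlib directly rather than going through pandas.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=200)    # hypothetical sample A
b = rng.normal(0.5, 1.0, size=1000)   # hypothetical sample B, five times larger

# Shared bin edges so that the bars of both samples line up
bins = np.histogram_bin_edges(np.concatenate([a, b]), bins=20)

fig, ax = plt.subplots()
# Each observation contributes 100 / N, so each histogram sums to 100 %
pct_a, _, _ = ax.hist(a, bins=bins, weights=np.full(a.size, 100.0 / a.size),
                      alpha=0.5, label='A')
pct_b, _, _ = ax.hist(b, bins=bins, weights=np.full(b.size, 100.0 / b.size),
                      alpha=0.5, label='B')
ax.set_ylabel('% of sample')
ax.legend()
```

Call plt.show() afterwards to display the figure. Unlike density=True, the bar heights here do not depend on the bin width.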

Answer by hobs

Looks like @CarstenKönig found the right way:

df.hist(bins=20, weights=np.ones_like(df[df.columns[0]]) * 100. / len(df))

Answer by Christoph Schranz

You can simplify the weighting using np.ones_like():

df["ColumnName"].plot.hist(weights = np.ones_like(df.index) / len(df.index))
  • np.ones_like() is okay with the df.index structure
  • len(df.index) is faster for large DataFrames
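The weights approach yields fractions between 0 and 1. To have the axis actually print percent signs, one option is matplotlib's PercentFormatter. This is a minimal sketch rather than part of the original answers: the DataFrame and the column name 'A' are placeholders, and it calls ax.hist directly so that the bar heights can be inspected.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Hypothetical data; 'A' is a placeholder column name
df = pd.DataFrame(np.random.randn(100, 2), columns=list('AB'))

# Each row contributes 1/N, so the bar heights are fractions that sum to 1
weights = np.ones(len(df)) / len(df)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(df['A'], bins=20, weights=weights)

# Show the fractions as percentages (xmax=1 maps 1.0 to 100%)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.set_ylabel('share of rows')
```

Call plt.show() to display the result. The same weights array can be passed to df['A'].plot.hist(weights=weights), as in the answer above.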

Answer by anon

I know this answer comes 6 years late, but to anyone using density=True (the substitute for normed=True): it may not do what you want. It normalizes the whole distribution so that the total area of the bins is 1. So if your bins have a width < 1, you can expect heights > 1 on the y-axis. If you want to bound your histogram to [0;1], you will have to calculate it yourself.
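A small sketch of that effect, using made-up data confined to [0, 0.5] so that every bin is narrower than 1:

```python
import numpy as np

# Hypothetical data: 100 evenly spaced values between 0 and 0.5
x = np.linspace(0.0, 0.5, 100)

# Five bins, each 0.1 wide
density, edges = np.histogram(x, bins=5, density=True)
widths = np.diff(edges)

# The area is 1, but the individual heights exceed 1 because the bins are narrow
print(density.max(), np.sum(density * widths))

# Computing the fractions yourself keeps every height within [0, 1]
counts, _ = np.histogram(x, bins=5)
fractions = counts / counts.sum()
print(fractions.max(), fractions.sum())
```

The density heights are the fractions divided by the bin width, which is why they grow as the bins get narrower.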
