Is there a parameter in matplotlib/pandas to have the Y axis of a histogram as percentage?

Question

Asked by d1337

I would like to compare two histograms by having the Y axis show the percentage of each column from the overall dataset size instead of an absolute value. Is that possible? I am using Pandas and matplotlib. Thanks

Answer 1

Accepted answer by Rutger Kassies

Passing density=True (normed=True for matplotlib < 2.2.0) returns a histogram for which np.sum(pdf * np.diff(bins)) equals 1. If you want the sum of the histogram to be 1, you can use NumPy's histogram() and normalize the results yourself.

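import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(30)

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Left: bar areas integrate to 1
ax[0].hist(x, density=True, color='grey')

# Right: bar heights sum to 1; align='edge' puts each bar on its bin's left edge
hist, bins = np.histogram(x)
ax[1].bar(bins[:-1], hist.astype(np.float32) / hist.sum(), width=(bins[1]-bins[0]), align='edge', color='grey')

ax[0].set_title('density=True')
ax[1].set_title('hist = hist / hist.sum()')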

[Figure: the two histograms produced by the code above, side by side]

Btw: Strange plotting glitch at the first bin of the left plot.
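The difference between the two normalizations can also be checked numerically, without plotting. The following is a minimal sketch; the fixed seed and the variable names area and fractions are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.standard_normal(30)

# density=True: bar heights times bin widths add up to 1
pdf, bins = np.histogram(x, density=True)
area = np.sum(pdf * np.diff(bins))

# manual normalization: the bar heights themselves add up to 1
counts, _ = np.histogram(x)
fractions = counts / counts.sum()

print(area, fractions.sum())
```

Multiplying fractions by 100 gives the percentage of the dataset that falls into each bin.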

Answer 2

Answered by rshield

Pandas plotting accepts extra keyword arguments and passes them on to the corresponding matplotlib function. So for completeness, following the comments of others here, this is how one would do it:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100,2), columns=list('AB'))

df.hist(density=True)

Also, for direct comparison this may be a good way as well:

df.plot(kind='hist', density=1, bins=20, stacked=False, alpha=.5)
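For a direct comparison where the Y axis reads as a percentage of rows rather than a density, one possible sketch is to draw both columns over the same bin edges with per-row weights and format the axis with matplotlib's PercentFormatter. The seed, the shared edges array, the heights dictionary, and the axis label are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=list("AB"))

# Shared bin edges so both columns are counted over identical intervals
edges = np.histogram_bin_edges(df.to_numpy().ravel(), bins=20)

fig, ax = plt.subplots()
heights = {}
for col in df.columns:
    # Each row contributes 1/len(df), so bar heights are fractions of the rows
    w = np.full(len(df), 1.0 / len(df))
    heights[col], _, _ = ax.hist(df[col], bins=edges, weights=w, alpha=0.5, label=col)

ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))  # a height of 0.25 is labelled 25%
ax.set_ylabel("share of rows")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because the bin edges cover all values of both columns, the bars of each column add up to 100%.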

Answer 3

Answered by hobs

Looks like @CarstenKönig found the right way:

df.hist(bins=20, weights=np.ones_like(df[df.columns[0]]) * 100. / len(df))
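To see why these weights produce percentages, the same weights can be passed to np.histogram and the bin totals inspected. This is a small sketch under assumed data; the column name ColumnName, the seed, and the sample size of 200 are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only
df = pd.DataFrame({"ColumnName": rng.standard_normal(200)})

# One weight per row; each row counts as 100/len(df) percent
weights = np.ones(len(df.index)) * 100.0 / len(df.index)

# Weighted bin totals are the percentage of rows falling in each bin
percent, edges = np.histogram(df["ColumnName"], bins=20, weights=weights)

print(percent.sum())
```

Dropping the factor of 100 gives fractions between 0 and 1 instead of percentages.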

Answer 4

Answered by Christoph Schranz

You can simplify the weighting using np.ones_like():

df["ColumnName"].plot.hist(weights = np.ones_like(df.index) / len(df.index))

np.ones_like() is okay with the df.index structure
len(df.index) is faster for large DataFrames

Answer 5

Answered by anon

I know this answer comes 6 years late, but to anyone using density=True (the substitute for normed=True): it may not do what you want. It normalizes the whole distribution so that the total area of the bins is 1. So if your bins have a width < 1, you can expect heights > 1 on the y-axis. If you want to bound your histogram to [0, 1], you will have to calculate it yourself.
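The effect described here is easy to reproduce. The sketch below uses assumed data confined to an interval of width 0.5, so every bin is much narrower than 1; the seed, sample size, and variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, for reproducibility only
x = rng.uniform(0.0, 0.5, size=1000)  # all values lie in an interval of width 0.5

# density=True: total area is 1, so narrow bins force heights above 1
density, edges = np.histogram(x, bins=10, density=True)

# Manual normalization: heights are fractions of the sample, bounded by [0, 1]
counts, _ = np.histogram(x, bins=10)
fractions = counts / counts.sum()

print(density.max())    # greater than 1 here
print(fractions.max())  # never above 1
```

The density heights are still correct as a probability density; they are just not percentages of the dataset.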
