How to normalize a histogram in Python?

Question

Asked by user40

I'm trying to plot a normed histogram, but instead of getting 1 as the maximum value on the y axis, I'm getting different numbers.

For the array k=(1,4,3,1):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)
    plt.hist(k, normed=1)  # newer Matplotlib versions use density=True instead of normed
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get this histogram, which doesn't look normed.

[image: histogram for k=(1,4,3,1)]

For a different array, k=(3,3,3,3):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (3, 3, 3, 3)
    plt.hist(k, normed=1)  # newer Matplotlib versions use density=True instead of normed
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get this histogram, whose maximum y-value is 10.

[image: histogram for k=(3,3,3,3)]

For different k I get a different maximum y-value, even though normed=1 or normed=True.

Why does the normalization (if it works) change based on the data, and how can I make the maximum value of y equal to 1?

UPDATE:

I am trying to implement Carsten König's answer from "plotting histograms whose bar heights sum to 1 in matplotlib" and am getting a very weird result:

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)
    weights = np.ones_like(k)/len(k)
    plt.hist(k, weights=weights)
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

Result:

[image: resulting histogram]

What am I doing wrong?

Thanks

Answer 1

Accepted answer by CT Zhu

When you plot a normalized histogram, it is not the bar heights that should sum to one; it is the area under the histogram that should equal one:

In [44]:

import numpy as np
import matplotlib.pyplot as plt
k = (3, 3, 3, 3)
x, bins, p = plt.hist(k, density=True)  # used to be normed=True in older versions
plt.xticks(np.arange(10))  # 10 ticks on x axis
plt.show()

In [45]:

print(bins)
[ 2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5]

In this example the bin width is 0.1 and the single occupied bin has height 10, so the area under the histogram is one (0.1*10).
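As a quick numerical check of the point above, here is a minimal sketch using np.histogram, which should produce the same bins and densities as plt.hist with its default of 10 bins. The variable names are illustrative and not part of the original answer.

```python
import numpy as np

k = (3, 3, 3, 3)

# density=True scales the counts so that the total bar area equals 1
density, edges = np.histogram(k, bins=10, density=True)

widths = np.diff(edges)                 # width of each bin (0.1 here)
area = float(np.sum(density * widths))  # sum of height * width over all bins
height_sum = float(np.sum(density))     # sum of heights alone

print("area under histogram:", area)
print("sum of bar heights:  ", height_sum)
```

The area comes out as 1 while the heights alone sum to about 10, which is why the y axis in the question reaches 10 for this data.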

To make the heights sum to 1 instead, add the following before plt.show():

for item in p:
    item.set_height(item.get_height()/sum(x))


Answer 2

Answer by zhangxaochen

One way is to compute the probabilities yourself and then plot them with plt.bar:

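In [91]: from collections import Counter
    ...: c = Counter(k)
    ...: print(c)
Counter({1: 2, 3: 1, 4: 1})

In [92]: # prob is not defined in the original snippet; assumed here to be count / total
    ...: prob = {value: count / float(len(k)) for value, count in c.items()}
    ...: plt.bar(list(prob.keys()), list(prob.values()))
    ...: plt.show()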


Answer 3

Answer by kthouz

A normed histogram is defined such that the sum of the products of width and height over all columns is equal to 1. That's why your maximum is not equal to one.

However, if you still want to force the maximum to be 1, you could use numpy and matplotlib.pyplot.bar in the following way:

import numpy as np
import matplotlib.pyplot as plt

sample = np.random.normal(0, 10, 100)
# generate bin boundaries and heights
bin_height, bin_boundary = np.histogram(sample, bins=10)
# define width of each column
width = bin_boundary[1] - bin_boundary[0]
# standardize each column by dividing by the maximum height
bin_height = bin_height / float(max(bin_height))
# plot; align='edge' because bin_boundary[:-1] holds the left edges of the bins
plt.bar(bin_boundary[:-1], bin_height, width=width, align='edge')
plt.show()

Answer 4

Answer by upceric

You could use the solution outlined here:

weights = np.ones_like(myarray)/float(len(myarray))
plt.hist(myarray, weights=weights)
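A minimal sketch of why the float conversion matters, using np.histogram to inspect the resulting bar heights. It assumes the data from the question; under Python 2, the question's np.ones_like(k)/len(k) on integer data would most likely perform integer division and yield all-zero weights, which may explain the odd plot there.

```python
import numpy as np

myarray = np.array([1, 4, 3, 1])

# Each sample contributes 1/N, so the bar heights add up to 1.
# dtype=float avoids integer division on integer input.
weights = np.ones_like(myarray, dtype=float) / len(myarray)

# The same weights can be passed to plt.hist(myarray, weights=weights)
heights, edges = np.histogram(myarray, bins=10, weights=weights)

print("bar heights:", heights)
print("sum of heights:", heights.sum())
```

With this weighting the tallest bar is the fraction of samples in the fullest bin (0.5 here, for the two values equal to 1), so the maximum reaches 1 only if every sample lands in one bin.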

Answer 5

Answer by Tova Halász

How should the lines above:

weights = np.ones_like(myarray)/float(len(myarray))
plt.hist(myarray, weights=weights)

be applied when I have a stacked histogram like this?

n, bins, patches = plt.hist(
    [from6to10, from10to14, from14to18, from18to22, from22to6],
    label=['06:00-10:00', '10:00-14:00', '14:00-18:00', '18:00- 22:00', '22:00-06:00'],
    stacked=True, edgecolor='black', alpha=0.8, linewidth=0.5,
    range=(np.nanmin(ref1arr), np.nanmax(ref1arr)), bins=10)
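One possible approach, sketched under assumptions rather than taken from the thread: build one weight array per dataset, each entry equal to 1 divided by the total number of samples across all datasets, so that the stacked bars sum to 1. The groups below are hypothetical stand-ins for from6to10 and the other arrays, and are assumed to contain no NaN values.

```python
import numpy as np

# Hypothetical stand-ins for the per-time-slot arrays in the question
groups = [
    np.array([1.0, 2.0, 2.5]),
    np.array([2.0, 3.0]),
    np.array([0.5, 3.5, 4.0]),
]

total = sum(len(g) for g in groups)

# One weight array per dataset; every sample contributes 1/total
weights = [np.full(len(g), 1.0 / total) for g in groups]

# Shared bin edges across all datasets, as a stacked histogram requires
lo = min(g.min() for g in groups)
hi = max(g.max() for g in groups)
edges = np.linspace(lo, hi, 11)

# Per-dataset heights; stacking them gives the total bar height per bin
heights = [np.histogram(g, bins=edges, weights=w)[0] for g, w in zip(groups, weights)]
stacked_total = np.sum(heights, axis=0)

print("sum over all stacked bars:", stacked_total.sum())
```

The same weights list should be usable as plt.hist(groups, weights=weights, stacked=True, ...), since the weights argument is expected to match the shape of the data.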
