Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/22241240/

Time: 2020-08-19 00:33:35  Source: igfitidea

How to normalize a histogram in python?

Tags: python, matplotlib, normalization

Asked by user40

I'm trying to plot a normed histogram, but instead of getting 1 as the maximum value on the y axis, I'm getting different numbers.

For the array k=(1,4,3,1):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)
    plt.hist(k, normed=1)  # older Matplotlib spelling; newer versions use density=True
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get this histogram, which doesn't look normed:

[image: resulting histogram]

For a different array, k=(3,3,3,3):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (3, 3, 3, 3)
    plt.hist(k, normed=1)  # older Matplotlib spelling; newer versions use density=True
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

I get this histogram, whose maximum y value is 10:

[image: resulting histogram]

For different k I get a different maximum value of y, even though normed=1 or normed=True.

Why does the normalization (if it works) change based on the data, and how can I make the maximum value of y equal to 1?

UPDATE:

I am trying to implement Carsten König's answer from "plotting histograms whose bar heights sum to 1 in matplotlib" and am getting a very weird result:

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)
    weights = np.ones_like(k)/len(k)
    plt.hist(k, weights=weights)
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

plotGraph()

Result:

[image: resulting histogram]

What am I doing wrong?

Thanks

Accepted answer by CT Zhu

When you plot a normalized histogram, it is not the bar heights that sum to one, but the area underneath the curve:

In [44]:

import numpy as np
import matplotlib.pyplot as plt
k = (3, 3, 3, 3)
x, bins, p = plt.hist(k, density=True)  # used to be normed=True in older versions
plt.xticks(np.arange(10))  # 10 ticks on x axis
plt.show()

In [45]:

print(bins)
[ 2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5]

In this example the bin width is 0.1 and the single non-empty bar has height 10, so the area underneath the curve sums to one (0.1*10).
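
The area claim can be checked numerically. The following is a minimal sketch that uses np.histogram as a stand-in for the heights and bin edges that plt.hist returns (an assumption made so the check runs without opening a plot window); the variable names are illustrative only.

```python
import numpy as np

k = (3, 3, 3, 3)

# density=True scales the heights so that sum(height * width) equals 1
heights, edges = np.histogram(k, bins=10, density=True)
widths = np.diff(edges)  # each bin is about 0.1 wide here

area = float(np.sum(heights * widths))
print(heights.max())  # roughly 10: all four values fall into one narrow bin
print(area)           # roughly 1
```

The tallest bar is about 10 only because the bins are narrow; with wider bins the same data would give a lower bar and the same total area.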

To make the heights sum to 1, add the following before plt.show():

for item in p:
    item.set_height(item.get_height()/sum(x))
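
As a rough check of what this rescaling does, the sketch below divides density heights by their sum and compares the result with the plain fraction of samples per bin. It assumes equal-width bins and again uses np.histogram in place of the x returned by plt.hist; the names are illustrative.

```python
import numpy as np

k = (1, 4, 3, 1)

# density heights, as plt.hist(k, density=True) would report them
density, edges = np.histogram(k, bins=10, density=True)

# dividing by the sum is the same operation as the set_height loop
fractions = density / density.sum()

# raw counts per bin, for comparison
counts, _ = np.histogram(k, bins=10)

print(fractions.sum())                                # roughly 1
print(np.allclose(fractions, counts / len(k)))        # True for equal-width bins
```

With equal-width bins the rescaled heights are simply the share of samples in each bin, so the tallest bar here is 0.5, not 1.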

Answer by zhangxaochen

One way is to get the probabilities on your own, and then plot with plt.bar:

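In [91]: from collections import Counter
    ...: c = Counter(k)
    ...: print(c)
Counter({1: 2, 4: 1, 3: 1})

In [92]: # prob was not defined in the original; assumed to be the relative frequency of each value
    ...: prob = {value: count / len(k) for value, count in c.items()}
    ...: plt.bar(list(prob.keys()), list(prob.values()))
    ...: plt.show()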

Result: [image: resulting bar chart]

Answer by kthouz

A normed histogram is defined such that the sum of the products of the width and height of each column is equal to one. That is why your maximum is not equal to one.
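
To illustrate this definition, here is a small sketch with unequal bin widths, where each height is count / (N * width). The bin edges are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

k = (1, 4, 3, 1)
edges = [0, 1, 2, 5]  # hypothetical bins of width 1, 1 and 3

density, _ = np.histogram(k, bins=edges, density=True)
widths = np.diff(edges)

# counts are 0, 2, 2, so the heights are 0, 2/(4*1), 2/(4*3)
print(density)
print(float(np.sum(density * widths)))  # roughly 1
```

The heights depend on the bin widths, while the sum of width times height stays at one.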

However, if you still want to force the maximum to be 1, you could use numpy and matplotlib.pyplot.bar in the following way:

import numpy as np
import matplotlib.pyplot as plt

sample = np.random.normal(0, 10, 100)
# generate bin boundaries and heights
bin_height, bin_boundary = np.histogram(sample, bins=10)
# define width of each column
width = bin_boundary[1] - bin_boundary[0]
# standardize each column by dividing by the maximum height
bin_height = bin_height / float(max(bin_height))
# plot; align='edge' puts each bar's left edge on its bin boundary
plt.bar(bin_boundary[:-1], bin_height, width=width, align='edge')
plt.show()

Answer by upceric

You could use the solution outlined here:

weights = np.ones_like(myarray)/float(len(myarray))
plt.hist(myarray, weights=weights)
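
A minimal sketch of why the float() matters and what the weights achieve, assuming a small hypothetical myarray and using np.histogram (which accepts the same weights argument) so the result can be inspected without a plot:

```python
import numpy as np

myarray = np.array([1, 4, 3, 1])  # hypothetical sample data

# each sample contributes 1/N; building the ones as floats avoids
# integer division, which under Python 2 would turn every weight into 0
weights = np.ones(len(myarray), dtype=float) / len(myarray)

heights, edges = np.histogram(myarray, bins=10, weights=weights)
print(heights.sum())  # roughly 1: the bar heights now sum to one
```

Under Python 2, np.ones_like(k)/len(k) on an integer tuple performs integer division, which is a likely cause of the odd plot in the question's update.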

Answer by Tova Halász

How should the lines above:

weights = np.ones_like(myarray)/float(len(myarray))
plt.hist(myarray, weights=weights)

work when I have a stacked histogram like this?

n, bins, patches = plt.hist(
    [from6to10, from10to14, from14to18, from18to22, from22to6],
    label=['06:00-10:00', '10:00-14:00', '14:00-18:00', '18:00- 22:00', '22:00-06:00'],
    stacked=True, edgecolor='black', alpha=0.8, linewidth=0.5,
    range=(np.nanmin(ref1arr), np.nanmax(ref1arr)), bins=10)
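
One possible approach, sketched under assumptions: build one weight array per dataset, each entry equal to 1 divided by the total number of samples across all datasets, so that the stacked bars sum to one overall. The groups below are hypothetical stand-ins for from6to10 and the other arrays, and np.histogram with shared edges mimics the stacking so the sum can be checked.

```python
import numpy as np

# hypothetical stand-ins for the per-period arrays in the question
groups = [
    np.array([1.0, 2.0, 2.5]),
    np.array([2.0, 3.0]),
    np.array([0.5, 3.5, 4.0, 1.5, 2.2]),
]

total = sum(len(g) for g in groups)
# one weight array per dataset, matching that dataset's length
weights = [np.ones(len(g)) / total for g in groups]

edges = np.linspace(0.0, 4.0, 11)  # 10 shared bins
stacked_top = np.zeros(10)
for g, w in zip(groups, weights):
    h, _ = np.histogram(g, bins=edges, weights=w)
    stacked_top += h  # accumulate the way a stacked plot would

print(stacked_top.sum())  # roughly 1
```

The same lists should be usable as plt.hist(groups, weights=weights, stacked=True, bins=edges). If the arrays contain NaN values, as the nanmin/nanmax calls in the question suggest, they would probably need to be dropped before counting.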