pandas: Histogram fitting with Python

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/33811353/

Time: 2020-09-14 00:15:12  Source: igfitidea

Histogram fitting with python

Tags: python, pandas, matplotlib, scipy, data-analysis

Asked by user2820579

I've been searching around but haven't found the correct way to do the following.

I have a histogram done with matplotlib:

hist, bins, patches = plt.hist(distance, bins=100, density=True)  # `normed` was removed in Matplotlib 3.1; use `density`

From the plot, the distribution looks roughly exponential (as for the waiting times of a Poisson process). How can I find the best fit, taking into account my hist and bins arrays?

UPDATE

I am using the following approach:

x = np.float64(bins)  # had some trouble with the float128 and float64 data types
hist = np.float64(hist)
myexp = lambda x, l, A: A * np.exp(-l * x)
popt, pcov = opt.curve_fit(myexp, (x[1:] + x[:-1]) / 2, hist)

But I get

---> 41 plt.plot(stats.expon.pdf(np.arange(len(hist)),popt),'-')

ValueError: operands could not be broadcast together with shapes (100,) (2,)
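
A note on the error above: `popt` holds the two fitted parameters of `myexp` (`l` and `A`), and `stats.expon.pdf(x, popt)` treats that two-element array as the `loc` argument, so NumPy tries to broadcast a length-100 array against a length-2 array. One way to draw the least-squares fit instead is to evaluate `myexp` itself at the bin centers. The following is only a minimal sketch: the synthetic `distance` data, the starting guess `p0`, and the helper `plot_fit` are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
from scipy import optimize as opt

# Synthetic stand-in for the asker's `distance` array (assumption).
rng = np.random.default_rng(0)
distance = rng.exponential(scale=2.0, size=5000)

# Same heights and edges that plt.hist(..., density=True) returns, without drawing.
hist, bins = np.histogram(distance, bins=100, density=True)
centers = (bins[1:] + bins[:-1]) / 2

def myexp(x, l, A):
    """Model from the question: A * exp(-l * x)."""
    return A * np.exp(-l * x)

# p0 is an illustrative starting guess, not taken from the question.
popt, pcov = opt.curve_fit(myexp, centers, hist, p0=(1.0, 1.0))

# Evaluate the fitted model on the same x values as the histogram heights.
fitted = myexp(centers, *popt)

def plot_fit(ax, centers, hist, fitted):
    """Draw the normalized bin heights and the fitted curve on a Matplotlib Axes."""
    width = centers[1] - centers[0]
    ax.bar(centers, hist, width=width, alpha=0.5)
    ax.plot(centers, fitted, '-')
```

With Matplotlib available, `fig, ax = plt.subplots()` followed by `plot_fit(ax, centers, hist, fitted)` would overlay the curve on the histogram.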

Answered by CT Zhu

What you described is a form of exponential distribution, and you want to estimate its parameters from the probability density observed in your data. Instead of using a non-linear regression method (which assumes the residuals are Gaussian distributed), one arguably more correct way is MLE (maximum likelihood estimation).
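
For the exponential distribution the maximum likelihood estimate has a closed form: with the location fixed at zero, the estimated scale is simply the sample mean, and the rate is its reciprocal. A small sketch of that, assuming synthetic data and using the `floc=0` option of `.fit()` to hold the location fixed (neither appears in the original answer):

```python
import numpy as np
import scipy.stats as ss

# Synthetic sample with a known scale (assumption for illustration).
X = ss.expon.rvs(scale=1.2, size=1000, random_state=0)

# Hold loc at 0 so that only the scale is estimated by maximum likelihood.
loc, scale = ss.expon.fit(X, floc=0)

# Rate parameter lambda in f(x) = lambda * exp(-lambda * x).
rate = 1.0 / scale
```

Here `scale` should agree with `X.mean()` up to numerical precision, which is a quick sanity check on the fit.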

scipy provides a large number of continuous distributions in its stats library, and MLE is implemented by the .fit() method. Of course, the exponential distribution is among them:

In [1]:
import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# generate data
X = ss.expon.rvs(loc=0.5, scale=1.2, size=1000)

# MLE; fit() returns (loc, scale)
P = ss.expon.fit(X)
print(P)
(0.50046056920696858, 1.1442947648425439)
# not exactly 0.5 and 1.2, because the sample is finite

In [3]:
# plotting
rX = np.linspace(0, 10, 100)
rP = ss.expon.pdf(rX, *P)
# Yup, just unpack P with *P instead of passing loc=XX and scale=XX, etc.

In [4]:
# the histogram must be normalized: use `density=True` (`normed=True` before Matplotlib 3.1)
plt.hist(X, density=True)
plt.plot(rX, rP)
Out[4]:

[Figure: normalized histogram of X with the fitted exponential pdf overlaid]

Your distance array will replace X here.
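
Applied to the question's data, this could look like the sketch below. The helper name `fit_exponential` and the synthetic `distance` sample are assumptions for illustration; the real array would be passed in instead. The fitted pdf is evaluated at the bin centers so that it lines up with the `hist` and `bins` arrays from the question.

```python
import numpy as np
import scipy.stats as ss

def fit_exponential(distance, bins=100):
    """Fit an exponential by MLE and evaluate its pdf at the histogram bin centers.

    Hypothetical helper, not part of the original answer.
    """
    distance = np.asarray(distance, dtype=np.float64)
    loc, scale = ss.expon.fit(distance)
    # Normalized heights and bin edges, as plt.hist(..., density=True) would give.
    hist, edges = np.histogram(distance, bins=bins, density=True)
    centers = (edges[1:] + edges[:-1]) / 2
    pdf = ss.expon.pdf(centers, loc=loc, scale=scale)
    return loc, scale, centers, hist, pdf

# Synthetic stand-in for the asker's data (assumption).
rng = np.random.default_rng(1)
distance = rng.exponential(scale=3.0, size=2000)

loc, scale, centers, hist, pdf = fit_exponential(distance)
```

Plotting `centers` against `pdf` on top of the normalized histogram then gives the same kind of overlay as in the figure above.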