Python: Fitting a Weibull distribution using Scipy

Question

Asked by kungphil

I am trying to recreate maximum likelihood distribution fitting. I can already do this in Matlab and R, but now I want to use scipy. In particular, I would like to estimate the Weibull distribution parameters for my data set.

I have tried this:

import scipy.stats as s
import numpy as np
import matplotlib.pyplot as plt

def weib(x,n,a):
    return (a / n) * (x / n)**(a - 1) * np.exp(-(x / n)**a)

data = np.loadtxt("stack_data.csv")

(loc, scale) = s.exponweib.fit_loc_scale(data, 1, 1)
print(loc, scale)

x = np.linspace(data.min(), data.max(), 1000)
plt.plot(x, weib(x, loc, scale))
plt.hist(data, bins=int(data.max()), density=True)
plt.show()

And get this:

(2.5827280639441961, 3.4955032285727947)

And a distribution that looks like this:

[Figure: Weibull distribution using Scipy]

I have been using exponweib after reading this: http://www.johndcook.com/distributions_scipy.html. I have also tried the other Weibull functions in scipy (just in case!).

In Matlab (using the Distribution Fitting Tool - see screenshot) and in R (using both the MASS library function fitdistr and the GAMLSS package) I get a (loc) and b (scale) parameters more like 1.58463497 and 5.93030013. I believe all three methods use the maximum likelihood method for distribution fitting.

[Figure: Weibull distribution using Matlab]

I have posted my data here if you would like to have a go! For completeness, I am using Python 2.7.5, Scipy 0.12.0, R 2.15.2 and Matlab 2012b.

Why am I getting a different result!?

Answer 1

Accepted answer by Josef

My guess is that you want to estimate the shape parameter and the scale of the Weibull distribution while keeping the location fixed. Fixing loc assumes that the values of your data and of the distribution are positive with lower bound at zero.

floc=0 keeps the location fixed at zero; f0=1 keeps the first shape parameter of the exponentiated Weibull fixed at one.

>>> from scipy import stats
>>> stats.exponweib.fit(data, floc=0, f0=1)
[1, 1.8553346917584836, 0, 6.8820748596850905]
>>> stats.weibull_min.fit(data, floc=0)
[1.8553346917584836, 0, 6.8820748596850549]
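
The equivalence of the two calls above can be tried on synthetic data. The following is a minimal sketch, assuming a sample drawn with made-up parameters (shape 1.8, scale 6.0) rather than the asker's data; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Synthetic sample with known parameters (hypothetical values, not the asker's data)
true_shape, true_scale = 1.8, 6.0
sample = stats.weibull_min.rvs(true_shape, loc=0, scale=true_scale,
                               size=5000, random_state=0)

# Maximum likelihood fit with the location held at zero
shape_hat, loc_hat, scale_hat = stats.weibull_min.fit(sample, floc=0)

# Same model expressed through exponweib, first shape parameter fixed at one
a_hat, c_hat, loc2_hat, scale2_hat = stats.exponweib.fit(sample, floc=0, f0=1)

print("weibull_min:", shape_hat, loc_hat, scale_hat)
print("exponweib:  ", a_hat, c_hat, loc2_hat, scale2_hat)
```

Both fits should land close to the parameters used to generate the sample, and close to each other.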

The fit looks OK when compared to the histogram, but not very good. The parameter estimates are a bit higher than the ones you mention from R and Matlab.

Update

The closest I can get to the plot that is now available is with an unrestricted fit, but using starting values. The plot is still less peaked. Note that values passed to fit without an f prefix are used as starting values.

>>> import numpy as np
>>> from scipy import stats
>>> import matplotlib.pyplot as plt
>>> plt.plot(data, stats.exponweib.pdf(data, *stats.exponweib.fit(data, 1, 1, scale=2, loc=0)))
>>> _ = plt.hist(data, bins=np.linspace(0, 16, 33), density=True, alpha=0.5)
>>> plt.show()
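
The difference between starting values and fixed values in fit can be sketched as follows. This assumes a synthetic sample with made-up parameters; the starting guesses (1.5, 0, 5) are arbitrary.

```python
import numpy as np
from scipy import stats

# Synthetic sample (hypothetical parameters, not the asker's data)
sample = stats.weibull_min.rvs(1.8, loc=0, scale=6.0, size=5000, random_state=1)

# Positional shape value and loc=/scale= are only initial guesses for the optimizer,
# so all three parameters are free to move
free_fit = stats.weibull_min.fit(sample, 1.5, loc=0, scale=5)

# floc= pins the location, so only shape and scale are optimized
fixed_fit = stats.weibull_min.fit(sample, 1.5, floc=0, scale=5)

print("free: ", free_fit)   # (shape, loc, scale), loc estimated
print("fixed:", fixed_fit)  # (shape, loc, scale), loc exactly 0
```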

[Figure: exponweib fit]

Answer 2

Answer by Saullo G. P. Castro

I was curious about your question and, although this is not an answer, it compares the Matlab result with your result and with the result of using leastsq, which showed the best correlation with the given data:

The code is as follows:

import scipy.stats as s
import numpy as np
import matplotlib.pyplot as plt
import numpy.random as mtrand
from scipy.integrate import quad
from scipy.optimize import leastsq

## two-parameter Weibull pdf: n is the scale, a is the shape
def weib(x,n,a):
    return (a / n) * (x / n)**(a-1) * np.exp(-(x/n)**a)

def residuals(p,x,y):
    integral = quad( weib, 0, 16, args=(p[0],p[1]) )[0]
    penalization = abs(1.-integral)*100000
    return y - weib(x, p[0],p[1]) + penalization

#
data = np.loadtxt("stack_data.csv")


x = np.linspace(data.min(), data.max(), 100)
n, bins, patches = plt.hist(data, bins=x, density=True)
binsm = (bins[1:]+bins[:-1])/2

# without full_output, leastsq returns the solution and an integer status flag
popt, ier = leastsq(func=residuals, x0=(1.,1.), args=(binsm,n))

loc, scale = 1.58463497, 5.93030013
plt.plot(binsm,n)
plt.plot(x, weib(x, loc, scale),
         label='weib matlab, loc=%1.3f, scale=%1.3f' % (loc, scale), lw=4.)
loc, scale = s.exponweib.fit_loc_scale(data, 1, 1)
plt.plot(x, weib(x, loc, scale),
         label='weib stack, loc=%1.3f, scale=%1.3f' % (loc, scale), lw=4.)
plt.plot(x, weib(x,*popt),
         label='weib leastsq, loc=%1.3f, scale=%1.3f' % tuple(popt), lw=4.)

plt.legend(loc='upper right')
plt.show()

Answer 3

Answer by CT Zhu

It is easy to verify which result is the true MLE; all you need is a simple function to calculate the log-likelihood:

>>> from numpy import array, loadtxt, log
>>> from scipy import stats
>>> def wb2LL(p, x): # log-likelihood; p = (scale, shape)
...     return sum(log(stats.weibull_min.pdf(x, p[1], 0., p[0])))
...
>>> adata=loadtxt('/home/user/stack_data.csv')
>>> wb2LL(array([6.8820748596850905, 1.8553346917584836]), adata)
-8290.1227946678173
>>> wb2LL(array([5.93030013, 1.57463497]), adata)
-8410.3327470347667

The result from the fit method of exponweib and from R's fitdistr (@Warren) is better: it has the higher log-likelihood, so it is more likely to be the true MLE. It is not surprising that the result from GAMLSS is different, since it is a completely different statistical model: a Generalized Additive Model.
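
The same check can be sketched on synthetic data. The example below assumes a sample generated with made-up parameters and uses logpdf instead of log(pdf); the helper name weibull_loglik and the 10% perturbations are illustrative choices, not part of the original answer.

```python
import numpy as np
from scipy import stats

def weibull_loglik(params, x):
    """Log-likelihood of a two-parameter Weibull; params = (scale, shape)."""
    scale, shape = params
    return np.sum(stats.weibull_min.logpdf(x, shape, loc=0, scale=scale))

# Synthetic sample (hypothetical parameters, not the asker's data)
sample = stats.weibull_min.rvs(1.8, loc=0, scale=6.0, size=5000, random_state=2)
shape_hat, _, scale_hat = stats.weibull_min.fit(sample, floc=0)

ll_mle = weibull_loglik((scale_hat, shape_hat), sample)
# Perturbed candidates (arbitrary offsets) should score lower than the fitted point
ll_alt1 = weibull_loglik((scale_hat * 0.9, shape_hat), sample)
ll_alt2 = weibull_loglik((scale_hat, shape_hat * 1.1), sample)

print(ll_mle, ll_alt1, ll_alt2)
```

Whichever candidate has the highest log-likelihood is the better maximum likelihood estimate for that sample.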

Still not convinced? We can draw a 2D confidence limit plot around the MLE (see Meeker and Escobar's book for details).

[Figure: Multi-dimensional Confidence Region]

Again this verifies that array([6.8820748596850905, 1.8553346917584836]) is the right answer, as the log-likelihood there is higher than at any other point in the parameter space. Note:

>>> log(array([6.8820748596850905, 1.8553346917584836]))
array([ 1.92892018,  0.61806511])

BTW1, an MLE fit may not appear to fit the histogram tightly. An easy way to think about the MLE is that it is the parameter estimate under which the observed data are most probable. It does not need to fit the histogram well visually; that would be the result of minimizing the mean squared error instead.
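
The contrast between the two criteria can be sketched as follows. This assumes a synthetic sample with made-up parameters, 40 histogram bins, and curve_fit as the least-squares routine; all of these are illustrative choices.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def weibull_pdf(x, shape, scale):
    return (shape / scale) * (x / scale)**(shape - 1) * np.exp(-(x / scale)**shape)

# Synthetic sample (hypothetical parameters, not the asker's data)
sample = stats.weibull_min.rvs(1.8, loc=0, scale=6.0, size=5000, random_state=3)

# MLE: maximizes the likelihood of the raw observations
shape_mle, _, scale_mle = stats.weibull_min.fit(sample, floc=0)

# Least squares: minimizes the squared distance to the histogram bar heights
heights, edges = np.histogram(sample, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
(shape_ls, scale_ls), _ = curve_fit(weibull_pdf, centers, heights,
                                    p0=(1.5, 5.0), bounds=(1e-6, np.inf))

def sse(shape, scale):
    """Sum of squared errors between the histogram and the pdf."""
    return np.sum((heights - weibull_pdf(centers, shape, scale))**2)

print("MLE:", shape_mle, scale_mle, "SSE:", sse(shape_mle, scale_mle))
print("LS: ", shape_ls, scale_ls, "SSE:", sse(shape_ls, scale_ls))
```

The least-squares estimate hugs the histogram at least as closely as the MLE by construction, while the MLE has the higher likelihood; the two answer different questions.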

BTW2, your data appears to be leptokurtic and left-skewed, which means the Weibull distribution may not fit your data well. Try, e.g., Gompertz-Logistic, which improves the log-likelihood by about another 100. Cheers!

Answer 4

Answer by Kaihua Cai

The order of loc and scale is mixed up in the code:

plt.plot(x, weib(x, scale, loc))

The scale parameter should come first.

Answer 5

Answer by hobs

I had the same problem, but found that setting loc=0 in exponweib.fit primed the pump for the optimization. That was all that was needed from @user333700's answer. I couldn't load your data: your data link points to an image, not data. So I ran a test on my data instead:

[Figure: Plot of distribution fit to problematic (bimodal?) data]

import scipy.stats as ss
import matplotlib.pyplot as plt
import numpy as np

N=30
counts, bins = np.histogram(x, bins=N)
bin_width = bins[1]-bins[0]
total_count = float(sum(counts))

f, ax = plt.subplots(1, 1)
f.suptitle(query_uri)

ax.bar(bins[:-1]+bin_width/2., counts, align='center', width=.85*bin_width)
ax.grid('on')
def fit_pdf(x, name='lognorm', color='r'):
    dist = getattr(ss, name)  # params = shape, loc, scale
    # dist = ss.gamma  # 3 params

    params = dist.fit(x, loc=0)  # 1-day lag minimum for shipping
    y = dist.pdf(bins, *params)*total_count*bin_width
    sqerror_sum = np.log(sum(ci*(yi - ci)**2. for (ci, yi) in zip(counts, y)))
    ax.plot(bins, y, color, lw=3, alpha=0.6, label='%s   err=%3.2f' % (name, sqerror_sum))
    return y

colors = ['r-', 'g-', 'r:', 'g:']

for name, color in zip(['exponweib', 't', 'gamma'], colors): # 'lognorm', 'erlang', 'chi2', 'weibull_min', 
    y = fit_pdf(x, name=name, color=color)

ax.legend(loc='best', frameon=False)
plt.show()

Answer 6

Answer by Peter9192

I know it's an old post, but I just faced a similar problem and this thread helped me solve it. Thought my solution might be helpful for others like me:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

# Fit Weibull function, some explanation below
params = stats.exponweib.fit(data, floc=0, f0=1)
shape = params[1]
scale = params[3]
print('shape:', shape)
print('scale:', scale)

#### Plotting
# Histogram first
values,bins,hist = plt.hist(data,bins=51,range=(0,25),density=True)
center = (bins[:-1] + bins[1:]) / 2.

# Using all params and the stats function
plt.plot(center,stats.exponweib.pdf(center,*params),lw=4,label='scipy')

# Using my own Weibull function as a check
def weibull(u,shape,scale):
    '''Weibull distribution for wind speed u with shape parameter k and scale parameter A'''
    return (shape / scale) * (u / scale)**(shape-1) * np.exp(-(u/scale)**shape)

plt.plot(center,weibull(center,shape,scale),label='Wind analysis',lw=2)
plt.legend()

Some extra info that helped me understand:

Scipy's exponweib function takes four input parameters: the shapes (a, c), loc and scale. You want to fix loc and the first shape parameter (a); this is done with floc=0, f0=1. Fitting will then give you the parameters c and scale, where c corresponds to the shape parameter of the two-parameter Weibull distribution (often used in wind data analysis) and scale corresponds to its scale factor.

From docs:

exponweib.pdf(x, a, c) =
    a * c * (1-exp(-x**c))**(a-1) * exp(-x**c)*x**(c-1)

If a is 1, then

exponweib.pdf(x, a, c) =
    c * (1-exp(-x**c))**(0) * exp(-x**c)*x**(c-1)
  = c * (1) * exp(-x**c)*x**(c-1)
  = c * x **(c-1) * exp(-x**c)

From this, the relation to the 'wind analysis' Weibull function should be clearer.
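
The reduction can also be checked numerically. This is a small sketch with arbitrary illustrative values for shape and scale; it relies on scipy applying loc and scale as pdf((x - loc) / scale) / scale.

```python
import numpy as np
from scipy import stats

def weibull_pdf(u, shape, scale):
    """Two-parameter Weibull pdf in the 'wind analysis' form."""
    return (shape / scale) * (u / scale)**(shape - 1) * np.exp(-(u / scale)**shape)

u = np.linspace(0.1, 25, 200)
shape, scale = 1.8, 6.0  # arbitrary illustrative values

# exponweib with a=1 versus weibull_min versus the hand-written formula
via_exponweib = stats.exponweib.pdf(u, 1, shape, loc=0, scale=scale)
via_weibull_min = stats.weibull_min.pdf(u, shape, loc=0, scale=scale)
via_formula = weibull_pdf(u, shape, scale)

print(np.allclose(via_exponweib, via_formula))
print(np.allclose(via_weibull_min, via_formula))
```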

Answer 7

Answer by Keith

There have been a few answers to this already, here and in other places, such as in "Weibull distribution and the data in the same figure (with numpy and scipy)".

It still took me a while to come up with a clean toy example, so I thought it would be useful to post.

from scipy import stats
import matplotlib.pyplot as plt

#input for pseudo data
N = 10000
Kappa_in = 1.8
Lambda_in = 10
a_in = 1
loc_in = 0 

#Generate data from given input
data = stats.exponweib.rvs(a=a_in,c=Kappa_in, loc=loc_in, scale=Lambda_in, size = N)

#The a and loc are fixed in the fit since it is standard to assume they are known
a_out, Kappa_out, loc_out, Lambda_out = stats.exponweib.fit(data, f0=a_in,floc=loc_in)

#Plot
bins = range(51)
fig = plt.figure() 
ax = fig.add_subplot(1, 1, 1)
ax.plot(bins, stats.exponweib.pdf(bins, a=a_out,c=Kappa_out,loc=loc_out,scale = Lambda_out))
ax.hist(data, bins = bins , density=True, alpha=0.5)
ax.annotate("Shape: $k = %.2f$ \n Scale: $\\lambda = %.2f$"%(Kappa_out,Lambda_out), xy=(0.7, 0.85), xycoords=ax.transAxes)
plt.show()
