Python Scipy: Lognormal Fitting
Disclaimer: the content below is a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) on StackOverflow.
Original question: http://stackoverflow.com/questions/18534562/
Scipy: lognormal fitting
Asked by bioslime
There have been quite a few posts on handling the lognorm distribution with Scipy, but I still don't get the hang of it.
The two-parameter lognormal is usually described by the parameters μ and σ, which correspond to SciPy's loc=0, with σ=shape and μ=np.log(scale).
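As a quick sanity check of that mapping, here is a small sketch (the values μ=10 and σ=3 are just examples) that draws samples and recovers μ and σ from the log of the sample:

import numpy as np
from scipy import stats

mu, sigma = 10.0, 3.0
samples = stats.lognorm.rvs(sigma, loc=0, scale=np.exp(mu), size=100000)
print(np.mean(np.log(samples)))  # close to mu (10)
print(np.std(np.log(samples)))   # close to sigma (3)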
At scipy, lognormal distribution - parameters we can read how to generate a lognorm(μ, σ) sample by taking the exponential of a normally distributed random variable. Now let's try something else:
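That exponentiation approach looks roughly like this (a sketch with example values, not code from the linked post):

import numpy as np
from scipy import stats

mu, sigma = 10.0, 3.0
via_exp = np.exp(np.random.normal(mu, sigma, size=10000))                  # exp of a normal sample
via_scipy = stats.lognorm.rvs(sigma, loc=0, scale=np.exp(mu), size=10000)  # direct lognorm sample
print(np.median(via_exp), np.median(via_scipy))  # both medians should be near np.exp(mu)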
A)
What's the problem with creating a lognorm directly:
import numpy as np
import scipy as sp
import scipy.stats  # needed so that sp.stats is available

# lognorm(mu=10, sigma=3)
# so shape=3, loc=0, scale=np.exp(10) ?
x = np.linspace(0.01, 20, 200)
sample_dist = sp.stats.lognorm.pdf(x, 3, loc=0, scale=np.exp(10))
shape, loc, scale = sp.stats.lognorm.fit(sample_dist, floc=0)
print(shape, loc, scale)
print(np.log(scale), shape)  # mu and sigma
# last line: -7.63285693379 0.140259699945  # not 10 and 3
B)
I use the return values of a fit to create a fitted distribution. But again I'm apparently doing something wrong:
import numpy as np
import scipy as sp
import scipy.stats
import matplotlib.pyplot as plt

samp = sp.stats.lognorm(0.5, loc=0, scale=1).rvs(size=2000)  # sample
param = sp.stats.lognorm.fit(samp)  # fit the sample data
print(param)  # does not coincide with shape, loc, scale above!
x = np.linspace(0, 4, 100)
pdf_fitted = sp.stats.lognorm.pdf(x, param[0], loc=param[1], scale=param[2])  # fitted distribution
pdf = sp.stats.lognorm.pdf(x, 0.5, loc=0, scale=1)  # original distribution
plt.plot(x, pdf_fitted, 'r-', x, pdf, 'g-')
plt.hist(samp, bins=30, density=True, alpha=.3)  # density=True (normed= was removed in newer matplotlib)
Answered by bioslime
I realized my mistakes:
A) The samples I am drawing need to come from the .rvs method, like so:
sample_dist = sp.stats.lognorm.rvs(3, loc=0, scale=np.exp(10), size=2000)
B) The fit has some problems. When we fix the loc parameter, the fit succeeds much better:
param = sp.stats.lognorm.fit(samp, floc=0)
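Putting both fixes together, a minimal sketch (reusing the μ=10, σ=3 values from part A) would look like this:

import numpy as np
from scipy import stats

samp = stats.lognorm.rvs(3, loc=0, scale=np.exp(10), size=2000)  # real samples via .rvs
shape, loc, scale = stats.lognorm.fit(samp, floc=0)              # fit with loc fixed at 0
print(np.log(scale), shape)  # should come out close to 10 and 3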
Answered by Christian K.
I made the same observation: a free fit of all parameters fails most of the time. You can help by providing a better initial guess; fixing the parameter is not necessary.
from scipy import stats

samp = stats.lognorm(0.5, loc=0, scale=1).rvs(size=2000)

# this is where the fit gets its initial guess from
print(stats.lognorm._fitstart(samp))
# (1.0, 0.66628696413404565, 0.28031095750445462)

print(stats.lognorm.fit(samp))
# note that the fit failed completely as the parameters did not change at all
# (1.0, 0.66628696413404565, 0.28031095750445462)

# fit again with a better initial guess for loc
print(stats.lognorm.fit(samp, loc=0))
# (0.50146296628099118, 0.0011019321419653122, 0.99361128537912125)
You can also make up your own function to calculate the initial guess, e.g.:
def your_func(sample):
    # do some magic here and return a (shape, loc, scale) starting guess
    return guess

stats.lognorm._fitstart = your_func
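For example, one possible concrete guess function (a sketch of my own; the name log_moment_guess is not from the answer) could estimate the starting values from the log of a strictly positive sample:

import numpy as np
from scipy import stats

samp = stats.lognorm(0.5, loc=0, scale=1).rvs(size=2000)

def log_moment_guess(sample):
    logs = np.log(sample)
    return np.std(logs), 0.0, np.exp(np.mean(logs))  # (shape, loc, scale), loc pinned at 0

stats.lognorm._fitstart = log_moment_guess
print(stats.lognorm.fit(samp))  # the fit should now start from (and stay near) sensible values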
Answered by Luis DG
This problem has been fixed in newer scipy versions. After upgrading from scipy 0.9 to scipy 0.14 the problem disappears.
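A quick way to check which version is installed (a trivial sketch):

import scipy
print(scipy.__version__)  # the free fit reportedly behaves better from 0.14 onwards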
Answered by nenetto
I answered this here.
I leave the code here too, just for the lazy :D
import scipy.stats  # importing the submodule explicitly makes scipy.stats available
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
mu = 10 # Mean of sample !!! Make sure your data is positive for the lognormal example
sigma = 1.5 # Standard deviation of sample
N = 2000 # Number of samples
norm_dist = scipy.stats.norm(loc=mu, scale=sigma) # Create Random Process
x = norm_dist.rvs(size=N) # Generate samples
# Fit normal
fitting_params = scipy.stats.norm.fit(x)
norm_dist_fitted = scipy.stats.norm(*fitting_params)
t = np.linspace(np.min(x), np.max(x), 100)
# Plot normals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x, ax=ax, norm_hist=True, kde=False, label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
ax.legend(loc='lower right')
plt.show()
# The lognormal model fits a variable whose log is normal
# We create such a variable by exponentiating the previous (normal) variable
x_exp = np.exp(x)
mu_exp = np.exp(mu)
sigma_exp = np.exp(sigma)
fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp)
lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
t = np.linspace(np.min(x_exp), np.max(x_exp), 100)
# Here is the magic I was looking for a long long time
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
# Plot lognormals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x_exp, ax=ax, norm_hist=True, kde=False,
             label='Data exp(X)~N(mu={0:.1f}, sigma={1:.1f})\n X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist_fitted.mean(), lognorm_dist_fitted.std()))
ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist.mean(), lognorm_dist.std()))
ax.legend(loc='lower right')
plt.show()
The trick is to understand these two things (a quick numeric check follows the list):
- If a variable X is NORMAL with mean mu and std sigma, then EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
- If your variable (x) HAS THE FORM of a LOGNORMAL, the model will be scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX)), with:
- muX = np.mean(np.log(x))
- sigmaX = np.std(np.log(x))
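Here is a minimal numeric check of those two points (the values mu=2.0 and sigma=0.7 are just example choices of mine):

import numpy as np
from scipy import stats

mu, sigma = 2.0, 0.7
x = np.exp(np.random.normal(mu, sigma, size=50000))  # lognormal data by construction
model = stats.lognorm(s=np.std(np.log(x)), loc=0, scale=np.exp(np.mean(np.log(x))))
print(model.mean(), x.mean())        # should agree closely
print(model.median(), np.median(x))  # both medians should be near np.exp(mu)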
Answered by bart cubrich
If you are just interested in plotting, you can use seaborn to get a lognormal distribution.
import seaborn as sns
import numpy as np
import scipy.stats as sp_stats  # the snippet below refers to sp_stats

mu = 0
sigma = 1
n = 1000
x = np.random.normal(mu, sigma, n)
sns.distplot(x, fit=sp_stats.norm)  # normal distribution

loc = 0
scale = 1
x = np.log(np.random.lognormal(loc, scale, n))
sns.distplot(x, fit=sp_stats.lognorm)  # log normal distribution
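Note that distplot is deprecated in recent seaborn releases. A rough modern equivalent (my own sketch: histplot has no fit= argument, so the fitted curve is overlaid manually) would be:

import numpy as np
import scipy.stats as sp_stats
import seaborn as sns
import matplotlib.pyplot as plt

x = np.random.lognormal(0, 1, 1000)
sns.histplot(x, stat='density')                      # histogram scaled as a density
shape, loc, scale = sp_stats.lognorm.fit(x, floc=0)  # fit with the location fixed at 0
grid = np.linspace(x.min(), x.max(), 200)
plt.plot(grid, sp_stats.lognorm.pdf(grid, shape, loc=loc, scale=scale), 'r-')
plt.show()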