Generate random numbers from exponential distribution and model using python
Original question: http://stackoverflow.com/questions/47319277/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by JAG2024
My goal is to create a dataset of random points whose histogram looks like an exponential decay function and then plot an exponential decay function through those points.
First I tried to create a series of random numbers (but did not do so successfully since these should be points, not numbers) from an exponential distribution.
from pylab import *
from scipy.optimize import curve_fit
import random
import numpy as np
import pandas as pd
testx = pd.DataFrame(range(10)).astype(float)
testx = testx[0]
data = {}  # maps each index i to a one-element list holding one draw
for i in range(1,11):
    x = random.expovariate(15) # rate = 15 arrivals per second
    data[i] = [x]
testy = pd.DataFrame(data).T.astype(float)
testy = testy[0]; testy
plot(testx, testy, 'ko')
The result could look something like this.
And then I define a function to draw a line through my points:
def func(x, a, e):
    return a*np.exp(-a*x)+e
popt, pcov = curve_fit(f=func, xdata=testx, ydata=testy, p0 = None, sigma = None)
print(popt) # parameters
print(pcov) # covariance
plot(testx, testy, 'ko')
xx = np.linspace(0, 15, 1000)
plot(xx, func(xx,*popt))
plt.show()
What I'm looking for is: (1) a more elegant way to create an array of random numbers from an exponential (decay) distribution and (2) how to test that my function is indeed going through the data points.
Accepted answer by Bill Bell
I think you are actually asking about a regression problem, which is what Praveen was suggesting.
You have a bog-standard exponential decay that meets the y-axis at about y=0.27. Its equation is therefore y = 0.27*exp(-0.27*x). I can model Gaussian error around the values of this function and plot the result using the following code.
import matplotlib.pyplot as plt
from math import exp
from scipy.stats import norm
x = range(0, 16)
Y = [0.27*exp(-0.27*_) for _ in x]
error = norm.rvs(0, scale=0.05, size=9)
simulated_data = [max(0, y+e) for (y,e) in zip(Y[:9],error)]
plt.plot(x, Y, 'b-')
plt.plot(x[:9], simulated_data, 'r.')
plt.show()
print(list(x[:9]))  # list() so the values themselves are printed under Python 3
print(simulated_data)
Here's the plot. Notice that I save the output values for subsequent use.
Now I can calculate the nonlinear regression of the exponential decay values, contaminated with noise, on the independent variable, which is what curve_fit does.
from math import exp
from scipy.optimize import curve_fit
import numpy as np
def model(x, p):
    return p*np.exp(-p*x)
x = list(range(9))
Y = [0.22219001972988275, 0.15537454187341937, 0.15864069451825827, 0.056411162886672819, 0.037398831058143338, 0.10278251869912845, 0.03984605649260467, 0.0035360087611421981, 0.075855255999424692]
popt, pcov = curve_fit(model, x, Y)
print (popt[0])
print (pcov)
The bonus is that curve_fit not only calculates an estimate for the parameter (0.207962159793), it also offers an estimate of this estimate's variance (0.00086071) as an element of pcov. This would appear to be a fairly small value, given the small sample size.
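Because pcov holds variances, the square root of its diagonal gives a standard error on the same scale as the parameter, which is usually easier to read. The following is only a sketch that refits the data values listed above; the names p_hat and p_stderr are mine and do not appear in the answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, p):
    # one-parameter exponential decay, as in the answer
    return p * np.exp(-p * x)

x = np.arange(9, dtype=float)
Y = np.array([0.22219001972988275, 0.15537454187341937, 0.15864069451825827,
              0.056411162886672819, 0.037398831058143338, 0.10278251869912845,
              0.03984605649260467, 0.0035360087611421981, 0.075855255999424692])

popt, pcov = curve_fit(model, x, Y)

p_hat = popt[0]                       # parameter estimate
p_stderr = np.sqrt(np.diag(pcov))[0]  # standard error = sqrt of the variance

print(p_hat, p_stderr)
```

Comparing p_stderr with p_hat gives a rough sense of how tightly nine points pin the parameter down.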
Here's how to calculate the residuals. Notice that each residual is the difference between the data value and the value estimated from x using the parameter estimate.
residuals = [y-model(_, popt[0]) for (y, _) in zip(Y, x)]
print (residuals)
If you wanted to further 'test that my function is indeed going through the data points' then I would suggest looking for patterns in the residuals. But discussions like this might be beyond what's welcomed on Stack Overflow: Q-Q and P-P plots, plots of residuals vs y or x, and so on.
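As a rough illustration of what looking for patterns could mean in practice, the sketch below recomputes the residuals with numpy and obtains the coordinates of a normal Q-Q plot from scipy.stats.probplot. It assumes the same nine data values as above; the variable names are illustrative, and the plotting itself is left out.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def model(x, p):
    return p * np.exp(-p * x)

x = np.arange(9, dtype=float)
Y = np.array([0.22219001972988275, 0.15537454187341937, 0.15864069451825827,
              0.056411162886672819, 0.037398831058143338, 0.10278251869912845,
              0.03984605649260467, 0.0035360087611421981, 0.075855255999424692])

popt, _ = curve_fit(model, x, Y)
residuals = Y - model(x, popt[0])  # data minus fitted value

# Q-Q coordinates against a normal distribution: theoretical quantiles and
# the sorted residuals, plus a least-squares line through them.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm")

print(residuals.mean())  # should be near zero if the fit is unbiased
print(r)                 # closer to 1 means closer to a straight Q-Q line
```

Plotting residuals against x, or ordered_resid against theoretical_q, then shows visually whether any trend or curvature remains.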
Answer by ImportanceOfBeingErnest
I would guess that the following is close to what you want. You can generate some random numbers drawn from an exponential distribution with numpy,
data = numpy.random.exponential(5, size=1000)
You can then create a histogram of them using numpy.histogram and draw the histogram values into a plot. You may decide to take the middle of each bin as the position of the point (this assumption is of course wrong, but it becomes more valid the more bins you use).
Fitting works as in the code from the question. You will then find that the fit roughly recovers the parameter used for the data generation (in the case below, ~5).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
data = np.random.exponential(5, size=1000)
hist,edges = np.histogram(data,bins="auto",density=True )
x = edges[:-1]+np.diff(edges)/2.
plt.scatter(x,hist)
func = lambda x,beta: 1./beta*np.exp(-x/beta)
popt, pcov = curve_fit(f=func, xdata=x, ydata=hist)
print(popt)
xx = np.linspace(0, x.max(), 101)
plt.plot(xx, func(xx,*popt), ls="--", color="k",
         label=r"fit, $\beta$ = {:.2f}".format(popt[0]))
plt.legend()
plt.show()
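One detail about the generation step is easy to trip over: numpy's exponential takes the scale (the mean, beta), whereas random.expovariate in the question takes the rate (lambda = 1/beta). A small sketch, assuming the rate of 15 from the question and a seeded generator purely for reproducibility:

```python
import numpy as np

rate = 15.0  # arrivals per second, as in the question
rng = np.random.default_rng(seed=0)

# numpy expects scale = 1/rate, not the rate itself
samples = rng.exponential(scale=1.0 / rate, size=100_000)

print(samples.mean())  # should be close to 1/rate
print(1.0 / rate)
```

This produces the whole array in one call, without a Python loop or a DataFrame.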
Answer by mikuszefski
I agree with the solution of @ImportanceOfBeingErnest, but I'd like to add a (well-known?) general solution for distributions. If you have a distribution function f with integral F (i.e. f = dF / dx), then you get the required distribution by mapping uniform random numbers through inv F, i.e. the inverse function of the integral. In the case of the exponential function, the integral is, again, an exponential and the inverse is the logarithm. So it can be done like this:
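For the exponential case the inversion can be written out explicitly. The symbols a (rate) and u (a uniform random number on (0, 1)) are my notation for this sketch:

```latex
\[
\begin{aligned}
f(x) &= a\,e^{-a x}, \qquad x \ge 0,\\
F(x) &= \int_0^x a\,e^{-a t}\,dt = 1 - e^{-a x},\\
u = F(x) \;&\Longrightarrow\; x = F^{-1}(u) = -\frac{\ln(1-u)}{a}.
\end{aligned}
\]
```

Since 1 - u is uniform on (0, 1) whenever u is, the simpler form x = -ln(u) / a yields the same distribution.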
import matplotlib.pyplot as plt
import numpy as np
from random import random
def gen( a ):
    y=random()
    return( -np.log( y ) / a )
def dist_func( x, a ):
    return( a * np.exp( -a * x) )
data = [ gen(3.14) for x in range(20000) ]
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.hist(data, bins=80, density=True, histtype="step")
ax.plot(np.linspace(0,5,150), dist_func( np.linspace(0,5,150), 3.14 ) )
plt.show()
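The same inverse-transform idea can be applied to a whole array at once and checked numerically, since an exponential distribution with rate a has mean and standard deviation both equal to 1/a. This is a sketch only; the seed and variable names are my own choices.

```python
import numpy as np

a = 3.14
rng = np.random.default_rng(seed=1)

# rng.random() lies in [0, 1); using 1 - u keeps the argument of log in (0, 1]
u = 1.0 - rng.random(20000)
data = -np.log(u) / a  # inverse of F(x) = 1 - exp(-a*x), applied elementwise

print(data.mean(), data.std(), 1.0 / a)
```

If the mapping is correct, the first two printed numbers should both sit close to the third.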