pandas: Generating random numbers from an exponential distribution and model using Python

Question

Asked by JAG2024

My goal is to create a dataset of random points whose histogram looks like an exponential decay function and then plot an exponential decay function through those points.

First, I tried to draw a series of random numbers from an exponential distribution. This did not really work, because what I need are points rather than single numbers.

from pylab import *
from scipy.optimize import curve_fit
import random
import numpy as np
import pandas as pd

testx = pd.DataFrame(range(10)).astype(float)
testx = testx[0]

data = {}  # was never initialised in the original snippet
for i in range(1, 11):
    x = random.expovariate(15)  # rate = 15 arrivals per second
    data[i] = [x]

testy = pd.DataFrame(data).T.astype(float)
testy = testy[0]; testy

plot(testx, testy, 'ko')

The result could look something like this.

And then I define a function to draw a line through my points:

def func(x, a, e):
    return a*np.exp(-a*x)+e

popt, pcov = curve_fit(f=func, xdata=testx, ydata=testy, p0 = None, sigma = None) 

print(popt)  # parameters
print(pcov)  # covariance

plot(testx, testy, 'ko')

xx = np.linspace(0, 15, 1000)
plot(xx, func(xx,*popt))

plt.show()

What I'm looking for is: (1) a more elegant way to create an array of random numbers from an exponential (decay) distribution and (2) how to test that my function is indeed going through the data points.

Answer 1

Accepted answer by Bill Bell

I think you are actually asking about a regression problem, which is what Praveen was suggesting.

You have a bog-standard exponential decay that meets the y-axis at about y=0.27. Its equation is therefore y = 0.27*exp(-0.27*x). I can model Gaussian error around the values of this function and plot the result using the following code.

import matplotlib.pyplot as plt
from math import exp
from scipy.stats import norm


x = range(0, 16)
Y = [0.27*exp(-0.27*_) for _ in x]
error = norm.rvs(0, scale=0.05, size=9)
simulated_data = [max(0, y+e) for (y,e) in zip(Y[:9],error)]

plt.plot(x, Y, 'b-')
plt.plot(x[:9], simulated_data, 'r.')
plt.show()

print (x[:9])
print (simulated_data)

Here's the plot. Notice that I save the output values for subsequent use.

Now I can calculate the nonlinear regression of the exponential decay values, contaminated with noise, on the independent variable, which is what curve_fit does.

from math import exp
from scipy.optimize import curve_fit
import numpy as np

def model(x, p):
    return p*np.exp(-p*x)

x = list(range(9))
Y = [0.22219001972988275, 0.15537454187341937, 0.15864069451825827, 0.056411162886672819, 0.037398831058143338, 0.10278251869912845, 0.03984605649260467, 0.0035360087611421981, 0.075855255999424692]

popt, pcov = curve_fit(model, x, Y)
print (popt[0])
print (pcov)

The bonus is that curve_fit not only calculates an estimate for the parameter (0.207962159793), it also offers an estimate of that estimate's variance (0.00086071) as an element of pcov. This would appear to be a fairly small value, given the small sample size.
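
As a small illustration of reading pcov, the sketch below refits the same nine values and turns the variance into a standard error. The 1.96 multiplier assumes the estimator is roughly normally distributed, which is a strong assumption with only nine points; the names p_hat, p_se, ci_low and ci_high are introduced here purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, p):
    # One-parameter exponential decay, same form as in the answer
    return p * np.exp(-p * x)


x = np.arange(9, dtype=float)
Y = np.array([0.22219001972988275, 0.15537454187341937, 0.15864069451825827,
              0.056411162886672819, 0.037398831058143338, 0.10278251869912845,
              0.03984605649260467, 0.0035360087611421981, 0.075855255999424692])

popt, pcov = curve_fit(model, x, Y)

p_hat = popt[0]
# The diagonal of pcov holds the estimated variance of each parameter;
# its square root is the standard error.
p_se = np.sqrt(np.diag(pcov))[0]

# Rough 95% interval (ASSUMPTION: approximately normal estimator)
ci_low, ci_high = p_hat - 1.96 * p_se, p_hat + 1.96 * p_se

print("estimate:", p_hat)
print("standard error:", p_se)
print("approximate 95% interval:", (ci_low, ci_high))
```

The standard error is usually easier to interpret than the variance, since it is on the same scale as the parameter itself.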

Here's how to calculate the residuals. Notice that each residual is the difference between the data value and the value estimated from x using the parameter estimate.

residuals = [y-model(_, popt[0]) for (y, _) in zip(Y, x)]
print (residuals)

If you wanted to further 'test that my function is indeed going through the data points', then I would suggest looking for patterns in the residuals. But discussions like this might be beyond what's welcomed on Stack Overflow: Q-Q and P-P plots, plots of residuals vs y or x, and so on.
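
For readers who want a starting point, here is a minimal numeric sketch of two such checks, without any plotting: a straight-line fit of the residuals against x (a clear slope would hint at structure the model misses) and the Q-Q correlation from scipy.stats.probplot (values near 1 are consistent with normal residuals). The choice of these two particular checks is an assumption made for illustration, and with nine points neither is conclusive.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def model(x, p):
    # Same one-parameter exponential decay as in the answer
    return p * np.exp(-p * x)


x = np.arange(9, dtype=float)
Y = np.array([0.22219001972988275, 0.15537454187341937, 0.15864069451825827,
              0.056411162886672819, 0.037398831058143338, 0.10278251869912845,
              0.03984605649260467, 0.0035360087611421981, 0.075855255999424692])

popt, _ = curve_fit(model, x, Y)
residuals = Y - model(x, popt[0])

# Check 1: residuals vs x. A slope clearly different from zero
# suggests a systematic misfit rather than pure noise.
trend = stats.linregress(x, residuals)

# Check 2: Q-Q data against a normal distribution. Without a plot
# argument, probplot only returns the quantile pairs and a line fit.
(theoretical_q, ordered_resid), (qq_slope, qq_intercept, qq_r) = stats.probplot(
    residuals, dist="norm")

print("slope of residuals vs x:", trend.slope, "p-value:", trend.pvalue)
print("Q-Q correlation coefficient:", qq_r)
```

To see the same information graphically, theoretical_q can be plotted against ordered_resid, and x against residuals.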

Answer 2

Answer by ImportanceOfBeingErnest

I would guess that the following is close to what you want. You can generate some random numbers drawn from an exponential distribution with numpy,

data = numpy.random.exponential(5, size=1000)

You can then create a histogram of them using numpy.histogram and draw the histogram values into a plot. You may decide to take the middle of each bin as the position of the point (this assumption is of course wrong, but it becomes more valid the more bins you use).

Fitting works as in the code from the question. You will then find that the fit roughly recovers the parameter used for the data generation (in the case below, ~5).

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

data = np.random.exponential(5, size=1000)

hist,edges = np.histogram(data,bins="auto",density=True )

x = edges[:-1]+np.diff(edges)/2.
plt.scatter(x,hist)

func = lambda x,beta: 1./beta*np.exp(-x/beta)

popt, pcov = curve_fit(f=func, xdata=x, ydata=hist) 

print(popt)

xx = np.linspace(0, x.max(), 101)
plt.plot(xx, func(xx,*popt), ls="--", color="k", 
         label=r"fit, $\beta = ${:.3f}".format(popt[0]))
plt.legend()
plt.show()
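
As a cross-check that avoids binning altogether, the scale can also be estimated directly from the raw samples. This is a different technique from the histogram fit above (maximum likelihood rather than least squares on bin heights), sketched here under the assumption that the data really are exponential with the location fixed at zero. The seed and the names beta_mle and scale_fit are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # seeded only for reproducibility
data = rng.exponential(5, size=1000)    # same scale (5) as in the answer

# For an exponential distribution the maximum-likelihood estimate
# of the scale parameter is simply the sample mean.
beta_mle = data.mean()

# scipy.stats.expon.fit gives the same result when the location
# is held fixed at 0 with floc=0; it returns (loc, scale).
loc_fit, scale_fit = stats.expon.fit(data, floc=0)

print("sample-mean estimate of beta:", beta_mle)
print("expon.fit estimate of beta:", scale_fit)
```

Both numbers should land close to the histogram-based fit, and neither depends on the number of bins or on treating bin midpoints as exact positions.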

Answer 3

Answer by mikuszefski

I agree with the solution of @ImportanceOfBeingErnest, but I'd like to add a (well-known?) general solution for distributions. If you have a distribution function f with integral F (i.e. f = dF / dx), then you get the required distribution by mapping uniform random numbers with inv F, i.e. the inverse function of the integral. In the case of the exponential function, the integral is, again, an exponential and the inverse is the logarithm. So it can be done like this:

import matplotlib.pyplot as plt
import numpy as np
from random import random


def gen( a ):
    y=random()
    return( -np.log( y ) / a )


def dist_func( x, a ):
    return( a * np.exp( -a * x) )


data = [ gen(3.14) for x in range(20000) ]
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.hist(data, bins=80, density=True, histtype="step")  # 'normed' was removed in newer matplotlib
ax.plot(np.linspace(0,5,150), dist_func( np.linspace(0,5,150), 3.14 ) )
plt.show()
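
The same inverse-transform idea can be written without a Python-level loop. The sketch below is one possible vectorised version using NumPy's Generator API; the seed and the names u and samples are arbitrary. It uses -ln(1 - u) / a rather than -ln(u) / a, which gives the same distribution (u and 1 - u are both uniform) but avoids taking the logarithm of zero, since the generator returns values in [0, 1).

```python
import numpy as np

rng = np.random.default_rng(42)   # seed chosen arbitrarily for reproducibility
a = 3.14                          # rate parameter, as in the code above

# Uniform random numbers on [0, 1)
u = rng.random(20000)

# Inverse of the CDF F(x) = 1 - exp(-a x) is F^{-1}(u) = -ln(1 - u) / a.
# log1p(-u) computes ln(1 - u) accurately for small u.
samples = -np.log1p(-u) / a

# Sanity checks against known properties of the exponential distribution
print("sample mean:", samples.mean(), "expected about:", 1 / a)
print("sample median:", np.median(samples), "expected about:", np.log(2) / a)
```

The resulting array can be passed to ax.hist in place of the list built by gen.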
