Original question: http://stackoverflow.com/questions/35729290/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python Numpy Poisson Distribution
Asked by famfop
I am generating a Gaussian; for the sake of completeness, this is my implementation:
from numpy import *
x=linspace(0,1,1000)
y=exp(-(x-0.5)**2/(2.0*(0.1/(2*sqrt(2*log(2))))**2))
with peak at 0.5 and fwhm=0.1. So far, not very interesting. In the next step I calculate the Poisson distribution of my data set using numpy's random.poisson implementation.
poi = random.poisson(lam=y)
I'm having two major problems.
- A specialty of Poisson is that the variance equals the expectation value. Comparing the output of mean() and var() confuses me, as the outputs are not equal.
- When plotting this, the Poisson distribution takes integer values only and the max. value is around 7, sometimes 6, whilst my old function y has its max. at 1. As far as I understand, the Poisson function should give me a sort of 'fit' of my actual function y. How come the max. values are not equal? Sorry for my mathematical incorrectness; actually I'm doing this to emulate Poisson-distributed noise, but I guess you understand 'fit' in this context.
EDIT: 3rd question: What is the 'size' variable used for in this context? I've seen different types of usage, but in the end they did not give me different results, only failures when I chose it wrong...
EDIT2: OK, from the answer I got I think that I was not clear enough (although it already helped me correct some other stupid errors I did, thanks for that!). What I want to do is apply poisson (white) noise to the function y. As described by MSeifert in the post below, I now use the expectation value as lam. But this only gives me the noise. I guess I have some understanding problems on the level of how th{is,e} noise is applied (and maybe it's more physics related?!).
Answered by MSeifert
First of all, I'll write this answer assuming you import numpy as np, because it clearly distinguishes numpy functions from the builtins or those of Python's math and random packages.
I think it is not necessary to answer your specified questions because your basic assumption is wrong:
Yes, Poisson statistics have a mean that equals the variance, but that assumes you use a constant lam. But you don't. You input the y-values of your Gaussian, so you cannot expect them to be constant (they are by your definition Gaussian!).
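To illustrate the point about a constant lam, here is a minimal sketch (the seed, sample sizes and variable names are my own choices, not from the question). It compares the sample mean and variance for a constant lam with those obtained when the Gaussian y is passed as lam. In the second case the pooled draws are a mixture of many different Poisson distributions, so their variance is expected to exceed their mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Constant lam: every draw comes from the same Poisson distribution.
const = rng.poisson(lam=0.5, size=200_000)
const_mean, const_var = const.mean(), const.var()

# Varying lam: the Gaussian from the question, one lam per x-value.
x = np.linspace(0, 1, 1000)
sigma = 0.1 / (2 * np.sqrt(2 * np.log(2)))  # fwhm = 0.1
y = np.exp(-(x - 0.5) ** 2 / (2.0 * sigma ** 2))

# 2000 repetitions of the whole curve, to average out sampling noise.
mixed = rng.poisson(lam=y, size=(2000, y.size))
mixed_mean, mixed_var = mixed.mean(), mixed.var()

print("constant lam:", const_mean, const_var)  # the two should be close
print("varying lam: ", mixed_mean, mixed_var)  # variance noticeably larger
```

Each individual element still has variance equal to its own lam; it is only the statistics taken over the whole array that no longer satisfy mean == var.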
Use np.random.poisson(lam=0.5) to get one random value from a Poisson distribution. But be careful, since this Poisson distribution is not even approximately identical to your Gaussian distribution: you are in the "low-mean" interval where the two differ significantly; see for example the Wikipedia article about the Poisson distribution.
Also, you are creating random numbers, so you shouldn't really plot them directly but plot a np.histogram of them, since statistical distributions are all about probability density functions (see Probability density function).
I already mentioned that you create a Poisson distribution with a constant lam, so now it is time to talk about size: you create random numbers, so to approximate the real Poisson distribution you need to draw a lot of them. That is where size comes in: np.random.poisson(lam=0.5, size=10000), for example, creates an array of 10000 elements, each drawn from a Poissonian probability distribution with a mean value of 0.5.
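The role of size can be sketched as follows. This is only an illustration with arbitrary values (the seed, the lam array and the shapes are assumptions of mine); it shows how the output shape depends on whether lam is a scalar or an array, and that a size which cannot be broadcast against an array lam raises an error, which may be the failure mentioned in the question.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Scalar lam without size: a single draw.
single = rng.poisson(lam=0.5)

# Scalar lam with size: that many independent draws from the same distribution.
many = rng.poisson(lam=0.5, size=10_000)

# Array lam without size: one draw per element, output has lam's shape.
lam = np.linspace(0.0, 1.0, 1000)
per_element = rng.poisson(lam=lam)

# Array lam with a compatible size: lam is broadcast along the new axis.
repeated = rng.poisson(lam=lam, size=(5, lam.size))

# A size that cannot be broadcast against lam is rejected.
try:
    rng.poisson(lam=lam, size=10)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(np.shape(single), many.shape, per_element.shape, repeated.shape, mismatch_failed)
```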
And in case you haven't read it in the Wikipedia article mentioned before: the Poisson distribution by definition gives only non-negative (>= 0) integers as results.
So I guess what you wanted to do is create a gaussian and poisson distribution containing 1000 values:
gaussian = np.random.normal(0.5, 2*np.sqrt(2*np.log(2)), 1000)
poisson = np.random.poisson(0.5, 1000)
and then to plot it, plot the histograms:
import matplotlib.pyplot as plt
plt.hist(gaussian)
plt.hist(poisson)
plt.show()
or use np.histogram instead.
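For the np.histogram route, a minimal sketch could look like the following. The bin edges, the seed and the comparison with the theoretical probabilities are my additions, not part of the original answer. Because Poisson samples are integers, the edges are placed at half-integers so that each bin counts exactly one value.

```python
import math
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
lam = 0.5
samples = rng.poisson(lam=lam, size=100_000)

# Bin edges at -0.5, 0.5, 1.5, ... so each bin holds exactly one integer value.
edges = np.arange(-0.5, samples.max() + 1.5)
counts, _ = np.histogram(samples, bins=edges)
freq = counts / samples.size

# Theoretical Poisson probabilities P(k) = exp(-lam) * lam**k / k!
ks = np.arange(len(counts))
pmf = np.array([math.exp(-lam) * lam ** int(k) / math.factorial(int(k)) for k in ks])

# Print the observed frequency next to the theoretical probability.
for k, f, p in zip(ks, freq, pmf):
    print(k, round(f, 4), round(p, 4))
```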
To get statistics from your random samples you can still use np.var and np.mean on the Gaussian and Poisson samples. And this time (at least on my sample run) they give good results:
print(np.mean(gaussian))
0.653517935138
print(np.var(gaussian))
5.4848398775
print(np.mean(poisson))
0.477
print(np.var(poisson))
0.463471
Notice how the Gaussian values are close to what we defined as parameters (a mean of 0.5 and a variance of (2*sqrt(2*log(2)))**2, about 5.55). The Poisson mean and var, on the other hand, are almost equal. You can increase the precision of the mean and var by increasing the size above.
Why the poisson distribution doesn't approximate your original signal
Your original signal contains only values between 0 and 1, whereas the Poisson distribution only yields non-negative integers and its standard deviation is linked to the mean value. Far from the mean of the Gaussian your signal is approximately 0, so the Poisson distribution will almost always draw 0. Where the Gaussian has its maximum the value is 1. The Poisson distribution for 1 looks like this (left is the signal + Poisson, and on the right the Poisson distribution around a value of 1).
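The Poisson probabilities for a mean of 1 can also be computed directly. A minimal sketch using only the standard library (the range k = 0..7 is my choice, picked to cover the maximum values reported in the question):

```python
import math

lam = 1.0  # value of the signal at the Gaussian peak

# P(k) = exp(-lam) * lam**k / k! for k = 0..7
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(8)]

for k, p in enumerate(pmf):
    print(f"P({k}) = {p:.5f}")

# Probability of drawing 3 or more.
tail = 1.0 - sum(pmf[:3])
print(f"P(k >= 3) = {tail:.5f}")
```

For lam = 1 the values 0 and 1 are equally likely, and the probabilities fall off quickly for larger k, which is why the distribution is strongly skewed.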
So you'll get a lot of 0s and 1s and some 2s in that region. But there is also some probability that you draw values up to 7. This is exactly the asymmetry that I mentioned. If you change the amplitude of your Gaussian (multiply it by 1000, for example) the "fit" is much better, since the Poisson distribution is almost symmetric there:
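The effect of the amplitude can be sketched numerically. This is an illustration under my own assumptions, not the code behind the original plots: the helper names noisy_signal and rms_error are hypothetical, the seed is arbitrary, and dividing the counts by the amplitude is just one way to bring the result back to the scale of y.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

x = np.linspace(0, 1, 1000)
sigma = 0.1 / (2 * np.sqrt(2 * np.log(2)))  # fwhm = 0.1
y = np.exp(-(x - 0.5) ** 2 / (2.0 * sigma ** 2))

def noisy_signal(signal, amplitude, rng):
    """Draw Poisson counts around amplitude * signal and scale them back."""
    counts = rng.poisson(lam=amplitude * signal)
    return counts / amplitude

def rms_error(a, b):
    """Root-mean-square difference between two arrays."""
    return np.sqrt(np.mean((a - b) ** 2))

low = noisy_signal(y, 1, rng)      # peak expectation of 1 count
high = noisy_signal(y, 1000, rng)  # peak expectation of 1000 counts

print("RMS deviation, amplitude 1:   ", rms_error(low, y))
print("RMS deviation, amplitude 1000:", rms_error(high, y))
```

Note that the drawn counts already are the noisy signal (expectation value plus noise), not noise to be added on top. Since the standard deviation of a Poisson variable is sqrt(lam), the relative noise shrinks like 1/sqrt(lam) as the amplitude grows.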