C++: random number generator that produces a power-law distribution?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/918736/

Date: 2020-08-27 17:57:04  Source: igfitidea

Random number generator that produces a power-law distribution?

c++ math random power-law

Asked by twk

I'm writing some tests for a C++ command line Linux app. I'd like to generate a bunch of integers with a power-law/long-tail distribution. Meaning, I get a few numbers very frequently but most of them relatively infrequently.

Ideally there would just be some magic equations I could use with rand() or one of the stdlib random functions. If not, an easy to use chunk of C/C++ would be great.

Thanks!

Answered by gnovice

This page at Wolfram MathWorld discusses how to get a power-law distribution from a uniform distribution (which is what most random number generators provide).

The short answer (derivation at the above link):

x = [(x1^(n+1) - x0^(n+1))*y + x0^(n+1)]^(1/(n+1))

where y is a uniform variate, n is the distribution power, x0 and x1 define the range of the distribution, and x is your power-law distributed variate.
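A minimal C++ sketch of the formula above, assuming 0 < x0 < x1 and n != -1 (the n = -1 case needs a logarithmic form instead). The function names and the choice of std::mt19937 as the engine are illustrative assumptions, not part of the original answer.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Maps a uniform variate y in [0, 1] to a power-law variate on [x0, x1]
// with density proportional to x^n, using the formula quoted above.
// Assumes 0 < x0 < x1 and n != -1.
double power_law_from_uniform(double y, double x0, double x1, double n) {
    assert(x0 > 0.0 && x0 < x1 && n != -1.0);
    const double a = std::pow(x0, n + 1.0);
    const double b = std::pow(x1, n + 1.0);
    return std::pow((b - a) * y + a, 1.0 / (n + 1.0));
}

// Draws one variate using any standard engine (for example std::mt19937).
template <class Engine>
double sample_power_law(Engine& eng, double x0, double x1, double n) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return power_law_from_uniform(uniform(eng), x0, x1, n);
}
```

For the integers the question asks about, one option is to floor or round the result, for example static_cast<int>(std::floor(x)), at the cost of slightly distorting the distribution near the range limits.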

Answered by dmckee --- ex-moderator kitten

If you know the distribution you want (called the probability density function (PDF)) and have it properly normalized, you can integrate it to get the cumulative distribution function (CDF), then invert the CDF (if possible) to get the transformation you need from a uniform [0,1] distribution to your desired one.

So you start by defining the distribution you want.

P = F(x)

(for x in [0,1]), then integrate it to give

C(y) = \int_0^y F(x) dx

If this can be inverted you get

y = C^{-1}(u)

So call rand(), scale the result to a uniform value u in [0,1], plug it in as u in the last line, and use y.

This result is called the Fundamental Theorem of Sampling. This is a hassle because of the normalization requirement and the need to analytically invert the function.

Alternately you can use a rejection technique: throw a number uniformly in the desired range, then throw another number and compare it to the PDF at the location indicated by your first throw. Reject if the second throw exceeds the PDF. This tends to be inefficient for PDFs with a lot of low-probability region, like those with long tails...
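The rejection technique can be sketched as below. This is an illustration under assumptions: the target is taken to be the (x+1)^-n shape that this answer suggests further down, restricted to [0,1], and the names target_pdf and sample_by_rejection are made up for the example. The PDF does not need to be normalized here, only bounded.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Unnormalized target density on [0, 1] (assumed shape: (x + 1)^-n).
double target_pdf(double x, double n) { return std::pow(x + 1.0, -n); }

// Rejection sampling: propose x uniformly on [0, 1], then accept it with
// probability pdf(x) / pdf_max. This pdf is decreasing, so its maximum is at x = 0.
template <class Engine>
double sample_by_rejection(Engine& eng, double n) {
    assert(n > 0.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double pdf_max = target_pdf(0.0, n);  // equals 1 for this pdf
    for (;;) {
        const double x = uniform(eng);            // first throw: candidate location
        const double h = uniform(eng) * pdf_max;  // second throw: height inside the box
        if (h <= target_pdf(x, n)) return x;      // keep only points under the curve
    }
}
```

The loop discards every candidate that lands above the curve, which is why the method gets slow when most of the bounding box lies above a long, low tail.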

An intermediate approach inverts the CDF by brute force: store the CDF as a lookup table, then do a reverse lookup to get the result.
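A sketch of the lookup-table approach, assuming the goal is integers 1..N with weights proportional to k^-n (the discrete setting is an assumption made here to match the question). TablePowerLaw is a hypothetical class name; the reverse lookup is a binary search with std::lower_bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Tabulated CDF for the integers 1..max_value with weights proportional to k^-n.
class TablePowerLaw {
public:
    TablePowerLaw(int max_value, double n) {
        assert(max_value >= 1);
        cdf_.reserve(static_cast<std::size_t>(max_value));
        double running = 0.0;
        for (int k = 1; k <= max_value; ++k) {
            running += std::pow(static_cast<double>(k), -n);
            cdf_.push_back(running);  // unnormalized cumulative sum
        }
        for (double& c : cdf_) c /= running;  // normalize so the last entry is 1
    }

    // Reverse lookup: smallest k whose cumulative probability is >= u.
    int lookup(double u) const {
        auto it = std::lower_bound(cdf_.begin(), cdf_.end(), u);
        if (it == cdf_.end()) --it;  // guard against rounding at the top end
        return static_cast<int>(it - cdf_.begin()) + 1;
    }

    // Draws one integer using any standard engine.
    template <class Engine>
    int operator()(Engine& eng) const {
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        return lookup(uniform(eng));
    }

private:
    std::vector<double> cdf_;
};
```

Building the table costs O(N) once, and each draw costs O(log N), with no need for an analytic inverse.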



The real stinker here is that simple x^-n distributions (with n >= 1) are non-normalizable on the range [0,1], so you can't use the sampling theorem. Try (x+1)^-n instead...
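As a worked example of the recipe above, here is the (x+1)^-n case carried through, assuming n is not 1 and writing u for the uniform variate on [0,1] (the symbols A and u are introduced here for illustration):

```latex
% Assumed target: p(x) = A (x+1)^{-n} on [0,1], with n \neq 1.
\begin{align*}
A^{-1} &= \int_0^1 (x+1)^{-n}\,dx = \frac{2^{1-n} - 1}{1-n}, \\
C(y)   &= A \int_0^y (x+1)^{-n}\,dx = \frac{(y+1)^{1-n} - 1}{2^{1-n} - 1}, \\
y      &= \left[\,1 + u\left(2^{1-n} - 1\right)\right]^{1/(1-n)} - 1,
          \qquad u \sim U[0,1].
\end{align*}
```

Setting u = 0 gives y = 0 and u = 1 gives y = 1, so the transformed values stay inside [0,1] as required.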

Answered by jwfearn

I can't comment on the math required to produce a power law distribution (the other posts have suggestions), but I would suggest you familiarize yourself with the TR1 C++ Standard Library random number facilities in <random>. These provide more functionality than std::rand and std::srand. The new system specifies a modular API for generators, engines and distributions and supplies a bunch of presets.

The included distribution presets are:

  • uniform_int
  • bernoulli_distribution
  • geometric_distribution
  • poisson_distribution
  • binomial_distribution
  • uniform_real
  • exponential_distribution
  • normal_distribution
  • gamma_distribution

When you define your power law distribution, you should be able to plug it in with existing generators and engines. The book The C++ Standard Library Extensions by Pete Becker has a great chapter on <random>.
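As a rough illustration of plugging a custom distribution into the <random> engines, here is a minimal sketch. The class power_law_distribution is hypothetical (it is not part of the standard library), it reuses the inverse-CDF formula from gnovice's answer, and it does not implement the full set of members the standard asks of a distribution (param_type, reset, min, max, and so on).

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical distribution class shaped like the standard ones:
// construct it with parameters, then call it with an engine.
class power_law_distribution {
public:
    using result_type = double;

    // Density proportional to x^n on [x0, x1]; requires 0 < x0 < x1 and n != -1.
    power_law_distribution(double x0, double x1, double n)
        : lo_(std::pow(x0, n + 1.0)),
          hi_(std::pow(x1, n + 1.0)),
          inv_(1.0 / (n + 1.0)) {
        assert(x0 > 0.0 && x0 < x1 && n != -1.0);
    }

    // Works with any standard engine, e.g. std::mt19937 or std::minstd_rand.
    template <class Engine>
    result_type operator()(Engine& eng) {
        const double y = uniform_(eng);  // uniform variate in [0, 1)
        return std::pow((hi_ - lo_) * y + lo_, inv_);
    }

private:
    double lo_, hi_, inv_;
    std::uniform_real_distribution<double> uniform_{0.0, 1.0};
};
```

Usage then looks like the built-in distributions: create an engine such as std::mt19937 eng(std::random_device{}()), create power_law_distribution dist(1.0, 1000.0, -2.5), and call dist(eng) for each sample.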

Here is an article about how to create other distributions (with examples for Cauchy, Chi-squared, Student t and Snedecor F).

Answered by Antoni Parellada

I just wanted to carry out an actual simulation as a complement to the (rightfully) accepted answer. Although in R, the code is so simple as to be (pseudo)-pseudo-code.

One tiny difference between the Wolfram MathWorld formula in the accepted answer and other, perhaps more common, equations is that the power law exponent n (which is typically denoted as alpha) does not carry an explicit negative sign. So the chosen alpha value has to be negative, typically with a magnitude between 2 and 3.

x0 and x1 stand for the lower and upper limits of the distribution.

So here it is:

set.seed(0)
x1 = 5           # Maximum value
x0 = 0.1         # Minimum value; it can't be zero, otherwise x0^(negative power) is 1/0.
alpha = -2.5     # It has to be negative.
y = runif(1e7)   # 1e7 uniform samples on [0, 1]
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
plot(density(x), ylab="density of x", col=2)

Or plotted on a logarithmic scale:

plot(density(x), log="xy", ylab="log density x", col=2)

Here is the summary of the data:

> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1000  0.1208  0.1584  0.2590  0.2511  4.9388