C++: Generating a random integer from a range
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/5008804/
Generating random integer from a range
Asked by Matěj Zábsky
I need a function that generates a random integer in a given range (including the border values). I don't have unreasonable quality/randomness requirements; I have four requirements:
- I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
- I need it to be reasonably uniform (use of rand() is perfectly fine).
- the min-max ranges can be anything from <0, 1> to <-32727, 32727>.
- it has to be seedable.
I currently have the following C++ code:
output = min + (rand() * (int)(max - min) / RAND_MAX)
The problem is that it is not really uniform: max is returned only when rand() == RAND_MAX (for Visual C++, where RAND_MAX is 32767, the chance is 1/32768). This is a major issue for small ranges like <-1, 1>, where the last value is almost never returned.
So I grabbed pen and paper and came up with the following formula (which builds on the (int)(n + 0.5) integer rounding trick):
But it still doesn't give me a uniform distribution. Repeated runs with 10000 samples give me a ratio of 37:50:13 for the values -1, 0, 1.
Could you please suggest a better formula? (Or even a whole pseudo-random number generator function.)
Accepted answer by Mark B
A fast solution, somewhat better than yours but still not properly uniformly distributed, is
output = min + (rand() % static_cast<int>(max - min + 1))
Except when the size of the range is a power of 2, this method produces biased, non-uniformly distributed numbers regardless of the quality of rand(). For a comprehensive test of the quality of this method, please read this.
Answer by Walter
The simplest (and hence best) C++ answer (using the 2011 standard) is
#include <random>
std::random_device rd; // only used once to initialise (seed) engine
std::mt19937 rng(rd()); // random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // guaranteed unbiased
auto random_integer = uni(rng);
No need to re-invent the wheel. No need to worry about bias. No need to worry about using time as random seed.
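Since the question asks for a seedable generator, the same approach can be sketched with an explicit seed instead of std::random_device. The helper name `random_int` is an assumption made for this illustration; it is not part of the answer.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: draws one integer uniformly from [min, max] using
// the caller's engine, so the caller decides how the engine is seeded.
int random_int(std::mt19937& rng, int min, int max) {
    assert(min <= max);
    std::uniform_int_distribution<int> uni(min, max);
    return uni(rng);
}
```

Constructing the engine as `std::mt19937 rng(12345u);` and calling `random_int(rng, -1, 1)` gives the same sequence on every run of the same build. The raw mt19937 output is fixed by the standard, but how uniform_int_distribution maps it to a range is implementation-defined, so sequences may differ between standard libraries.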
Answer by Howard Hinnant
If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high-quality uniform_int_distribution which accepts minimum and maximum bounds (inclusive, as you need), and you can choose among various random number generators to plug into that distribution.
Here is code that generates ten million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it, as you mentioned performance is a major concern for you.
#include <iostream>
#include <random>
#include <chrono>

int main()
{
    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::duration<double> sec;
    Clock::time_point t0 = Clock::now();
    const int N = 10000000;
    typedef std::minstd_rand G;
    G g;
    typedef std::uniform_int_distribution<> D;
    D d(-57, 365);
    int c = 0;
    for (int i = 0; i < N; ++i)
        c += d(g);
    Clock::time_point t1 = Clock::now();
    std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
    return c;
}
For me (2.8 GHz Intel Core i5) this prints out:
2.10268e+07 random numbers per second.
You can seed the generator by passing an int to its constructor:
G g(seed);
If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g. to long long):
typedef std::uniform_int_distribution<long long> D;
If you later find that minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:
typedef std::mt19937 G; // Now using mersenne_twister_engine
Having separate control over the random number generator and the random distribution can be quite liberating.
I've also computed (not shown) the first 4 "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:
min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001
(The x_ prefix refers to "expected")
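The expected values can be reproduced from the closed-form moments of a discrete uniform distribution. The sketch below is not the answer's own (unshown) code. It assumes that "kurtosis" in the table means excess kurtosis, which the value near -1.2 suggests, and the names `UniformMoments` and `expected_moments` are hypothetical.

```cpp
#include <cassert>

// Hypothetical container for the four theoretical values.
struct UniformMoments {
    double mean;
    double variance;
    double skewness;
    double excess_kurtosis;
};

// Closed-form moments of the discrete uniform distribution on [lo, hi],
// where n = hi - lo + 1 is the number of possible values.
UniformMoments expected_moments(int lo, int hi) {
    assert(lo < hi);  // n >= 2 keeps the kurtosis denominator non-zero
    const double n = static_cast<double>(hi) - lo + 1.0;
    const double n2 = n * n;
    return { (static_cast<double>(lo) + hi) / 2.0,         // mean
             (n2 - 1.0) / 12.0,                            // variance
             0.0,                                          // symmetric, so no skew
             -6.0 * (n2 + 1.0) / (5.0 * (n2 - 1.0)) };     // excess kurtosis
}
```

For [-57, 365] there are n = 423 values, which gives a mean of 154, a variance of about 14910.67 and an excess kurtosis of about -1.20001, in line with the x_ values listed above.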
Answer by Jørgen Fogh
Let's split the problem into two parts:
- Generate a random number n in the range 0 through (max-min).
- Add min to that number.
The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.
It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.
The running time is now geometrically distributed, with expected value 1/p, where p is the probability of getting a small enough number on the first try. Since (RAND_MAX + 1) % (max-min+1) is always less than (RAND_MAX + 1) / 2, fewer than half of the possible values are ever rejected, so p > 1/2 and the expected number of iterations is always less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.
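The rejection idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the answer author's code: the names `try_map` and `uniform_rand` are hypothetical, and the range size is assumed to be no larger than RAND_MAX + 1.

```cpp
#include <cassert>
#include <cstdlib>

// Maps one raw value from [0, raw_max] into [min, max]. Returns false when
// the raw value lies in the incomplete top block and has to be rejected.
// Hypothetical helper, split out so the mapping can be checked exhaustively.
bool try_map(long long raw, long long raw_max, int min, int max, int& out) {
    const long long span = static_cast<long long>(max) - min + 1;
    const long long raw_count = raw_max + 1;
    assert(span >= 1 && span <= raw_count);
    const long long limit = raw_count - raw_count % span;  // a multiple of span
    if (raw >= limit)
        return false;
    out = min + static_cast<int>(raw % span);
    return true;
}

// Rejection loop over std::rand(); seed it with std::srand() as usual.
int uniform_rand(int min, int max) {
    int result = 0;
    while (!try_map(std::rand(), RAND_MAX, min, max, result)) {
        // too large: draw again
    }
    return result;
}
```

With a 15-bit source (raw_max = 32767) and the range <-1, 1>, only the raw values 32766 and 32767 are rejected, and each of the three results is produced by exactly 10922 raw values.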
EDIT:
Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling, and it is often very difficult to see whether they are correct.
Answer by Aphex
How about the Mersenne Twister? The boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects such as artificial intelligence and evolutionary algorithms.
Here's their example where they make a simple function to roll a six-sided die:
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>

boost::mt19937 gen;

int roll_die() {
    boost::uniform_int<> dist(1, 6);
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
    return die();
}
Oh, and here's some more pimping of this generator, just in case you aren't convinced you should use it over the vastly inferior rand():
The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.
Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937-bit seed, stored in a 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".
An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.
Answer by Lior Kogan
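int RandU(int nMin, int nMax)
{
    // RAND_MAX + 1.0 is computed in double, so it cannot overflow int
    // on platforms where RAND_MAX == INT_MAX
    return nMin + (int)((double)rand() / (RAND_MAX + 1.0) * (nMax - nMin + 1));
}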
This is a mapping of 32768 integers (the possible values of a 15-bit rand()) to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note, however, that if (nMax-nMin+1) is large, the mapping won't be uniform (for example, you can't map 32768 values to 30000 values with equal probability). If such ranges are needed, you should use a 32-bit or 64-bit random source instead of the 15-bit rand(), or ignore rand() results which are out of range.
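The 32768-to-30000 example can be checked by brute force. The sketch below applies the same scaling formula to every possible raw value and reports the smallest and largest number of raw values that share one output. The helper name `scaling_hit_range` is hypothetical, and a 15-bit source is assumed in the usage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Applies raw / raw_count * outputs to every raw value in 0..raw_count-1
// and returns {fewest, most} raw values mapped to a single output.
// Hypothetical helper for illustration only.
std::pair<long, long> scaling_hit_range(long raw_count, long outputs) {
    assert(outputs >= 1 && raw_count >= outputs);
    std::vector<long> hits(static_cast<std::size_t>(outputs), 0);
    for (long raw = 0; raw < raw_count; ++raw) {
        const long slot = static_cast<long>(
            static_cast<double>(raw) / static_cast<double>(raw_count)
            * static_cast<double>(outputs));
        ++hits[static_cast<std::size_t>(slot)];
    }
    const auto mm = std::minmax_element(hits.begin(), hits.end());
    return { *mm.first, *mm.second };
}
```

For `scaling_hit_range(32768, 3)` the result is {10922, 10923}, which is nearly uniform. For `scaling_hit_range(32768, 30000)` it is {1, 2}: some outputs are twice as likely as others.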
Answer by Jeremiah Willcock
Here is an unbiased version that generates numbers in [low, high]:
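int r;
do {
    r = rand();
    // reject the lowest (RAND_MAX + 1) % (high + 1 - low) values,
    // so the remaining count is a multiple of the range size
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;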
If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.
Answer by DSimon
I recommend the Boost.Random library. It's super detailed and well documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.
Answer by Huang Kun
Assume min and max are int values. [ and ] mean the endpoint is included; ( and ) mean it is excluded. With that notation, the expressions below give a value in each kind of interval using C++ rand().
Reference: for the definition of the ( ) [ ] notation, see:
https://en.wikipedia.org/wiki/Interval_(mathematics)
For the rand and srand functions and the RAND_MAX definition, see:
http://en.cppreference.com/w/cpp/numeric/random/rand
[min, max]
int randNum = rand() % (max - min + 1) + min
(min, max]
int randNum = rand() % (max - min) + min + 1
[min, max)
int randNum = rand() % (max - min) + min
(min, max)
int randNum = rand() % (max - min - 1) + min + 1
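The four expressions above can be wrapped in small functions so the endpoints are easy to check. This is only a sketch: the function names are hypothetical, and each takes the raw value as a parameter (for example the result of std::rand()) so the mapping can be tested without randomness. The modulo bias discussed in the other answers still applies.

```cpp
#include <cassert>

// Hypothetical helpers: each maps a non-negative raw value into one of the
// four interval kinds, using the same arithmetic as the expressions above.
int in_closed(int raw, int min, int max) {        // [min, max]
    assert(raw >= 0 && min <= max);
    return raw % (max - min + 1) + min;
}

int in_open_closed(int raw, int min, int max) {   // (min, max]
    assert(raw >= 0 && min < max);
    return raw % (max - min) + min + 1;
}

int in_closed_open(int raw, int min, int max) {   // [min, max)
    assert(raw >= 0 && min < max);
    return raw % (max - min) + min;
}

int in_open(int raw, int min, int max) {          // (min, max)
    assert(raw >= 0 && max - min >= 2);           // needs at least one interior value
    return raw % (max - min - 1) + min + 1;
}
```

For min = 2 and max = 5, the results cover 2..5, 3..5, 2..4 and 3..4 respectively. Note that the open interval (min, max) contains no integer unless max - min is at least 2.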
Answer by Pado
Rejection sampling was already discussed in this thread, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias, as already mentioned above.
The algorithm is really simple:
- calculate the smallest power of 2 greater than or equal to the interval length
- randomize one number in that "new" interval
- return that number if it is less than the length of the original interval
- reject otherwise
Here's my sample code:
#include <cmath>    // pow, ceil, log2
#include <cstdlib>  // rand

int randInInterval(int min, int max) {
    int intervalLen = max - min + 1;
    // now calculate the smallest power of 2 that is >= `intervalLen`
    int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));
    int randomNumber = rand() % ceilingPowerOf2; // this is "as uniform as rand()"
    if (randomNumber < intervalLen)
        return min + randomNumber; // ok!
    return randInInterval(min, max); // reject sample and try again
}
This works especially well for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.
PS
Obviously, avoiding the recursion would be more efficient (no need to calculate the log ceiling over and over), but I thought it was more readable for this example.
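A non-recursive variant could look roughly like the sketch below. It is not the answer author's code: the names `ceiling_power_of_2` and `rand_in_interval` are hypothetical, the power of 2 is found with integer shifts instead of pow/log2, and the random source is passed in as a callable so that std::rand or any other generator can be used. It assumes the power of 2 does not exceed the source's range and that the source's low bits are reasonably uniform.

```cpp
#include <cassert>
#include <cstdlib>

// Smallest power of two that is >= n, computed with integer shifts.
unsigned ceiling_power_of_2(unsigned n) {
    assert(n >= 1);
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Iterative rejection loop. `next_raw` is any callable returning
// non-negative ints, for example std::rand.
template <class Source>
int rand_in_interval(int min, int max, Source next_raw) {
    assert(min <= max);
    const unsigned len = static_cast<unsigned>(max - min) + 1u;
    const unsigned mask = ceiling_power_of_2(len) - 1u;  // computed only once
    for (;;) {
        // keeping the low bits is the same as taking % (power of 2)
        const unsigned candidate = static_cast<unsigned>(next_raw()) & mask;
        if (candidate < len)
            return min + static_cast<int>(candidate);
        // otherwise reject the sample and draw again
    }
}
```

A call such as `rand_in_interval(-1, 1, std::rand)` uses the C library generator. Because the mask is computed once before the loop, a rejected sample costs only one extra call to the source.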