C++: Creating a Random Number Sequence with No Repeats
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me): Stack Overflow.
Original source: http://stackoverflow.com/questions/693880/
Create Random Number Sequence with No Repeats
Asked by Unknown
Duplicate:
I want a pseudo-random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return 5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also, I want it to be efficient at generating big numbers without building the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes or more), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it reached a specified length, right? Is there no better mathematical algorithm for this?
Accepted answer by gbarry
You may be interested in a linear feedback shift register (LFSR). We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits XORed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size allows (2^n - 1 states for an n-bit register, since the all-zero state is excluded). That is, a 16-bit LFSR can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the LFSR, you will find examples of how to construct them properly (which is to say, "maximal length").
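To make the idea concrete, here is a minimal C++ sketch of a 16-bit LFSR. The tap positions (16, 14, 13, 11) are a commonly cited maximal-length choice and are my assumption, not something the answer specifies; the function names lfsr_next and lfsr_period are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// One step of a 16-bit Fibonacci LFSR.
// Assumption: taps at bit positions 16, 14, 13 and 11, a commonly
// cited maximal-length configuration (not taken from the answer).
std::uint16_t lfsr_next(std::uint16_t state) {
    const std::uint16_t bit =
        ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1u;
    return static_cast<std::uint16_t>((state >> 1) | (bit << 15));
}

// Counts how many steps it takes to get back to the starting state.
// The all-zero state is a fixed point, so the seed must be nonzero.
std::uint32_t lfsr_period(std::uint16_t seed) {
    assert(seed != 0);
    std::uint32_t steps = 0;
    std::uint16_t state = seed;
    do {
        state = lfsr_next(state);
        ++steps;
    } while (state != seed);
    return steps;
}
```

Every state visited before the register returns to its seed is distinct, so calling lfsr_next repeatedly yields each nonzero 16-bit value once per period. The order is fixed by the taps and the seed, which is the "eminently repeatable" part.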
Answer by Mitch Wheat
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
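For reference, a small C++ sketch of the Fisher-Yates shuffle applied to the range 1..n. The function name shuffled_range and the use of std::mt19937 are my own choices for illustration. In real code, std::shuffle does the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns the values 1..n in a random order (Fisher-Yates shuffle).
std::vector<int> shuffled_range(int n, std::mt19937& rng) {
    assert(n >= 0);
    std::vector<int> values(static_cast<std::size_t>(n));
    std::iota(values.begin(), values.end(), 1);
    // Walk from the back; swap each slot with a uniformly chosen slot
    // at or before it. Picking from [0, i - 1] (not the whole array)
    // is what avoids the bias of the naive algorithm.
    for (std::size_t i = values.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(values[i - 1], values[pick(rng)]);
    }
    return values;
}
```

The naive variant that swaps each element with a slot chosen from the entire array produces n^n equally likely swap sequences, which cannot map evenly onto n! permutations. That mismatch is where the bias comes from.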
Answer by Daniel Earwicker
In order to ensure that the list doesn't repeat, it would have to keep a list of numbers previously returned. As it has to therefore generate the entire list by the end of the algorithm, this is equivalent in storage requirement to generating the ordered list and then shuffling.
More about shuffling here: Creating a random ordered list from an ordered list
However, if the range of the random numbers is very large but the quantity of numbers required is small (you've hinted that this is the actual requirement in a comment), then generating a complete list and shuffling it is wasteful. A shuffle on a huge array involves accessing pages of virtual memory in a way that (by definition) will defeat the OS's paging system (on a smaller scale the same problem would occur with the CPU's memory cache).
In this case, searching the list-so-far will be much more efficient. So the ideal would be to use heuristics (determined by experiment) to pick the right implementation for the given arguments. (Apologies for giving examples in C# rather than C++ but ASFAC++B I'm training myself to think in C#).
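// Threshold, Shuffle(array) and Random(range) are helpers assumed to exist elsewhere;
// Random(range) is taken to return a uniform value in [0, range).
// Needs: using System; using System.Collections.Generic;
IEnumerable<int> GenerateRandomNumbers(int range, int quantity)
{
    if (range < Threshold)
    {
        // Small range: build the whole range, shuffle it, keep the first `quantity` values.
        int[] all = new int[range];
        for (int n = 0; n < range; n++)
            all[n] = n;
        Shuffle(all);
        Array.Resize(ref all, quantity);
        return all;
    }
    else
    {
        // Large range: draw values and retry whenever one was already used.
        int[] a = new int[quantity];
        HashSet<int> used = new HashSet<int>();
        for (int n = 0; n < quantity; n++)
        {
            int r = Random(range);
            while (!used.Add(r))
                r = Random(range);
            a[n] = r;
        }
        return a;
    }
}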
Checking for repeated numbers, looping while there are collisions, and so on all carry a cost, but there will likely be some Threshold value beyond which this becomes faster than allocating for the entire range.
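Since the question is tagged C++, here is a rough C++ counterpart of the same two-strategy idea. It is only a sketch: the name sample_unique and the default threshold of 1 << 16 are placeholders of mine, and the real crossover point would have to be found by measurement, as the answer says.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

// Returns `quantity` distinct values from [0, range).
// `threshold` is a placeholder; tune it by experiment.
std::vector<std::uint32_t> sample_unique(std::uint32_t range,
                                         std::uint32_t quantity,
                                         std::mt19937& rng,
                                         std::uint32_t threshold = 1u << 16) {
    assert(quantity <= range);
    std::vector<std::uint32_t> out;
    if (range < threshold) {
        // Small range: materialize it, shuffle, keep a prefix.
        out.resize(range);
        std::iota(out.begin(), out.end(), 0u);
        std::shuffle(out.begin(), out.end(), rng);
        out.resize(quantity);
        return out;
    }
    // Large range: draw and reject values that were already used.
    std::uniform_int_distribution<std::uint32_t> dist(0, range - 1);
    std::unordered_set<std::uint32_t> used;
    out.reserve(quantity);
    while (out.size() < quantity) {
        const std::uint32_t r = dist(rng);
        if (used.insert(r).second)  // .second is true only for new values
            out.push_back(r);
    }
    return out;
}
```

Memory use is proportional to range in the first branch and to quantity in the second, which is the trade-off the threshold is meant to balance.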
For sufficiently small quantity requirements, it may be faster to use an array for used and do linear searches in it, due to the greater locality, the lower overhead, and the cheapness of the comparison.
Also, for large quantities and large ranges, it might be preferable to return an object that produces the numbers in the sequence on request, instead of allocating the array for the results upfront. This is very easy to implement in C# thanks to the yield return keyword:
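// Random(range) is a helper assumed to exist elsewhere.
IEnumerable<int> ForLargeQuantityAndRange(int quantity, int range)
{
    // Values already returned, kept so that repeats can be rejected.
    HashSet<int> used = new HashSet<int>();
    for (int n = 0; n < quantity; n++)
    {
        int r = Random(range);
        while (!used.Add(r))
            r = Random(range);
        yield return r;
    }
}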
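C++ has no direct equivalent of yield return here, but a small stateful object gives the same "produce on request" behaviour. This is only a sketch; the class name UniqueRandom and its interface are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_set>

// Hands out distinct values from [0, range) one at a time.
class UniqueRandom {
public:
    UniqueRandom(std::uint64_t range, std::uint64_t seed)
        : range_(range), rng_(seed), dist_(0, range - 1) {
        assert(range > 0);
    }

    // Precondition: fewer than `range` values have been handed out so far,
    // otherwise no unused value is left and the loop could not end.
    std::uint64_t next() {
        assert(used_.size() < range_);
        std::uint64_t r = dist_(rng_);
        while (!used_.insert(r).second)  // retry until the value is new
            r = dist_(rng_);
        return r;
    }

private:
    std::uint64_t range_;
    std::mt19937_64 rng_;
    std::uniform_int_distribution<std::uint64_t> dist_;
    std::unordered_set<std::uint64_t> used_;
};
```

Memory grows only with the number of values actually requested, not with the size of the range.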
Answer by Motti
If a random number is guaranteed never to repeat, it is no longer random, and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable, and even after only eight you have a 50-50 chance).
Answer by SPWorley
I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0, 1, 2, 3, 4, 5, 6, etc. in turn. Because the hash is reversible, two different inputs can never produce the same output, so the sequence has no repeats.
There are infinitely many hashes like this. They're not too hard to generate if the range is restricted to a power of 2, but any base can be used.
Here's one that would work, for example, if you wanted to go through all 2^32 32-bit values. It's the easiest case to write because the implicit mod 2^32 of integer math works to your advantage here.
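// Assumes unsigned int is 32 bits wide, so the arithmetic wraps mod 2^32.
unsigned int reversableHash(unsigned int x)
{
    x *= 0xDEADBEEF;    // odd multiplier: invertible mod 2^32
    x ^= x >> 17;       // xor with a right shift of itself: invertible
    x *= 0x01234567;
    x += 0x88776655;    // adding a constant: invertible
    x ^= x >> 4;
    x ^= x >> 9;
    x *= 0x91827363;
    x ^= x >> 7;
    x ^= x >> 11;
    x ^= x >> 20;
    x *= 0x77773333;
    return x;
}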
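One way to convince yourself that the hash really is reversible is to write the inverse. The sketch below repeats the hash with a fixed-width type and undoes each step in reverse order. The names reversible_hash, reverse_hash, inverse_odd and undo_xorshift are mine, and the inverse is my own reconstruction rather than part of the answer.

```cpp
#include <cassert>
#include <cstdint>

// Same steps as the hash above, with an explicit 32-bit type.
std::uint32_t reversible_hash(std::uint32_t x) {
    x *= 0xDEADBEEFu; x ^= x >> 17;
    x *= 0x01234567u; x += 0x88776655u;
    x ^= x >> 4;      x ^= x >> 9;
    x *= 0x91827363u;
    x ^= x >> 7;      x ^= x >> 11;     x ^= x >> 20;
    x *= 0x77773333u;
    return x;
}

// Multiplicative inverse of an odd number mod 2^32 (Newton iteration).
// a*a == 1 mod 8 for odd a, and each step doubles the correct bits.
std::uint32_t inverse_odd(std::uint32_t a) {
    assert(a % 2u == 1u);
    std::uint32_t inv = a;
    for (int i = 0; i < 5; ++i)
        inv *= 2u - a * inv;
    return inv;
}

// Inverse of y = x ^ (x >> k): x = y ^ (y >> k) ^ (y >> 2k) ^ ...
std::uint32_t undo_xorshift(std::uint32_t y, unsigned k) {
    assert(k >= 1 && k < 32);
    std::uint32_t x = y;
    for (std::uint32_t t = y >> k; t != 0; t >>= k)
        x ^= t;
    return x;
}

// Undoes reversible_hash by inverting its steps last to first.
std::uint32_t reverse_hash(std::uint32_t x) {
    x *= inverse_odd(0x77773333u);
    x = undo_xorshift(x, 20); x = undo_xorshift(x, 11); x = undo_xorshift(x, 7);
    x *= inverse_odd(0x91827363u);
    x = undo_xorshift(x, 9);  x = undo_xorshift(x, 4);
    x -= 0x88776655u;
    x *= inverse_odd(0x01234567u);
    x = undo_xorshift(x, 17);
    x *= inverse_odd(0xDEADBEEFu);
    return x;
}
```

The multiplications are only invertible because every multiplier is odd. An even multiplier would lose the top bit and make two inputs collide.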
Answer by Bill the Lizard
A shuffle is the best you can do for random numbers in a specific range with no repeats. The method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an unpredictable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
Answer by starblue
If you don't mind mediocre randomness properties, and if the number of elements allows it, then you could use a linear congruential random number generator. With parameters chosen for a full period, it visits every value in its range exactly once before the sequence starts over.
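A minimal sketch of that idea for ranges that are a power of two. The multiplier 1664525 and increment 1013904223 are the well-known Numerical Recipes constants, chosen by me as an example. For a power-of-two modulus, a full period needs an odd increment and a multiplier that is 1 more than a multiple of 4, and both hold here. The function names are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One step of x -> (a*x + c) mod 2^bits.
// Assumed constants: a = 1664525, c = 1013904223 (example choice).
std::uint32_t lcg_next(std::uint32_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    const std::uint32_t mask =
        (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (1664525u * x + 1013904223u) & mask;
}

// Brute-force check that the generator visits all 2^bits values
// exactly once before returning to its starting point.
bool lcg_visits_every_value(unsigned bits) {
    assert(bits >= 1 && bits <= 24);  // keep the bookkeeping array small
    const std::uint32_t m = 1u << bits;
    std::vector<bool> seen(m, false);
    std::uint32_t x = 0;
    for (std::uint32_t i = 0; i < m; ++i) {
        if (seen[x]) return false;
        seen[x] = true;
        x = lcg_next(x, bits);
    }
    return x == 0;
}
```

Only the current value has to be stored, whatever the size of the range. The price is the mediocre randomness mentioned above: the low bits of a power-of-two LCG are especially regular.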
Answer by Rashack
What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates, but the chance of getting one is pretty low.
Answer by Nick Johnson
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
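The linked answer is not reproduced here, so the following is only my sketch of the general technique: a small Feistel network gives a keyed permutation of a power-of-two domain, and "cycle walking" (re-encrypting until the result lands in range) trims it to an arbitrary range [0, n). The round function is a toy mixer and is NOT cryptographically secure; a real implementation would use a proper cipher or keyed hash there. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy round function (assumption: illustration only, not secure).
static std::uint32_t round_fn(std::uint32_t r, std::uint32_t key, unsigned round) {
    std::uint32_t v = r * 0x9E3779B1u + key + round * 0x85EBCA6Bu;
    v ^= v >> 15;
    v *= 0x2C1B3C6Du;
    v ^= v >> 12;
    return v;
}

// 4-round balanced Feistel network: a permutation of [0, 2^(2*half_bits)).
static std::uint64_t feistel(std::uint64_t x, unsigned half_bits, std::uint32_t key) {
    const std::uint32_t mask =
        (half_bits == 32) ? 0xFFFFFFFFu : ((1u << half_bits) - 1u);
    std::uint32_t left = static_cast<std::uint32_t>(x >> half_bits) & mask;
    std::uint32_t right = static_cast<std::uint32_t>(x) & mask;
    for (unsigned round = 0; round < 4; ++round) {
        const std::uint32_t next = left ^ (round_fn(right, key, round) & mask);
        left = right;
        right = next;
    }
    return (static_cast<std::uint64_t>(left) << half_bits) | right;
}

// Maps index in [0, n) to a distinct value in [0, n) for a fixed key.
std::uint64_t permute(std::uint64_t index, std::uint64_t n, std::uint32_t key) {
    assert(n > 0 && index < n);
    unsigned half_bits = 1;
    while (half_bits < 32 && (std::uint64_t{1} << (2 * half_bits)) < n)
        ++half_bits;
    std::uint64_t y = feistel(index, half_bits, key);
    while (y >= n)  // cycle walking: stay on the permutation until in range
        y = feistel(y, half_bits, key);
    return y;
}
```

Calling permute(0, n, key), permute(1, n, key), and so on yields each value below n exactly once, and nothing but the key and the current index needs to be stored.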
Answer by Brian Campbell
If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
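To put rough numbers on "small enough", the usual birthday approximation says that k uniform draws from N possible values collide with probability about 1 - exp(-k(k-1)/(2N)). Here is a small sketch of that estimate; the function name collision_probability is made up, and the formula is an approximation, not an exact count.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability of at least one repeat among `draws`
// independent uniform samples from `space` equally likely values.
// Uses the birthday approximation 1 - exp(-k(k-1)/(2N)).
double collision_probability(double draws, double space) {
    assert(space > 0.0 && draws >= 0.0);
    const double exponent = -draws * (draws - 1.0) / (2.0 * space);
    return -std::expm1(exponent);  // expm1 keeps precision for tiny values
}
```

For instance, collision_probability(1e6, std::pow(2.0, 64)) estimates the chance of any repeat among a million 64-bit values, while collision_probability(23, 365) reproduces the classic result of roughly one half.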