Generating m distinct random numbers in the range [0..n-1]

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license, keep the original link, and attribute it to the original authors (not me): StackOverFlow
Original link: http://stackoverflow.com/questions/6947612/
Asked by Armen Tsirunyan
I have two methods of generating m distinct random numbers in the range [0..n-1]:
Method 1:
//C++-ish pseudocode
int result[m];
for(i = 0; i < m; ++i)
{
    int r;
    do
    {
        r = rand()%n;
    } while(r is found in result array at indices 0 to i-1);
    result[i] = r;
}
Method 2:
//C++-ish pseudocode
int arr[n];
for(int i = 0; i < n; ++i)
    arr[i] = i;
random_shuffle(arr, arr+n);
result = first m elements in arr;
The first method is more efficient when n is much larger than m, whereas the second is more efficient otherwise. But "much larger" isn't that strict a notion, is it? :)
Question: What formula in n and m should I use to determine whether method 1 or method 2 will be more efficient (in terms of the mathematical expectation of the running time)?
Accepted answer by Grigor Gevorgyan
Pure mathematics:
Let's calculate the number of rand() calls in both cases and compare the results:
Case 1: let's look at the expected number of calls at step i = k, when you already have k numbers chosen. The probability that a single rand() call yields a number not yet chosen is p = (n-k)/n. We need the expected number of calls it takes to obtain a number we don't have yet.
The probability of getting it with exactly 1 call is p. With exactly 2 calls it is q * p, where q = 1 - p. In general, the probability of getting it after exactly j calls is (q^(j-1))*p, a geometric distribution. Thus the expected number of calls is Sum[ j * q^(j-1) * p ], j = 1 --> INF. This sum equals 1/p (verified with Wolfram Alpha).
So, at step i = k you will perform, on average, 1/p = n/(n-k) calls of the rand() function.
Now let's sum it overall:
Sum[ n/(n - k) ], k = 0 --> m - 1 = n * T, which is the expected number of rand() calls in method 1.
Here T = Sum[ 1/(n - k) ], k = 0 --> m - 1.
Case 2:
Here rand() is called inside random_shuffle n - 1 times (in most implementations).
Now, to choose the method, we have to compare these two values: n * T vs. n - 1.
So, to choose the appropriate method, calculate T as described above. If T < (n - 1)/n, it's better to use the first method; otherwise use the second method.
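As an illustration, a minimal C++ sketch of this decision rule (the helper name prefer_method1 is mine, not from the answer; it assumes m <= n):

#include <cstddef>

// Compares the expected rand() calls of method 1 (n * T) against
// method 2 (about n - 1 calls) and returns true if method 1 wins.
bool prefer_method1(std::size_t m, std::size_t n) {
    double T = 0.0;                          // T = Sum[ 1/(n - k) ], k = 0 --> m - 1
    for (std::size_t k = 0; k < m; ++k)
        T += 1.0 / static_cast<double>(n - k);
    return n * T < static_cast<double>(n - 1);
}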
Answer by Mark Ransom
Check the Wikipedia description of the original Fisher-Yates algorithm. It advocates using essentially your method 1 for up to n/2, and your method 2 for the remainder.
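For illustration, a hedged sketch of that hybrid (my own code, not from the Wikipedia article): rejection sampling when m <= n/2, a partial Fisher-Yates shuffle otherwise.

#include <numeric>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical hybrid: method 1 (rejection) for small m, method 2 (partial shuffle) otherwise.
std::vector<int> sample_distinct(int n, int m, std::mt19937& gen) {
    if (m <= n / 2) {
        std::uniform_int_distribution<int> dist(0, n - 1);
        std::unordered_set<int> seen;
        std::vector<int> out;
        while ((int)seen.size() < m) {
            int r = dist(gen);
            if (seen.insert(r).second)       // keep r only the first time we see it
                out.push_back(r);
        }
        return out;
    }
    std::vector<int> arr(n);
    std::iota(arr.begin(), arr.end(), 0);    // arr = 0, 1, ..., n-1
    for (int i = 0; i < m; ++i) {
        std::uniform_int_distribution<int> d(i, n - 1);
        std::swap(arr[i], arr[d(gen)]);      // partial Fisher-Yates: fix position i
    }
    arr.resize(m);
    return arr;
}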
Answer by Nick Johnson
Here's an algorithm that will work in O(n) memory and O(n) time (where n is the number of returned results, not the size of the set you're selecting from) for any result set. It's in Python for convenience because it uses a hashtable:
import random

def random_elements(num_elements, set_size):
    state = {}
    for i in range(num_elements):
        # Swap state[i] with a random element
        swap_with = random.randint(i, set_size - 1)
        state[i], state[swap_with] = state.get(swap_with, swap_with), state.get(i, i)
    return [state[i] for i in range(num_elements)]  # effectively state[:num_elements] if it were a list/array
This is just a partial Fisher-Yates shuffle, with the array being shuffled implemented as a sparse hashtable: any element that is not present is equal to its index. We shuffle the first num_elements indices and return those values. In the case that num_elements = 1 this is equivalent to picking a single random number in the range, and in the case that num_elements = set_size it is equivalent to a standard Fisher-Yates shuffle.
It's easy to see that this is O(n) time, and because each iteration of the loop initializes at most two new indices in the hashtable, it's O(n) space too.
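For comparison, a minimal C++ sketch of the same sparse-hashtable idea (my own translation, not the answer's code), with std::unordered_map standing in for the Python dict:

#include <random>
#include <unordered_map>
#include <vector>

// Partial Fisher-Yates over a virtual array 0..set_size-1 stored sparsely:
// any index missing from `state` is taken to hold its own value.
std::vector<int> random_elements(int num_elements, int set_size, std::mt19937& gen) {
    std::unordered_map<int, int> state;
    auto value_at = [&](int i) {
        auto it = state.find(i);
        return it == state.end() ? i : it->second;
    };
    std::vector<int> result;
    result.reserve(num_elements);
    for (int i = 0; i < num_elements; ++i) {
        std::uniform_int_distribution<int> dist(i, set_size - 1);
        int swap_with = dist(gen);
        int vi = value_at(i), vs = value_at(swap_with);
        state[i] = vs;                       // virtual swap of the two slots
        state[swap_with] = vi;
        result.push_back(vs);
    }
    return result;
}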
Answer by Dave S
Personally, I would use method 1, and then if M > N/2, choose N-M values and invert the selection (return the numbers that were not picked). For example, if N is 1000 and you want 950 of them, choose 50 values using method 1 and then return the other 950.
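A hedged sketch of that inversion trick (helper names are mine; the inner selection uses method 1 with a std::set, so the values come back sorted rather than in random order):

#include <cstdlib>
#include <set>
#include <vector>

// Picks m distinct values from [0, n-1]; when m > n/2 it draws the
// n - m values to exclude and returns the complement instead.
std::vector<int> pick_distinct(int n, int m) {
    bool invert = m > n / 2;
    int k = invert ? n - m : m;              // how many values to draw directly
    std::set<int> drawn;
    while ((int)drawn.size() < k)
        drawn.insert(rand() % n);            // method 1: re-draw until k distinct values
    std::vector<int> out;
    if (!invert) {
        out.assign(drawn.begin(), drawn.end());
    } else {
        for (int v = 0; v < n; ++v)          // return everything that was not drawn
            if (!drawn.count(v))
                out.push_back(v);
    }
    return out;
}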
Edit: Though, if consistent performance is your goal, I would use a modified method 2, which doesn't do the full shuffle, but only shuffles the first M elements of your N length array.
int arr[n];
for(int i = 0; i < n; ++i)
    arr[i] = i;

for (int i = 0; i < m; ++i) {
    int j = rand(n-i); // Pick a random number 0 <= j < n-i. Pick your favorite method.
    // j == 0 means don't swap, otherwise swap with the element j positions away
    if (j != 0) {
        std::swap(arr[i], arr[i+j]);
    }
}
result = first m elements in arr;
Answer by Jacob Eggers
What about a third method?
int result[m];
for(i = 0; i < m; ++i)
{
    int r;
    r = rand()%(n-i);
    r += (number of items in result <= r)
    result[i] = r;
}
Edit: it should be <=, and it would actually need additional logic to avoid collisions.
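A hedged sketch of one way to add that logic (my own code, not the answer's), keeping the already-chosen values sorted so each new draw can be shifted past them:

#include <algorithm>
#include <cstdlib>
#include <vector>

// Picks m distinct values from [0, n-1] without rejection: draw a rank
// among the still-free values, then bump it past every chosen value <= it.
std::vector<int> method3(int n, int m) {
    std::vector<int> chosen;                 // kept sorted at all times
    for (int i = 0; i < m; ++i) {
        int r = rand() % (n - i);            // rank among the n - i free values
        for (int v : chosen) {               // increasing order is what makes this work
            if (v <= r) ++r;
            else break;
        }
        chosen.insert(std::upper_bound(chosen.begin(), chosen.end(), r), r);
    }
    return chosen;                           // sorted; shuffle it if order matters
}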
This is better, an example using the modern method from Fisher-Yates:
//C++-ish pseudocode
int arr[n];
for(int i = 0; i < n; ++i)
    arr[i] = i;

for(int i = 0; i < m; ++i)
    swap(arr, n-1-i, rand()%(n-i)); // move a randomly chosen remaining element to the back
result = last m elements in arr;
Answer by Karoly Horvath
Talking about mathematical expectation, it's pretty useless but I will post it anyway :D
Shuffle is simple O(m).
Now the other algorithm is a bit more complex. The number of steps needed to generate the next number is the expected number of trials, and the trial length follows a geometric distribution. So...
p = 1             E[X1] = 1               = 1
p = 1 - 1/n       E[X2] = 1/(1 - 1/n)     = 1 + 1/(n-1)
p = 1 - 2/n       E[X3] = 1/(1 - 2/n)     = 1 + 2/(n-2) = 1 + 1/(n-2) + 1/(n-2)
p = 1 - 3/n       E[X4] = 1/(1 - 3/n)     = 1 + 3/(n-3) = 1 + 1/(n-3) + 1/(n-3) + 1/(n-3)
...
p = 1 - (m-1)/n   E[Xm] = 1/(1 - (m-1)/n) = 1 + (m-1)/(n-m+1)
Note that the sum can be split up into a triangle shape, see right hand side.
Let's use the formula for the harmonic numbers: H_n = Sum k=1->n (1/k), which is approximately ln(n).
Sum(E[Xk]) = m + ln(n-1)-ln(n-m-1) + ln(n-2)-ln(n-m-1) + ... = m + ln(n-1) + ln(n-2) + ... - (m-1)*ln(n-m-1) ..
And there is a formula for the sum of harmonic numbers; if you are still interested I will look it up...
Update: actually there is quite a nice formula (thanks to the brilliant Concrete Mathematics book):
Sum(H_k), k = 0 --> n-1 = n*H_n - n
So the expected number of steps:
Sum(E[Xk]) = m + (n-1)*ln(n-1) - (n-1) - ((n-m-1)*ln(n-m-1) - (n-m-1)) - (m-1)*ln(n-m-1).
Note: I haven't verified it.
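As a hedged sanity check (entirely my own, not part of the answer), a short simulation that counts the rand() calls made by method 1 and can be compared against the formulas above:

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

// Counts the rand() calls one run of method 1 (rejection sampling) makes.
long count_method1_calls(int n, int m) {
    std::vector<char> used(n, 0);
    long calls = 0;
    for (int i = 0; i < m; ++i) {
        int r;
        do { r = rand() % n; ++calls; } while (used[r]);
        used[r] = 1;
    }
    return calls;
}

int main() {
    srand(time(NULL));
    const int n = 1000, m = 900, trials = 1000;
    long total = 0;
    for (int t = 0; t < trials; ++t)
        total += count_method1_calls(n, m);
    std::cout << "average rand() calls: " << double(total) / trials << "\n";
}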
Answer by biziclop
This is a bit of a long shot, but it could work, depending on your system.
- Start with some reasonable ratio, like 0.5.
- When a request comes in, process it with whichever method you get from the current value of the threshold ratio.
- Record the time it takes and when you have "empty" time, perform the same task with the other method.
- If the alternative solution is much faster than the original one, adjust the threshold up or down.
The obvious flaw in this method is that on highly variable load systems your "offline" test won't be too reliable.
Answer by Hani Shams
Answer by Tomilov Anatoliy
A Fisher-Yates shuffle was already suggested. I don't know whether the following code generates equally distributed integers, but it is at least compact and one-pass:
#include <random>
#include <utility>  // std::exchange
// v: assumed to be a std::vector<int> of size n with v[0] == 0; the loop
// fills in the rest ("inside-out" style, so no separate iota pass is needed).
std::random_device rd;
std::mt19937 g(rd());
for (std::size_t i = 1; i < std::size(v); ++i) {
    v[i] = std::exchange(v[g() % i], i);
}
Answer by Olufisayo Joseph Ayodele
I don't advise this method, but it works:
#include <iostream>
#include <cstdlib> // rand, srand
#include <ctime>   // time

using namespace std;

int randArray[26];
int index = 0;

// Checks whether value already appears among the first `index` entries;
// if not, accepts it by advancing index and returning true.
bool unique(int value) {
    for (int i = 0; i < index; i++)
        if (value == randArray[i])
            return false;
    index++;
    return true;
}

int main()
{
    srand(time(NULL));
    for (int i = 0; i < 26; i++)
        randArray[i] = -1;
    for (int i = 0; i < 26; i++) {
        randArray[i] = rand() % 26;
        while (!unique(randArray[i])) {  // keep re-drawing until the value is new
            randArray[i] = rand() % 26;
        }
    }
    for (int i = 0; i < 26; i++) {
        cout << randArray[i] << " ";
    }
    cout << "\n" << index << endl;
    return 0;
}