C: How to generate random numbers in parallel?
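Disclaimer: this page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow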
Original URL: http://stackoverflow.com/questions/4287531/
Question by Tomek Tarczynski
I want to generate pseudorandom numbers in parallel using OpenMP, something like this:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < 100; i++)
    {
        printf("%d %d %d\n", i, omp_get_thread_num(), rand());
    }
    return 0;
}
I've tested it on Windows and got a huge speedup, but each thread generated exactly the same numbers. I've also tested it on Linux and got a huge slowdown: the parallel version on an 8-core processor was about 10 times slower than the sequential one, but each thread generated different numbers.
Is there any way to have both speedup and different numbers?
Edit 27.11.2010
I think I've solved it using an idea from Jonathan Dursi's post. The following code seems to run fast on both Linux and Windows, and the numbers are also pseudorandom. What do you think about it?
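#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

unsigned int seed[10];   // one saved state per thread; assumes at most 10 threads

int main(int argc, char **argv)
{
    int i;
    unsigned int s;      // unsigned: wraparound is well defined, signed overflow is not
    for (i = 0; i < 10; i++)
        seed[i] = rand();
    #pragma omp parallel private(s)
    {
        s = seed[omp_get_thread_num()];
        #pragma omp for
        for (i = 0; i < 1000; i++)
        {
            printf("%d %d %u\n", i, omp_get_thread_num(), s);
            s = s * 17931 + 7391; // those numbers should be chosen more carefully
        }
        seed[omp_get_thread_num()] = s;
    }
    return 0;
}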
PS.: I haven't accepted any answer yet, because I need to be sure that this idea is good.
Accepted answer by Jonathan Dursi
I'll post here what I posted to the question "Concurrent random number generation":
I think you're looking for rand_r(), which explicitly takes the current RNG state as a parameter. Then each thread should have its own copy of the seed data (whether you want each thread to start off with the same seed or different ones depends on what you're doing; here you want them to be different, or you'd get the same row again and again). There's some discussion of rand_r() and thread safety in the question "whether rand_r is real thread safe?".
So say you wanted each thread's seed to start off as its thread number (which is probably not what you want, as it would give the same results every time you ran with the same number of threads, but just as an example):
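#include <stdio.h>
#include <stdlib.h>   /* rand_r() is POSIX, not ISO C */
#include <omp.h>

int main(void)
{
    #pragma omp parallel default(none)
    {
        int i;
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for (i = 0; i < 100; i++)
            printf("%d %d %d\n", i, omp_get_thread_num(), rand_r(&myseed));
    }
    return 0;
}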
Edit: just on a lark, I checked whether the above would get any speedup. The full code was:
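#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <omp.h>

#define NRANDS 1000000

/* Minimal stand-ins for the tick()/tock() helpers described below
   (reconstructed, not the author's originals): wrappers around gettimeofday(). */
static void tick(struct timeval *t) { gettimeofday(t, NULL); }
static double tock(struct timeval *t) {
    struct timeval now;
    gettimeofday(&now, NULL);
    return (now.tv_sec - t->tv_sec) + (now.tv_usec - t->tv_usec) / 1e6;
}

int main(int argc, char **argv) {
    struct timeval t;
    static int a[NRANDS];   /* static: a 4 MB array can overflow the default stack on some platforms */
    tick(&t);
    #pragma omp parallel default(none) shared(a)
    {
        int i;
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for (i = 0; i < NRANDS; i++)
            a[i] = rand_r(&myseed);
    }
    double sum = 0.;
    double time = tock(&t);
    for (long int i = 0; i < NRANDS; i++) {
        sum += a[i];
    }
    printf("Time = %lf, sum = %lf\n", time, sum);
    return 0;
}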
where tick and tock are just wrappers around gettimeofday(), and tock() returns the elapsed time in seconds. The sum is printed just to make sure that nothing gets optimized away, and to demonstrate a small point: you will get different sums with different numbers of threads, because each thread uses its own thread number as a seed; if you run the same code again and again with the same number of threads, you'll get the same sum, for the same reason. Anyway, here are the timings (on an 8-core Nehalem box with no other users):
$ export OMP_NUM_THREADS=1
$ ./rand
Time = 0.008639, sum = 1074808568711883.000000
$ export OMP_NUM_THREADS=2
$ ./rand
Time = 0.006274, sum = 1074093295878604.000000
$ export OMP_NUM_THREADS=4
$ ./rand
Time = 0.005335, sum = 1073422298606608.000000
$ export OMP_NUM_THREADS=8
$ ./rand
Time = 0.004163, sum = 1073971133482410.000000
So there is a speedup, if not a great one; as @ruslik points out, this is not really a compute-intensive process, and other issues such as memory bandwidth start to play a role. Thus, only a shade over 2x speedup on 8 cores.
Answer by R.. GitHub STOP HELPING ICE
You cannot use the C rand() function from multiple threads; this results in undefined behavior. Some implementations might give you locking (which will make it slow); others might allow threads to clobber each other's state, possibly crashing your program or just giving "bad" random numbers.
To solve the problem, either write your own PRNG implementation or use an existing one that allows the caller to store and pass the state to the PRNG iterator function.
Answer by moinudin
Get each thread to set a different seed based on its thread id, e.g. srand(omp_get_thread_num() * 1000);
Answer by Axel Gneiting
It seems that rand has a global state shared by all threads on Linux, and a thread-local state on Windows. The shared state on Linux is causing your slowdown because of the necessary synchronization.
I don't think there is a portable way in the C library to use the RNG in parallel from multiple threads, so you need another generator. You could use a Mersenne Twister. As marcog said, you need to initialize the seed for each thread differently.
Answer by Riko Jacob
On Linux/Unix you can use
long jrand48(unsigned short xsubi[3]);
where xsubi[3] encodes the state of the random number generator, like this:
#include <stdio.h>
#include <stdlib.h>   // jrand48() (POSIX/XSI)
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        // each thread keeps its own 48-bit generator state on its stack (no new/leak)
        unsigned short xsub[3];
        xsub[0] = xsub[1] = xsub[2] = 3 + omp_get_thread_num();
        int j;
        #pragma omp for
        for (j = 0; j < 10; j++)
            printf("%d [%d] %ld\n", j, omp_get_thread_num(), jrand48(xsub));
    }
    return 0;
}
compile with
g++-mp-4.4 -Wall -Wextra -O2 -march=native -fopenmp -D_GLIBCXX_PARALLEL jrand.cc -o jrand
(replace g++-mp-4.4 with whatever you need to call g++ version 4.4 or 4.3) and you get
$ ./jrand
0 [0] 1344229389
1 [0] 1845350537
2 [0] 229759373
3 [0] 1219688060
4 [0] -553792943
5 [1] 360650087
6 [1] -404254894
7 [1] 1678400333
8 [1] 1373359290
9 [1] 171280263
i.e. 10 different pseudorandom numbers without any mutex locking or race conditions.
Answer by ruslik
Random numbers can be generated very fast, so memory is usually the bottleneck. By dividing this task among several threads you create additional communication and synchronization overhead (and synchronizing the caches of different cores is not cheap).
It would be better to use a single thread with a better random() function.

