C++ OpenMP: nested parallel for loops vs. inner parallel for
Original question: http://stackoverflow.com/questions/10540760/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
openMP nested parallel for loops vs inner parallel for
Asked by Scott Logan
If I use nested parallel for loops like this:
#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
is this equivalent to:
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
Is the outer parallel for doing anything other than creating a new task?
Answered by Hristo Iliev
If your compiler supports OpenMP 3.0, you can use the collapse clause:
#pragma omp parallel for schedule(dynamic,1) collapse(2)
for (int x = 0; x < x_max; ++x) {
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
If it doesn't (e.g. only OpenMP 2.5 is supported), there is a simple workaround:
#pragma omp parallel for schedule(dynamic,1)
for (int xy = 0; xy < x_max*y_max; ++xy) {
    int x = xy / y_max;
    int y = xy % y_max;
    // parallelize this code here
}
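As a minimal sketch (not part of the original answer), the two variants above can be compared by counting how often each (x, y) pair is visited. The function names and the visit-count vector are hypothetical, chosen only for illustration. The flat index uses long long rather than int, which is an assumption added here to avoid overflow when x_max*y_max is large.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: visits every (x, y) with collapse(2) and counts the visits.
std::vector<int> visits_collapse(int x_max, int y_max) {
    assert(x_max >= 0 && y_max >= 0);
    std::vector<int> visits(static_cast<std::size_t>(x_max) * y_max, 0);
    #pragma omp parallel for schedule(dynamic,1) collapse(2)
    for (int x = 0; x < x_max; ++x) {
        for (int y = 0; y < y_max; ++y) {
            // Each iteration touches a distinct element, so there is no data race.
            ++visits[static_cast<std::size_t>(x) * y_max + y];
        }
    }
    return visits;
}

// Hypothetical helper: the same traversal with a manually flattened index.
std::vector<int> visits_flat(int x_max, int y_max) {
    assert(x_max >= 0 && y_max >= 0);
    // Assumption: widen to long long so the product cannot overflow int.
    const long long total = static_cast<long long>(x_max) * y_max;
    std::vector<int> visits(static_cast<std::size_t>(total), 0);
    #pragma omp parallel for schedule(dynamic,1)
    for (long long xy = 0; xy < total; ++xy) {
        const int x = static_cast<int>(xy / y_max);  // row index
        const int y = static_cast<int>(xy % y_max);  // column index
        ++visits[static_cast<std::size_t>(x) * y_max + y];
    }
    return visits;
}
```

If both functions are called with the same bounds, every entry of both vectors should be 1, meaning each pair was processed exactly once. Compiled without OpenMP support, the pragmas are ignored and the loops simply run serially.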
You can enable nested parallelism with omp_set_nested(1); and your nested omp parallel for code will work, but that might not be the best idea.
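To see what nesting actually does on a given runtime, one option is to measure the size of the inner team. The sketch below is an illustration, not code from the answer: the function name inner_team_size is hypothetical, and it uses omp_set_max_active_levels instead of omp_set_nested because the latter is deprecated as of OpenMP 5.0. On older runtimes, omp_set_nested(1) may additionally be needed before the inner team grows beyond one thread.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns the largest inner-team size observed when
// a two-thread parallel region is nested inside another two-thread region.
int inner_team_size(int max_active_levels) {
    assert(max_active_levels >= 1);
    int observed = 1;  // without OpenMP everything runs on a single thread
#ifdef _OPENMP
    const int saved = omp_get_max_active_levels();
    omp_set_max_active_levels(max_active_levels);

    #pragma omp parallel num_threads(2) reduction(max : observed)
    {
        int local = 1;  // private to each outer thread, shared by its inner team
        #pragma omp parallel num_threads(2) shared(local)
        {
            #pragma omp single
            local = omp_get_num_threads();  // size of this inner team
        }
        if (local > observed) observed = local;
    }

    omp_set_max_active_levels(saved);  // restore the previous setting
#endif
    return observed;
}
```

With only one active level allowed, the inner region is serialized and the function should report 1. With two levels allowed, the result depends on the runtime and its thread limits.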
By the way, why the dynamic scheduling? Is every loop iteration evaluated in non-constant time?
Answered by Walter
NO.
The first #pragma omp parallel will create a team of parallel threads, and the second will then try to create another team for each of the original threads, i.e. a team of teams. However, on almost all existing implementations the second team has only one thread, so the second parallel region is essentially unused. Thus, your code is roughly equivalent to
#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    // only one x per thread
    for (int y = 0; y < y_max; ++y) {
        // code here: each thread loops all y
    }
}
If you don't want that, but only parallelise the inner loop, you can do this:
#pragma omp parallel
for (int x = 0; x < x_max; ++x) {
    // each thread loops over all x
    #pragma omp for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        // code here, only one y per thread
    }
}
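The two strategies from this answer can be put side by side in a small sketch. Everything named here (cell_value, fill_outer_parallel, fill_inner_workshare, the flat grid layout) is a hypothetical stand-in for the real loop body, which the question does not show. Both functions should produce the same grid; they differ only in how the work is split. The second version creates the thread team once, but each omp for ends with an implicit barrier, so the threads synchronize once per x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real per-cell work.
inline int cell_value(int x, int y) { return x * 1000 + y; }

// Outer loop distributed: each thread takes whole rows and loops over all y.
std::vector<int> fill_outer_parallel(int x_max, int y_max) {
    assert(x_max >= 0 && y_max >= 0);
    std::vector<int> grid(static_cast<std::size_t>(x_max) * y_max, 0);
    #pragma omp parallel for schedule(dynamic,1)
    for (int x = 0; x < x_max; ++x) {
        for (int y = 0; y < y_max; ++y) {
            grid[static_cast<std::size_t>(x) * y_max + y] = cell_value(x, y);
        }
    }
    return grid;
}

// Inner loop distributed: every thread runs the x loop, the y iterations are shared.
std::vector<int> fill_inner_workshare(int x_max, int y_max) {
    assert(x_max >= 0 && y_max >= 0);
    std::vector<int> grid(static_cast<std::size_t>(x_max) * y_max, 0);
    #pragma omp parallel
    for (int x = 0; x < x_max; ++x) {  // x is private: declared inside the region
        #pragma omp for schedule(dynamic,1)
        for (int y = 0; y < y_max; ++y) {
            grid[static_cast<std::size_t>(x) * y_max + y] = cell_value(x, y);
        }
        // implicit barrier here: all threads finish row x before moving on
    }
    return grid;
}
```

Which version is faster depends on the sizes of x_max and y_max and on the cost of the loop body. This sketch only checks that the results agree, not which one performs better.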