C++: How does OpenMP handle nested loops?

Note: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/13357065/

How does OpenMP handle nested loops?

c++ loops parallel-processing openmp

Asked by user0002128

Does the following code parallelize just the first (outer) loop, or does it parallelize the entire nested loop?

    #pragma omp parallel for
    for (int i=0;i<N;i++)
    { 
      for (int j=0;j<M;j++)
      {
       //do task(i,j)//
      }
    }

I just want to make sure whether the above code parallelizes the entire nested for-loop (so that each task(i,j) is directly mapped to a thread), or whether it only parallelizes the outer for-loop (so that, for each parallel thread with loop index i, the inner loop is executed sequentially within that single thread, which is very important).

Answered by Massimiliano

The lines you have written will parallelize only the outer loop. To parallelize both loops, you need to add a collapse clause:

    #pragma omp parallel for collapse(2)
    for (int i=0;i<N;i++)
    { 
      for (int j=0;j<M;j++)
      {
       //do task(i,j)//
      }
    }

You may want to check the OpenMP 3.1 specification (sec. 2.5.1) for more details.

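As a rough intuition (my own illustration, not part of the original answer), collapse(2) tells OpenMP to fuse the two loops into a single iteration space of N*M iterations, which is then divided among the threads. Conceptually it behaves roughly as if you had written the following (a sketch of the mental model only, not what the compiler literally generates):

    #pragma omp parallel for
    for (int k=0;k<N*M;k++)
    { 
      int i = k / M;   // recover the outer loop index
      int j = k % M;   // recover the inner loop index
      //do task(i,j)//
    }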

Answered by Erangad

You will be able to better understand this with the following example. Let's do this with two threads.

#include <cstdio>
#include <omp.h>    // needed for omp_get_thread_num()

int main() {
    #pragma omp parallel for num_threads(2)
    for(int i=0; i< 3; i++) {
        for (int j=0; j< 3; j++) {
            printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
        }
    }
}

Then the result will be:

i = 0, j= 0, threadId = 0 
i = 0, j= 1, threadId = 0 
i = 0, j= 2, threadId = 0 
i = 1, j= 0, threadId = 0 
i = 1, j= 1, threadId = 0 
i = 1, j= 2, threadId = 0 
i = 2, j= 0, threadId = 1 
i = 2, j= 1, threadId = 1 
i = 2, j= 2, threadId = 1

That means that when you add #pragma omp parallel for to the outermost for loop, the index of that loop is divided among the threads. As you can see, whenever the index i is the same, the thread ID is also the same.

Instead, we can parallelize over the combinations of indices that the nested for loop produces. In this example we have the following combinations of i and j.

i = 0, j= 0
i = 0, j= 1
i = 0, j= 2
i = 1, j= 0
i = 1, j= 1
i = 1, j= 2
i = 2, j= 0
i = 2, j= 1
i = 2, j= 2

To parallelize the code combination-wise, we can add the collapse clause as follows.

#pragma omp parallel for num_threads(2) collapse(2)
for(int i=0; i< 3; i++) {
    for (int j=0; j< 3; j++) {
        printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
    }
}

Then the result will be as follows:

i = 0, j= 0, threadId = 0 
i = 0, j= 1, threadId = 0 
i = 1, j= 2, threadId = 1 
i = 2, j= 0, threadId = 1 
i = 2, j= 1, threadId = 1 
i = 2, j= 2, threadId = 1 
i = 0, j= 2, threadId = 0 
i = 1, j= 0, threadId = 0 
i = 1, j= 1, threadId = 0 

Now you can see that, unlike before, the same index i can be handled by different threads (for example, i=1, j=2 runs on threadId 1, while i=1, j=0 runs on threadId 0). That means that in this scenario, the combinations of i and j are divided among the threads.

Answered by hcarver

OpenMP only parallelizes the loop that immediately follows the pragma. You can parallelize the inner loop too if you want to, but it won't be done automatically.

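For example, one way to do that explicitly (a minimal sketch of my own, not part of the original answer, reusing the 3x3 loop from above) is to enable nested parallelism and put a second pragma on the inner loop:

#include <cstdio>
#include <omp.h>

int main() {
    omp_set_nested(1);   // allow nested parallel regions
                         // (deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels)

    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 3; i++) {
        // without this second pragma, the inner loop runs sequentially inside each outer thread
        #pragma omp parallel for num_threads(2)
        for (int j = 0; j < 3; j++) {
            printf("i = %d, j = %d, inner threadId = %d \n", i, j, omp_get_thread_num());
        }
    }
}

In practice, nested parallel regions add thread-management overhead, so the collapse clause shown in the other answers is usually the simpler and faster way to spread the whole iteration space across threads.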