C++: How does OpenMP handle nested loops?

Note: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/13357065/

How does OpenMP handle nested loops?

c++ loops parallel-processing openmp

Asked by user0002128

Does the following code parallelize just the first (outer) loop, or does it parallelize the entire nested loop?

    #pragma omp parallel for
    for (int i=0;i<N;i++)
    { 
      for (int j=0;j<M;j++)
      {
       //do task(i,j)//
      }
    }

I just want to make sure whether the above code parallelizes the entire nested for-loop (so that each task(i,j) is directly mapped to a thread), or whether it only parallelizes the outer for-loop (so that, for each parallel thread with loop index i, the inner loop is executed sequentially within that single thread, which is very important).

Answered by Massimiliano

The lines you have written will parallelize only the outer loop. To parallelize both loops, you need to add a collapse clause:

    #pragma omp parallel for collapse(2)
    for (int i=0;i<N;i++)
    { 
      for (int j=0;j<M;j++)
      {
       //do task(i,j)//
      }
    }

You may want to check the OpenMP 3.1 specification (sec. 2.5.1) for more details.

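As a rough intuition (my own illustration, not part of the original answer), collapse(2) tells OpenMP to fuse the two loops into a single iteration space of N*M iterations, which is then divided among the threads. Conceptually it behaves roughly as if you had written the following (a sketch of the mental model only, not what the compiler literally generates):

    #pragma omp parallel for
    for (int k=0;k<N*M;k++)
    { 
      int i = k / M;   // recover the outer loop index
      int j = k % M;   // recover the inner loop index
      //do task(i,j)//
    }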

Answered by Erangad

You will be able to better understand this with the following example. Let's do this with two threads.

#include <cstdio>
#include <omp.h>    // needed for omp_get_thread_num()

int main() {
    #pragma omp parallel for num_threads(2)
    for(int i=0; i< 3; i++) {
        for (int j=0; j< 3; j++) {
            printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
        }
    }
}

Then the result will be:

i = 0, j= 0, threadId = 0 
i = 0, j= 1, threadId = 0 
i = 0, j= 2, threadId = 0 
i = 1, j= 0, threadId = 0 
i = 1, j= 1, threadId = 0 
i = 1, j= 2, threadId = 0 
i = 2, j= 0, threadId = 1 
i = 2, j= 1, threadId = 1 
i = 2, j= 2, threadId = 1

That means that when you add #pragma omp parallel for to the outermost for loop, the index of that loop is divided among the threads. As you can see, whenever the index i is the same, the thread ID is also the same.

Instead, we can parallelize over the combinations of indices that the nested for loop produces. In this example we have the following combinations of i and j.

i = 0, j= 0
i = 0, j= 1
i = 0, j= 2
i = 1, j= 0
i = 1, j= 1
i = 1, j= 2
i = 2, j= 0
i = 2, j= 1
i = 2, j= 2

To parallelize the code combination-wise, we can add the collapse clause as follows.

#pragma omp parallel for num_threads(2) collapse(2)
for(int i=0; i< 3; i++) {
    for (int j=0; j< 3; j++) {
        printf("i = %d, j= %d, threadId = %d \n", i, j, omp_get_thread_num());
    }
}

Then the result will be as follows:

i = 0, j= 0, threadId = 0 
i = 0, j= 1, threadId = 0 
i = 1, j= 2, threadId = 1 
i = 2, j= 0, threadId = 1 
i = 2, j= 1, threadId = 1 
i = 2, j= 2, threadId = 1 
i = 0, j= 2, threadId = 0 
i = 1, j= 0, threadId = 0 
i = 1, j= 1, threadId = 0 

Now you can see that, unlike before, the same index i can be handled by different threads (for example, i=1, j=2 runs on threadId 1, while i=1, j=0 runs on threadId 0). That means that in this scenario, the combinations of i and j are divided among the threads.

Answered by hcarver

OpenMP only parallelizes the loop that immediately follows the pragma. You can parallelize the inner loop too if you want to, but it won't be done automatically.

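For example, one way to do that explicitly (a minimal sketch of my own, not part of the original answer, reusing the 3x3 loop from above) is to enable nested parallelism and put a second pragma on the inner loop:

#include <cstdio>
#include <omp.h>

int main() {
    omp_set_nested(1);   // allow nested parallel regions
                         // (deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels)

    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 3; i++) {
        // without this second pragma, the inner loop runs sequentially inside each outer thread
        #pragma omp parallel for num_threads(2)
        for (int j = 0; j < 3; j++) {
            printf("i = %d, j = %d, inner threadId = %d \n", i, j, omp_get_thread_num());
        }
    }
}

In practice, nested parallel regions add thread-management overhead, so the collapse clause shown in the other answers is usually the simpler and faster way to spread the whole iteration space across threads.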