C: Understanding the collapse clause in OpenMP

Question

Asked by iomartin

I came across some OpenMP code that used the collapse clause, which was new to me. I'm trying to understand what it means, but I don't think I have fully grasped its implications. One definition that I found is:

COLLAPSE: Specifies how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.

I thought I understood what that meant, so I tried the following simple program:

int i, j;
#pragma omp parallel for num_threads(2) private(j)
for (i = 0; i < 4; i++)
    for (j = 0; j <= i; j++)
        printf("%d %d %d\n", i, j, omp_get_thread_num());

Which produced

I then added the collapse(2) clause. I expected the first two columns to stay the same and the last column to contain an equal number of 0's and 1's. But I got

So my questions are:

What is happening in my code?
Under what circumstances should I use collapse?
Can you provide an example that shows the difference between using collapse and not using it?

Answer 1

Answered by Z boson

The problem with your code is that the iteration count of the inner loop depends on the outer loop. According to the OpenMP specification, in the section describing binding and the collapse clause:

If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

You can use collapse when this is not the case, for example with a rectangular loop nest:

#pragma omp parallel for private(j) collapse(2)
for (i = 0; i < 4; i++)
    for (j = 0; j < 100; j++)
        /* loop body using i and j */;

In fact this is a good example of when to use collapse. The outer loop has only four iterations, so if you have more than four threads, some of them will have no work. When you collapse, the threads are distributed over 400 iterations, which is likely to be much greater than the number of threads. Another reason to use collapse is poor load distribution. If you only had four iterations and the fourth took most of the time, the other threads would wait. With 400 iterations the load is likely to be better distributed.
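To make the thread-usage argument concrete, here is a minimal sketch that counts how many (i, j) pairs each thread executes, with and without collapse(2). The team size NTHREADS and the helpers thread_id, busy_threads and total_pairs are names invented for this illustration. The code is guarded so that it also compiles without OpenMP, in which case everything runs on one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define NTHREADS 8  /* assumed team size for this sketch */

/* Hypothetical helper: the calling thread's id, or 0 when built without OpenMP. */
static int thread_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Only the outer loop (4 iterations) is divided among the threads. */
void count_without_collapse(int per_thread[NTHREADS]) {
    for (int t = 0; t < NTHREADS; t++) per_thread[t] = 0;
    #pragma omp parallel for num_threads(NTHREADS)
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 100; j++)
            per_thread[thread_id()]++;  /* each thread writes only its own slot */
}

/* Both loops form one 400-iteration space that is divided among the threads. */
void count_with_collapse(int per_thread[NTHREADS]) {
    for (int t = 0; t < NTHREADS; t++) per_thread[t] = 0;
    #pragma omp parallel for collapse(2) num_threads(NTHREADS)
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 100; j++)
            per_thread[thread_id()]++;
}

/* Number of threads that executed at least one (i, j) pair. */
int busy_threads(const int per_thread[NTHREADS]) {
    int busy = 0;
    for (int t = 0; t < NTHREADS; t++) busy += per_thread[t] > 0;
    return busy;
}

/* Total number of (i, j) pairs executed across all threads. */
int total_pairs(const int per_thread[NTHREADS]) {
    int total = 0;
    for (int t = 0; t < NTHREADS; t++) total += per_thread[t];
    return total;
}
```

When built with OpenMP enabled (for example with gcc's -fopenmp), busy_threads can never exceed 4 after count_without_collapse, because there are only four outer iterations to hand out. After count_with_collapse it can reach the full team size. How the 400 iterations are actually split depends on the schedule and the runtime.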

You can fuse the loops by hand for the code above like this:

#pragma omp parallel for
for(int n=0; n<4*100; n++) {
    int i = n/100; int j=n%100;
    /* loop body using i and j */
}

Here is an example showing how to fuse a triply nested loop by hand.
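The linked example is not reproduced here, but the same division and modulo trick extends to three loops. The following is a sketch under that assumption, not the original author's code. The names unflatten3 and fill_fused and the body i + j + k are invented for illustration.

```c
/* Recover (i, j, k) from a flat index n for a nest with inner bounds nj and nk.
   k varies fastest, matching the order of the original nested loops. */
void unflatten3(int n, int nj, int nk, int *i, int *j, int *k) {
    *i = n / (nj * nk);
    *j = (n / nk) % nj;
    *k = n % nk;
}

/* Fused form of
       for i < ni: for j < nj: for k < nk: out[(i*nj + j)*nk + k] = i + j + k;
   where out is a flat array holding ni*nj*nk ints. */
void fill_fused(int *out, int ni, int nj, int nk) {
    #pragma omp parallel for
    for (int n = 0; n < ni * nj * nk; n++) {
        int i, j, k;
        unflatten3(n, nj, nk, &i, &j, &k);
        out[n] = i + j + k;  /* n equals (i*nj + j)*nk + k */
    }
}
```

The divisions add a small cost per iteration, so this is mainly worthwhile when the loop body is expensive compared with the index arithmetic.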

Finally, here is an example showing how to fuse a triangular loop, which collapse is not defined for.

Here is a solution that maps a rectangular loop to the triangular loop in the OP's question. This can be used to fuse the OP's triangular loop.

//int n = 4;
for(int k=0; k<n*(n+1)/2; k++) {
    int i = k/(n+1), j = k%(n+1);
    if(j>i) i = n - i -1, j = n - j;
    printf("(%d,%d)\n", i,j);
}

This works for any value of n.

The map for the OP's question goes from

(0,0),
(1,0), (1,1),
(2,0), (2,1), (2,2),
(3,0), (3,1), (3,2), (3,3),

to

(0,0), (3,3), (3,2), (3,1), (3,0),
(1,0), (1,1), (2,2), (2,1), (2,0),

For odd values of n the map is not exactly a rectangle, but the formula still works.

For example, n = 3 gets mapped from

(0,0),
(1,0), (1,1),
(2,0), (2,1), (2,2),

to

(0,0), (2,2), (2,1), (2,0),
(1,0), (1,1),

Here is code to test this:

#include <stdio.h>
int main(void) {
    int n = 4;
    for(int i=0; i<n; i++) {
        for(int j=0; j<=i; j++) {
            printf("(%d,%d)\n", i,j);
        }
    }
    puts("");
    for(int k=0; k<n*(n+1)/2; k++) {
        int i = k/(n+1), j = k%(n+1);
        if(j>i) i = n - i - 1, j = n - j;
        printf("(%d,%d)\n", i,j);
    }
}
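The test program above prints the pairs for inspection. As a complementary sketch, the function below runs the fused loop under a parallel for and checks the mapping mechanically: every pair (i, j) with j <= i must be visited exactly once, and no pair outside the triangle may be visited. The function name triangular_map_is_exact and the visits table are assumptions made for this illustration.

```c
#include <stdlib.h>

/* Returns 1 if the fused loop visits every pair (i, j) with 0 <= j <= i < n
   exactly once and nothing outside the triangle; returns 0 otherwise. */
int triangular_map_is_exact(int n) {
    int *visits = calloc((size_t)n * n, sizeof *visits);
    if (!visits) return 0;

    #pragma omp parallel for
    for (int k = 0; k < n * (n + 1) / 2; k++) {
        int i = k / (n + 1), j = k % (n + 1);
        if (j > i) { i = n - i - 1; j = n - j; }  /* same fold as in the answer */
        #pragma omp atomic
        visits[i * n + j]++;
    }

    int ok = 1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (visits[i * n + j] != (j <= i)) ok = 0;  /* expect 1 inside, 0 outside */
    free(visits);
    return ok;
}
```

The atomic update is only there so that the check stays valid even if the mapping were wrong and two values of k landed on the same pair. With a correct mapping, each k writes to a distinct cell.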

Answer 2

Answered by h2kyeong

If your purpose is balancing the load over increasing rows, assuming the workload for each item is regular or well scattered, then how about folding the row indices in half and forgetting about the collapse clause?

#pragma omp for
for (int iy0=0; iy0<n; ++iy0){
  int iy = iy0;
  if (iy0 >= n/2) iy = n-1 -iy0 +n/2;
  for (int ix=iy+1; ix<n; ++ix){
    work(ix, iy);
  }
}
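One property worth checking for any such index remapping is that it still visits every row exactly once. The sketch below isolates the fold from the loop above in a function and checks that it is a permutation of 0..n-1. The names fold_row and fold_is_permutation are invented for this illustration. It says nothing about how well the resulting load is balanced, which depends on the schedule and the thread count.

```c
/* The row fold from the loop above: the first half is unchanged,
   the second half is traversed in reverse order. */
int fold_row(int iy0, int n) {
    return (iy0 >= n / 2) ? n - 1 - iy0 + n / 2 : iy0;
}

/* Returns 1 if fold_row maps 0..n-1 onto 0..n-1 with no repeats. */
int fold_is_permutation(int n) {
    int ok = 1;
    for (int target = 0; target < n; target++) {
        int hits = 0;
        for (int iy0 = 0; iy0 < n; iy0++)
            hits += fold_row(iy0, n) == target;
        if (hits != 1) ok = 0;  /* a row was skipped or visited twice */
    }
    return ok;
}
```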
