How does the C++ omp ordered clause work?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/13224155/

Date: 2020-08-27 17:06:29 · Source: igfitidea

How does the omp ordered clause work?

Tags: c++, parallel-processing, openmp

Asked by Mihai Neacsu

vector<int> v;

#pragma omp parallel for ordered schedule(dynamic, anyChunkSizeGreaterThan1)
    for (int i = 0; i < n; ++i){
            ...
            ...
            ...
#pragma omp ordered
            v.push_back(i);
    }

This fills v with an n-sized ordered list.

When reaching the omp ordered block, all threads need to wait for the thread handling the lowest possible iteration to finish, but what if none of the threads was assigned that specific iteration? Or does the OpenMP runtime library always make sure that the lowest iteration is handled by some thread?

Also, why is it suggested that the ordered clause be used together with a dynamic schedule? Would a static schedule hurt performance?

Answered by Hristo Iliev

The ordered clause works like this: different threads execute concurrently until they encounter the ordered region, which is then executed sequentially, in the same order in which it would run in a serial loop. This still allows for some degree of concurrency, especially if the code section outside the ordered region has a substantial run time.

There is no particular reason to use a dynamic schedule instead of a static schedule with a small chunk size. It all depends on the structure of the code. Since ordered introduces a dependency between the threads, with schedule(static) and the default chunk size the second thread would have to wait for the first one to finish all of its iterations, then the third thread would have to wait for the second one to finish its iterations (and hence for the first one too), and so on. One can easily visualise this with 3 threads and 9 iterations (3 per thread):

tid  List of     Timeline
     iterations
0    0,1,2       ==o==o==o
1    3,4,5       ==.......o==o==o
2    6,7,8       ==..............o==o==o

= shows that the thread is executing code in parallel. o marks the thread executing the ordered region. . marks the thread sitting idle, waiting for its turn to execute the ordered region. With schedule(static,1) the following would happen:

tid  List of     Timeline
     iterations
0    0,3,6       ==o==o==o
1    1,4,7       ==.o==o==o
2    2,5,8       ==..o==o==o

I believe the difference between the two cases is more than obvious. With schedule(dynamic) the pictures above would become more or less random, since the list of iterations assigned to each thread is non-deterministic. It would also add extra overhead. Dynamic scheduling is only useful when the amount of computation differs from iteration to iteration and the computation takes much longer than the added scheduling overhead.

Don't worry about the lowest-numbered iteration. It is usually handed to the first thread in the team that becomes ready to execute code.