How does the omp ordered clause work? (C++)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/13224155/
Asked by Mihai Neacsu
vector<int> v;
#pragma omp parallel for ordered schedule(dynamic, anyChunkSizeGreaterThan1)
for (int i = 0; i < n; ++i){
...
...
...
#pragma omp ordered
v.push_back(i);
}
This fills v with an n-sized ordered list.
When reaching the omp ordered block, all threads need to wait for the thread holding the lowest pending iteration to finish, but what if none of the threads was assigned that specific iteration? Or does the OpenMP runtime library always make sure that the lowest iteration is handled by some thread?
Also, why is it suggested that the ordered clause be used together with a dynamic schedule? Would a static schedule affect performance?
Answered by Hristo Iliev
The ordered clause works like this: different threads execute concurrently until they encounter the ordered region, which is then executed sequentially, in the same order as it would be executed in a serial loop. This still allows for some degree of concurrency, especially if the code section outside the ordered region has substantial run time.
There is no particular reason to use a dynamic schedule instead of a static schedule with a small chunk size. It all depends on the structure of the code. Since ordered introduces a dependency between the threads, when used with schedule(static) and the default chunk size, the second thread would have to wait for the first one to finish all its iterations, then the third thread would have to wait for the second one to finish its iterations (and hence for the first one too), and so on. One can easily visualise it with 3 threads and 9 iterations (3 per thread):
tid  List of     Timeline
     iterations
 0   0,1,2       ==o==o==o
 1   3,4,5       ==.......o==o==o
 2   6,7,8       ==..............o==o==o
= shows that the thread is executing code in parallel. o is when the thread is executing the ordered region. . is the thread being idle, waiting for its turn to execute the ordered region. With schedule(static,1) the following would happen:
tid  List of     Timeline
     iterations
 0   0,3,6       ==o==o==o
 1   1,4,7       ==.o==o==o
 2   2,5,8       ==..o==o==o
I believe the difference in the two cases is more than obvious. With schedule(dynamic) the pictures above would become more or less random, as the list of iterations assigned to each thread is non-deterministic. It would also add extra overhead. Dynamic scheduling is only useful when the amount of computation differs between iterations and the computation takes much more time than the scheduling overhead it adds.
Don't worry about the lowest numbered iteration. It is usually handed to the first thread in the team that becomes ready to execute code.