Original: http://stackoverflow.com/questions/18669296/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
C++ OpenMP Parallel For Loop - Alternatives to std::vector
Asked by Armin Meisterhirn
Based on this thread, OpenMP and STL vector, which data structures are good alternatives for a shared std::vector in a parallel for loop? The main aspect is speed, and the vector might require resizing during the loop.
Accepted answer by Vivian Miranda
The question you linked to was talking about the fact that "the STL vector container is not thread-safe in the situation where multiple threads write to a single container". This is only true, as stated correctly there, if you call methods that can cause reallocation of the underlying array that std::vector holds. push_back(), pop_back() and insert() are examples of these dangerous methods.
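To make the hazard concrete, here is a minimal sketch (mine, not from the answer) of the racy pattern and the simplest workaround, which serializes every insertion with a critical section; the per-thread-buffer approaches in the second answer below are usually much faster:

#include <vector>

int main() {
    std::vector<int> vec;

    // UNSAFE: concurrent push_back() can reallocate the underlying array
    // while other threads are still writing to it -- a data race:
    //
    //   #pragma omp parallel for
    //   for (int i = 0; i < 100; i++) vec.push_back(i);  // undefined behavior

    // Correct but slow: serialize every insertion with a critical section.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        #pragma omp critical
        vec.push_back(i);
    }
    return 0;
}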
If you need thread-safe reallocation, then the Intel Threading Building Blocks library offers you concurrent vector containers. You should not use tbb::concurrent_vector in single-threaded programs, because the time it takes to access a random element is higher than the time std::vector takes to do the same (which is O(1)). However, the concurrent vector calls push_back(), pop_back(), and insert() in a thread-safe way, even when reallocation happens.
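As a minimal sketch of what this buys you (my example, not from the original answer; assumes TBB is installed and linked with -ltbb), a shared tbb::concurrent_vector can be grown from an OpenMP loop without any explicit locking:

#include <tbb/concurrent_vector.h>

int main() {
    tbb::concurrent_vector<int> vec;

    // push_back() is safe to call from many threads at once:
    // concurrent_vector grows in segments instead of relocating
    // existing elements, so growth never invalidates stored items.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        vec.push_back(i); // thread-safe; element order is unspecified
    }
    return 0;
}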
EDIT 1: Slides 46 and 47 of the following Intel presentation give an illustrative example of concurrent reallocation using tbb::concurrent_vector.
EDIT 2: By the way, if you start using Intel Threading Building Blocks (it is open source, it works with most compilers, and it is much better integrated with C++/C++11 features than OpenMP), then you don't need OpenMP to create a parallel_for. Here is a nice example of parallel_for using TBB.
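For illustration, here is a small sketch in that style (my example, not the one linked above), using tbb::parallel_for's compact index-range overload together with a concurrent_vector; it needs C++11 and -ltbb:

#include <tbb/parallel_for.h>
#include <tbb/concurrent_vector.h>

int main() {
    tbb::concurrent_vector<int> vec;

    // TBB splits the half-open range [0, 100) across its worker
    // threads and invokes the lambda for each index.
    tbb::parallel_for(0, 100, [&](int i) {
        vec.push_back(i);
    });
    return 0;
}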
Answered by Z boson
I think you can use std::vector with OpenMP most of the time and still have good performance. The following code, for example, fills std::vectors in parallel and then combines them at the end. As long as your main loop/fill function is the bottleneck this should work well in general and be thread safe.
std::vector<int> vec;
#pragma omp parallel
{
    // Each thread fills its own private vector, so no synchronization
    // is needed while pushing back.
    std::vector<int> vec_private;
    #pragma omp for nowait // fill vec_private in parallel
    for (int i = 0; i < 100; i++) {
        vec_private.push_back(i);
    }
    // Merge the per-thread results into the shared vector, one thread at a time.
    #pragma omp critical
    vec.insert(vec.end(), vec_private.begin(), vec_private.end());
}
Edit:
OpenMP 4.0 allows user-defined reductions using #pragma omp declare reduction. The code above can be simplified to:
// Declare a reduction named "merge" that appends one partial vector to another.
#pragma omp declare reduction (merge : std::vector<int> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

std::vector<int> vec;
// Each thread pushes into its own private copy of vec;
// the copies are combined with "merge" when the loop finishes.
#pragma omp parallel for reduction(merge: vec)
for (int i = 0; i < 100; i++) vec.push_back(i);
Edit: What I have shown so far does not fill the vector in order. If the order matters, then it can be done like this:
std::vector<int> vec;
#pragma omp parallel
{
    std::vector<int> vec_private;
    // schedule(static) assigns thread t the t-th contiguous chunk of
    // iterations, so each private vector holds an ordered slice.
    #pragma omp for nowait schedule(static)
    for (int i = 0; i < N; i++) {
        vec_private.push_back(i);
    }
    // The ordered clause makes the threads append their slices in
    // thread order, which preserves the overall ordering.
    #pragma omp for schedule(static) ordered
    for (int i = 0; i < omp_get_num_threads(); i++) {
        #pragma omp ordered
        vec.insert(vec.end(), vec_private.begin(), vec_private.end());
    }
}
This avoids saving a std::vector for each thread and then merging them serially outside of the parallel region. I learned about this "trick" here. It's not possible to do this with user-defined reductions.
I just realized that the critical section is not necessary, which I figured out from this question: parallel-cumulative-prefix-sums-in-openmp-communicating-values-between-thread. This method also gets the order correct:
std::vector<int> vec;
size_t *prefix;
#pragma omp parallel
{
    int ithread  = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    #pragma omp single
    {
        // One slot per thread plus a leading zero for the prefix sum.
        prefix = new size_t[nthreads+1];
        prefix[0] = 0;
    }
    std::vector<int> vec_private;
    #pragma omp for schedule(static) nowait
    for (int i = 0; i < 100; i++) {
        vec_private.push_back(i);
    }
    // Record how many elements this thread produced.
    prefix[ithread+1] = vec_private.size();
    #pragma omp barrier // wait until every thread has written its count
    #pragma omp single
    {
        // Exclusive prefix sum: prefix[t] is the offset where thread t
        // starts writing; prefix[nthreads] is the total element count.
        for (int i = 1; i < (nthreads+1); i++) prefix[i] += prefix[i-1];
        vec.resize(vec.size() + prefix[nthreads]);
    }
    // The implicit barrier at the end of single guarantees vec has been
    // resized before any thread copies its slice into place.
    std::copy(vec_private.begin(), vec_private.end(), vec.begin() + prefix[ithread]);
}
delete[] prefix;
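For reference, the OpenMP snippets on this page need OpenMP enabled at compile time, and the TBB sketches need the TBB library linked in. Assuming GCC or Clang (file names here are illustrative):

g++ -O2 -fopenmp example.cpp -o example
g++ -O2 -std=c++11 tbb_example.cpp -ltbb -o tbb_example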