Original: http://stackoverflow.com/questions/18669296/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
C++ OpenMP Parallel For Loop - Alternatives to std::vector
Asked by Armin Meisterhirn
Based on this thread, OpenMP and STL vector, which data structures are good alternatives for a shared std::vector in a parallel for loop? The main aspect is speed, and the vector might require resizing during the loop.
Accepted answer by Vivian Miranda
The question you linked to was talking about the fact that "the STL vector container is not thread-safe in the situation where multiple threads write to a single container". This is only true, as stated correctly there, if you call methods that can cause reallocation of the underlying array that std::vector holds. push_back(), pop_back() and insert() are examples of these dangerous methods.
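To make the hazard concrete, here is a minimal sketch (mine, not from the answer) of the racy pattern and the simplest workaround, which serializes every insertion with a critical section; the per-thread-buffer approaches in the second answer below are usually much faster:

#include <vector>

int main() {
    std::vector<int> vec;

    // UNSAFE: concurrent push_back() can reallocate the underlying array
    // while other threads are still writing to it -- a data race:
    //
    //   #pragma omp parallel for
    //   for (int i = 0; i < 100; i++) vec.push_back(i);  // undefined behavior

    // Correct but slow: serialize every insertion with a critical section.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        #pragma omp critical
        vec.push_back(i);
    }
    return 0;
}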
If you need thread-safe reallocation, then the Intel Threading Building Blocks library offers you concurrent vector containers. You should not use tbb::concurrent_vector in single-threaded programs, because the time it takes to access a random element is higher than the time std::vector takes to do the same (which is O(1)). However, the concurrent vector calls push_back(), pop_back(), and insert() in a thread-safe way, even when reallocation happens.
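As a minimal sketch of what this buys you (my example, not from the original answer; assumes TBB is installed and linked with -ltbb), a shared tbb::concurrent_vector can be grown from an OpenMP loop without any explicit locking:

#include <tbb/concurrent_vector.h>

int main() {
    tbb::concurrent_vector<int> vec;

    // push_back() is safe to call from many threads at once:
    // concurrent_vector grows in segments instead of relocating
    // existing elements, so growth never invalidates stored items.
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        vec.push_back(i); // thread-safe; element order is unspecified
    }
    return 0;
}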
EDIT 1: Slides 46 and 47 of the following Intel presentation give an illustrative example of concurrent reallocation using tbb::concurrent_vector.
EDIT 2: By the way, if you start using Intel Threading Building Blocks (it is open source, it works with most compilers, and it is much better integrated with C++/C++11 features than OpenMP), then you don't need OpenMP to create a parallel_for. Here is a nice example of parallel_for using TBB.
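For illustration, here is a small sketch in that style (my example, not the one linked above), using tbb::parallel_for's compact index-range overload together with a concurrent_vector; it needs C++11 and -ltbb:

#include <tbb/parallel_for.h>
#include <tbb/concurrent_vector.h>

int main() {
    tbb::concurrent_vector<int> vec;

    // TBB splits the half-open range [0, 100) across its worker
    // threads and invokes the lambda for each index.
    tbb::parallel_for(0, 100, [&](int i) {
        vec.push_back(i);
    });
    return 0;
}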
Answered by Z boson
I think you can use std::vector with OpenMP most of the time and still have good performance. The following code, for example, fills std::vectors in parallel and then combines them at the end. As long as your main loop/fill function is the bottleneck this should work well in general and be thread safe.
std::vector<int> vec;
#pragma omp parallel
{
    // Each thread fills its own private vector, so no synchronization
    // is needed while pushing back.
    std::vector<int> vec_private;
    #pragma omp for nowait // fill vec_private in parallel
    for (int i = 0; i < 100; i++) {
        vec_private.push_back(i);
    }
    // Merge the per-thread results into the shared vector, one thread at a time.
    #pragma omp critical
    vec.insert(vec.end(), vec_private.begin(), vec_private.end());
}
Edit:
OpenMP 4.0 allows user-defined reductions using #pragma omp declare reduction. The code above can be simplified to:
// Declare a reduction named "merge" that appends one partial vector to another.
#pragma omp declare reduction (merge : std::vector<int> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

std::vector<int> vec;
// Each thread pushes into its own private copy of vec;
// the copies are combined with "merge" when the loop finishes.
#pragma omp parallel for reduction(merge: vec)
for (int i = 0; i < 100; i++) vec.push_back(i);
Edit: What I have shown so far does not fill the vector in order. If the order matters, then it can be done like this:
std::vector<int> vec;
#pragma omp parallel
{
    std::vector<int> vec_private;
    // schedule(static) assigns thread t the t-th contiguous chunk of
    // iterations, so each private vector holds an ordered slice.
    #pragma omp for nowait schedule(static)
    for (int i = 0; i < N; i++) {
        vec_private.push_back(i);
    }
    // The ordered clause makes the threads append their slices in
    // thread order, which preserves the overall ordering.
    #pragma omp for schedule(static) ordered
    for (int i = 0; i < omp_get_num_threads(); i++) {
        #pragma omp ordered
        vec.insert(vec.end(), vec_private.begin(), vec_private.end());
    }
}
This avoids saving a std::vector for each thread and then merging them serially outside of the parallel region. I learned about this "trick" here. It's not possible to do this with user-defined reductions.
I just realized that the critical section is not necessary, which I figured out from this question: parallel-cumulative-prefix-sums-in-openmp-communicating-values-between-thread. This method also gets the order correct:
std::vector<int> vec;
size_t *prefix;
#pragma omp parallel
{
    int ithread  = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    #pragma omp single
    {
        // One slot per thread plus a leading zero for the prefix sum.
        prefix = new size_t[nthreads+1];
        prefix[0] = 0;
    }
    std::vector<int> vec_private;
    #pragma omp for schedule(static) nowait
    for (int i = 0; i < 100; i++) {
        vec_private.push_back(i);
    }
    // Record how many elements this thread produced.
    prefix[ithread+1] = vec_private.size();
    #pragma omp barrier // wait until every thread has written its count
    #pragma omp single
    {
        // Exclusive prefix sum: prefix[t] is the offset where thread t
        // starts writing; prefix[nthreads] is the total element count.
        for (int i = 1; i < (nthreads+1); i++) prefix[i] += prefix[i-1];
        vec.resize(vec.size() + prefix[nthreads]);
    }
    // The implicit barrier at the end of single guarantees vec has been
    // resized before any thread copies its slice into place.
    std::copy(vec_private.begin(), vec_private.end(), vec.begin() + prefix[ithread]);
}
delete[] prefix;
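For reference, the OpenMP snippets on this page need OpenMP enabled at compile time, and the TBB sketches need the TBB library linked in. Assuming GCC or Clang (file names here are illustrative):

g++ -O2 -fopenmp example.cpp -o example
g++ -O2 -std=c++11 tbb_example.cpp -ltbb -o tbb_example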