C++ 2011: std::thread: simple example to parallelize a loop?

Disclaimer: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/10792157/

Date: 2020-08-27 14:28:16  Source: igfitidea

Tags: c++, multithreading, c++11

Asked by Vincent

C++ 2011 includes very cool new features, but I can't find many examples of how to parallelize a for loop. So my very naive question is: how do you parallelize a simple for loop (as you would with "omp parallel for") using std::thread? (I'm looking for an example.)

Thank you very much.

Answered by Stephan Dollberg

std::thread is not necessarily meant to parallelize loops. It is meant to be the low-level abstraction on which to build constructs like a parallel_for algorithm. If you want to parallelize your loops, you should either write a parallel_for algorithm yourself or use existing libraries that offer task-based parallelism.

The following example shows how you could parallelize a simple loop, but it also shows the disadvantages, such as the missing load balancing and the amount of code needed for a simple loop.

  #include <iostream>
  #include <iterator>
  #include <numeric>
  #include <thread>
  #include <vector>

  int main() {
    typedef std::vector<int> container;
    typedef container::iterator iter;

    container v(100, 1);

    // doubles every element in [begin, end)
    auto worker = [] (iter begin, iter end) {
      for(auto it = begin; it != end; ++it) {
        *it *= 2;
      }
    };

    // serial
    worker(std::begin(v), std::end(v));

    std::cout << std::accumulate(std::begin(v), std::end(v), 0) << std::endl; // 200

    // parallel: the first 7 threads get grainsize elements each,
    // the last thread gets whatever is left
    std::vector<std::thread> threads(8);
    const int grainsize = v.size() / 8;

    auto work_iter = std::begin(v);
    for(auto it = std::begin(threads); it != std::end(threads) - 1; ++it) {
      *it = std::thread(worker, work_iter, work_iter + grainsize);
      work_iter += grainsize;
    }
    threads.back() = std::thread(worker, work_iter, std::end(v));

    // wait for all workers before reading the result
    for(auto&& i : threads) {
      i.join();
    }

    std::cout << std::accumulate(std::begin(v), std::end(v), 0) << std::endl; // 400
  }

Using a library which offers a parallel_for template, it can be simplified to

parallel_for(std::begin(v), std::end(v), worker);
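
For illustration, here is a minimal sketch of what such a parallel_for could look like when written by hand on top of std::thread. The name, the signature, and the num_threads parameter are assumptions made for this example, not the API of any particular library. Like the hand-written version above, it splits the range into equal chunks and does no load balancing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical helper (not from any particular library): splits [first, last)
// into contiguous chunks and calls worker(chunk_begin, chunk_end) on each.
template <typename RandomIt, typename Worker>
void parallel_for(RandomIt first, RandomIt last, Worker worker,
                  unsigned num_threads = std::thread::hardware_concurrency()) {
  typedef typename std::iterator_traits<RandomIt>::difference_type diff_t;
  assert(first <= last);  // requires random-access iterators
  const diff_t n = last - first;
  if (n == 0) return;
  if (num_threads == 0) num_threads = 2;  // hardware_concurrency() may return 0
  const diff_t chunks = std::min<diff_t>(static_cast<diff_t>(num_threads), n);
  const diff_t grainsize = n / chunks;

  std::vector<std::thread> threads;
  threads.reserve(static_cast<std::size_t>(chunks) - 1);
  RandomIt chunk_begin = first;
  for (diff_t c = 0; c + 1 < chunks; ++c) {
    threads.emplace_back(worker, chunk_begin, chunk_begin + grainsize);
    chunk_begin += grainsize;
  }
  // The calling thread processes the last chunk, including any remainder.
  worker(chunk_begin, last);
  for (auto& t : threads) t.join();
}
```

With this sketch, the call parallel_for(std::begin(v), std::end(v), worker) works with the worker lambda from the example above. It is not exception-safe: if worker throws on the calling thread, the spawned threads are never joined.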

Answered by paxdiablo

Can't provide a C++11 specific answer since we're still mostly using pthreads. But, as a language-agnostic answer, you parallelise something by setting it up to run in a separate function (the thread function).

In other words, you have a function like:

def processArraySegment (threadData):
    arrayAddr = threadData->arrayAddr
    startIdx  = threadData->startIdx
    endIdx    = threadData->endIdx

    for i = startIdx to endIdx:
        doSomethingWith (arrayAddr[i])

    exitThread()

and, in your main code, you can process the array in two chunks:

int xyzzy[100]

threadData->arrayAddr = xyzzy
threadData->startIdx  = 0
threadData->endIdx    = 49
threadData->done      = false
tid1 = startThread (processArraySegment, threadData)

// caveat coder: see below.
threadData->arrayAddr = xyzzy
threadData->startIdx  = 50
threadData->endIdx    = 99
threadData->done      = false
tid2 = startThread (processArraySegment, threadData)

waitForThreadExit (tid1)
waitForThreadExit (tid2)

(keeping in mind the caveat that you should ensure thread 1 has loaded the data into its local storage before the main thread starts modifying it for thread 2, possibly with a mutex or by using an array of structures, one per thread).
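
As a rough C++11 counterpart of the pseudocode above, the sketch below uses one ThreadData structure per thread, which is one way to address the caveat. The names mirror the pseudocode; doSomethingWith is a placeholder (here it just adds 1), and processInTwoChunks is a hypothetical wrapper introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Per-thread arguments: one struct per thread, so the main thread never
// overwrites data that another thread is still reading.
struct ThreadData {
  int* arrayAddr;
  std::size_t startIdx;  // inclusive
  std::size_t endIdx;    // inclusive, as in the pseudocode
};

// Placeholder for the per-element work; the pseudocode leaves it unspecified.
void doSomethingWith(int& element) { element += 1; }

// The thread function: loops over its own segment of the array.
void processArraySegment(ThreadData data) {
  assert(data.startIdx <= data.endIdx);
  for (std::size_t i = data.startIdx; i <= data.endIdx; ++i) {
    doSomethingWith(data.arrayAddr[i]);
  }
}

// Hypothetical wrapper: processes the array in two chunks, one per thread.
void processInTwoChunks(int* xyzzy, std::size_t size) {
  assert(size >= 2);
  const std::size_t mid = size / 2;
  const ThreadData threadData[2] = {
      {xyzzy, 0, mid - 1},
      {xyzzy, mid, size - 1},
  };
  std::thread t1(processArraySegment, threadData[0]);
  std::thread t2(processArraySegment, threadData[1]);
  t1.join();  // waitForThreadExit (tid1)
  t2.join();  // waitForThreadExit (tid2)
}
```

Note that std::thread copies its arguments into the new thread before the constructor returns, so each thread here works on its own copy of the structure.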

In other words, it's rarely a simple matter of just modifying a for loop so that it runs in parallel, though that would be nice, something like:

for {threads=10} ({i} = 0; {i} < ARR_SZ; {i}++)
    array[{i}] = array[{i}] + 1;

Instead, it requires a bit of rearranging your code to take advantage of threads.

And, of course, you have to ensure that it makes sense for the data to be processed in parallel. If you're setting each array element to the previous one plus 1, no amount of parallel processing will help, simply because you have to wait for the previous element to be modified first.

This particular example above simply uses an argument passed to the thread function to specify which part of the array it should process. The thread function itself contains the loop to do the work.

Answered by Klaim

Well, obviously it depends on what your loop does, how you choose to parallelize it, and how you manage the threads' lifetime.

I'm reading the book by the author of the std C++11 threading library (who is also one of the boost.thread maintainers and wrote Just Thread), and I can see that "it depends".

Now, to give you an idea of the basics of the new standard threading, I would recommend reading the book, as it gives plenty of examples. Also, take a look at http://www.justsoftwaresolutions.co.uk/threading/ and https://stackoverflow.com/questions/415994/boost-thread-tutorials

Answered by Viktor Sehr

Using this class you can do it as:

Range-based loop (read and write):
pforeach(auto &val, container) { 
  val = sin(val); 
};

Index-based for loop:
auto new_container = container;
pfor(size_t i, 0, container.size()) { 
  new_container[i] = sin(container[i]); 
};

Answered by huseyin tugrul buyukisik

Define a macro using std::thread and a lambda expression:

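// needs #include <thread> and #include <vector>
// runs O once for every I in [INT_LOOP_BEGIN_INCLUSIVE, INT_LOOP_END_EXCLUSIVE),
// starting one thread per iteration and joining them all at the end
#ifndef PARALLEL_FOR
#define PARALLEL_FOR(INT_LOOP_BEGIN_INCLUSIVE, INT_LOOP_END_EXCLUSIVE,I,O)          \
    {                                                                               \
        int LOOP_LIMIT=(INT_LOOP_END_EXCLUSIVE)-(INT_LOOP_BEGIN_INCLUSIVE);         \
        std::vector<std::thread> threads(LOOP_LIMIT);                               \
        auto fParallelLoop=[&](int I){ O; };                                        \
        for(int i=0; i<LOOP_LIMIT; i++)                                             \
        {                                                                           \
            threads[i]=std::thread(fParallelLoop,i+(INT_LOOP_BEGIN_INCLUSIVE));     \
        }                                                                           \
        for(int i=0; i<LOOP_LIMIT; i++)                                             \
        {                                                                           \
            threads[i].join();                                                      \
        }                                                                           \
    }
#endif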

Usage:

std::atomic<int> aaa(0); // needs #include <atomic>; a plain int here would be a data race
PARALLEL_FOR(0,90,i,
{
    aaa+=i;
});

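It's ugly, but it works, as long as the loop body synchronizes its access to shared variables (a plain int accumulated from several threads would be a data race).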
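
The same idea can also be expressed without a macro. The following is only a sketch under assumptions: parallel_for_index and its num_threads parameter are hypothetical names introduced here. Unlike the macro, it starts a fixed number of threads rather than one per iteration, and each thread takes every num_threads-th index.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical function-template counterpart of the PARALLEL_FOR macro:
// calls body(i) for every i in [begin, end), spread over num_threads threads.
template <typename Body>
void parallel_for_index(int begin, int end, Body body, int num_threads = 4) {
  assert(begin <= end && num_threads > 0);
  std::vector<std::thread> threads;
  threads.reserve(static_cast<std::size_t>(num_threads));
  for (int t = 0; t < num_threads; ++t) {
    // Thread t handles begin + t, begin + t + num_threads, ... (strided split).
    threads.emplace_back([=]() mutable {
      for (int i = begin + t; i < end; i += num_threads) body(i);
    });
  }
  for (auto& th : threads) th.join();
}
```

A call such as parallel_for_index(0, 90, [&](int i) { aaa += i; }) with a std::atomic<int> aaa mirrors the usage above; the body is still responsible for synchronizing any shared state.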

Answered by Jean-Michaël Celerier

AFAIK the simplest way to parallelize a loop, if you are sure that no concurrent accesses are possible, is to use OpenMP.

It is supported by all major compilers except LLVM (as of August 2013).

Example:

for(int i = 0; i < n; ++i)
{
   tab[i] *= 2;
   tab2[i] /= 2;
   tab3[i] += tab[i] - tab2[i];
}

This can be parallelized very easily, like this:

#pragma omp parallel for
for(int i = 0; i < n; ++i)
{
   tab[i] *= 2;
   tab2[i] /= 2;
   tab3[i] += tab[i] - tab2[i];
}

However, be aware that this only pays off with a large number of values; for small loops, the cost of starting and synchronizing the threads can outweigh the gain.
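
For reference, a self-contained version of that loop might look like the sketch below. The function name process and the use of std::vector<int> are assumptions made for this example; the original snippet does not say what tab, tab2 and tab3 are. With g++, OpenMP is enabled with the -fopenmp flag; without it, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Same loop body as above, wrapped in a function (the name is hypothetical).
// Every iteration only touches index i, so iterations are independent.
void process(std::vector<int>& tab, std::vector<int>& tab2,
             std::vector<int>& tab3) {
  assert(tab.size() == tab2.size() && tab.size() == tab3.size());
  const int n = static_cast<int>(tab.size());
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    tab[i] *= 2;
    tab2[i] /= 2;
    tab3[i] += tab[i] - tab2[i];
  }
}
```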

If you use g++, another very C++11-ish way of doing it is to use a lambda and for_each together with the GNU parallel extensions (which can use OpenMP behind the scenes):

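// needs #include <parallel/algorithm>; build with g++ -fopenmp
__gnu_parallel::for_each(std::begin(tab), std::end(tab), [&] (int& element)
{
    stuff_of_your_loop(element); // assumes tab holds int
});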

However, for_each is mainly intended for arrays, vectors, etc. But you can "cheat" if you only want to iterate over a range of integers, by creating a Range class with begin and end methods whose iterators mostly just increment an int.
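
A minimal sketch of such a Range class is shown below. All names are hypothetical, and the iterator is a plain input iterator, which is enough for std::for_each and range-based for loops. A parallel for_each would most likely need a full random-access iterator (with +, -, +=, < and so on), which is left out here for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical Range class: represents the integers in [first, last).
class Range {
 public:
  // Iterator that wraps an int; incrementing it increments the int.
  class iterator {
   public:
    typedef std::input_iterator_tag iterator_category;
    typedef int value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const int* pointer;
    typedef int reference;

    explicit iterator(int value) : value_(value) {}
    int operator*() const { return value_; }
    iterator& operator++() { ++value_; return *this; }
    iterator operator++(int) { iterator old(*this); ++value_; return old; }
    bool operator==(const iterator& other) const { return value_ == other.value_; }
    bool operator!=(const iterator& other) const { return value_ != other.value_; }

   private:
    int value_;
  };

  Range(int first, int last) : first_(first), last_(last) {
    assert(first <= last);
  }
  iterator begin() const { return iterator(first_); }
  iterator end() const { return iterator(last_); }

 private:
  int first_;
  int last_;
};
```

With this, std::for_each(r.begin(), r.end(), f) calls f(i) for each integer i in the range, without a container holding the indices.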

Note that for simple loops that do mathematical stuff, the algorithms in #include <numeric> and #include <algorithm> can all be parallelized with G++ (through libstdc++'s parallel mode, enabled by compiling with -D_GLIBCXX_PARALLEL -fopenmp).
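
As a small illustration, the sketch below expresses a loop purely with standard algorithms. The function doubled_sum is a hypothetical example, not part of the original answer. Built normally it runs serially; built with g++ -D_GLIBCXX_PARALLEL -fopenmp, libstdc++ may run the same calls in parallel without any change to the source.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Writes 2 * in[i] into out[i] for every element, then returns the sum of out.
// Only standard algorithms are used, so the code itself is portable.
long doubled_sum(const std::vector<int>& in, std::vector<int>& out) {
  assert(in.size() == out.size());
  std::transform(in.begin(), in.end(), out.begin(),
                 [](int x) { return x * 2; });
  return std::accumulate(out.begin(), out.end(), 0L);
}
```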