Parallel Loops in C++
Note: this page is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) at StackOverflow.
Original question: http://stackoverflow.com/questions/36246300/
Asked by Exagon
I wonder if there is a light, straightforward way to compute loops such as for and range-based for loops in parallel in C++. How would you implement such a thing? From Scala I know the map, filter and foreach functions; would it also be possible to perform these in parallel? Is there an easy way to achieve this in C++? My primary platform is Linux, but it would be nice if it works cross-platform.
Answered by Exagon
With the parallel algorithms in C++17 we can now use:
#include <algorithm>
#include <execution>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> foo;
    std::for_each(
        std::execution::par_unseq,
        foo.begin(),
        foo.end(),
        [](auto&& item)
        {
            //do stuff with item
        });
}
to compute loops in parallel. The first parameter specifies the execution policy.
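The same execution-policy parameter also works with the other standard algorithms. Below is a minimal sketch of a parallel reduction (assuming a standard library that ships the C++17 parallel algorithms; with GCC's libstdc++ this typically also means linking against Intel TBB with -ltbb):

#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<double> v(1000000, 0.5);
    // std::execution::par asks for a parallel execution of the reduction.
    double sum = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
    std::cout << "sum = " << sum << '\n';
}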
Answered by Daniel Langr
What is your platform? You can look at OpenMP; though it's not a part of C++, it is widely supported by compilers.
As for range-based for loops, see, e.g., Using OpenMP with C++11 range-based for loops?.
I've also seen a few documents at http://www.open-std.org that indicate some efforts to incorporate parallel constructs/algorithms into future C++, but I don't know what their current status is.
UPDATE
Just adding some example code:
#include <cstddef>
#include <iterator>

template <typename RAIter>
void loop_in_parallel(RAIter first, RAIter last) {
    const std::size_t n = std::distance(first, last);

    #pragma omp parallel for
    for (std::size_t i = 0; i < n; i++) {
        auto& elem = *(first + i);
        // do whatever you want with elem
    }
}
The number of threads can be set at runtime via the OMP_NUM_THREADS environment variable.
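A small usage sketch of my own (not part of the original answer): the translation unit has to be compiled with OpenMP enabled (for example -fopenmp with GCC/Clang), and omp_get_max_threads() reports the thread count that OMP_NUM_THREADS overrides. Doubling each element here is just a stand-in for "do whatever you want with elem":

#include <omp.h>
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> data(1000, 1);
    std::printf("OpenMP will use up to %d threads\n", omp_get_max_threads());

    // Same pattern as loop_in_parallel(data.begin(), data.end()) above.
    #pragma omp parallel for
    for (std::size_t i = 0; i < data.size(); i++) {
        data[i] *= 2;   // each iteration is independent of the others
    }

    std::printf("data[0] = %d\n", data[0]);   // prints 2
    return 0;
}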
Answered by bobah
std::async may be a good fit here, if you are happy to let the C++ runtime control the parallelism.
Example from cppreference.com:
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <future>

template <typename RAIter>
int parallel_sum(RAIter beg, RAIter end)
{
    auto len = end - beg;
    if(len < 1000)
        return std::accumulate(beg, end, 0);

    RAIter mid = beg + len/2;
    auto handle = std::async(std::launch::async,
                             parallel_sum<RAIter>, mid, end);
    int sum = parallel_sum(beg, mid);
    return sum + handle.get();
}

int main()
{
    std::vector<int> v(10000, 1);
    std::cout << "The sum is " << parallel_sum(v.begin(), v.end()) << '\n';
}
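The same divide-and-conquer pattern can be adapted to the for/foreach use case from the question. The sketch below is my own adaptation (the name parallel_for_each and the cutoff of 1000 are arbitrary choices, not part of the cppreference example):

#include <future>
#include <iostream>
#include <vector>

template <typename RAIter, typename Func>
void parallel_for_each(RAIter beg, RAIter end, Func f)
{
    auto len = end - beg;
    if (len < 1000) {                 // small range: run sequentially
        for (RAIter it = beg; it != end; ++it)
            f(*it);
        return;
    }
    RAIter mid = beg + len / 2;
    // Process the upper half asynchronously, the lower half in this thread.
    auto handle = std::async(std::launch::async,
                             parallel_for_each<RAIter, Func>, mid, end, f);
    parallel_for_each(beg, mid, f);
    handle.get();
}

int main()
{
    std::vector<int> v(10000, 1);
    parallel_for_each(v.begin(), v.end(), [](int& x) { x *= 2; });
    std::cout << "v[0] = " << v[0] << '\n';  // prints 2
}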
Answered by arkan
With C++11 you can parallelize a for loop with only a few lines of code. This splits a for loop into smaller chunks and assigns each sub-loop to a thread:
/// Basically replacing:
void sequential_for(){
    for(int i = 0; i < nb_elements; ++i)
        computation(i);
}

/// By:
void threaded_for(){
    parallel_for(nb_elements, [&](int start, int end){
        for(int i = start; i < end; ++i)
            computation(i);
    } );
}
Or within a class:
struct My_obj {

    /// Replacing:
    void sequential_for(){
        for(int i = 0; i < nb_elements; ++i)
            computation(i);
    }

    /// By:
    void threaded_for(){
        parallel_for(nb_elements, [this](int s, int e){ this->process_chunk(s, e); } );
    }

    void process_chunk(int start, int end)
    {
        for(int i = start; i < end; ++i)
            computation(i);
    }
};
To do this, you only need to put the code below in a header file and use it at will:
#include <algorithm>
#include <thread>
#include <functional>
#include <vector>

/// @param[in] nb_elements : size of your for loop
/// @param[in] functor(start, end) :
/// your function processing a sub chunk of the for loop.
/// "start" is the first index to process (included) until the index "end"
/// (excluded)
/// @code
///     for(int i = start; i < end; ++i)
///         computation(i);
/// @endcode
/// @param use_threads : enable / disable threads.
///
///
static
void parallel_for(unsigned nb_elements,
                  std::function<void (int start, int end)> functor,
                  bool use_threads = true)
{
    // -------
    unsigned nb_threads_hint = std::thread::hardware_concurrency();
    unsigned nb_threads = nb_threads_hint == 0 ? 8 : (nb_threads_hint);

    unsigned batch_size = nb_elements / nb_threads;
    unsigned batch_remainder = nb_elements % nb_threads;

    std::vector< std::thread > my_threads(nb_threads);

    if( use_threads )
    {
        // Multithread execution
        for(unsigned i = 0; i < nb_threads; ++i)
        {
            int start = i * batch_size;
            my_threads[i] = std::thread(functor, start, start+batch_size);
        }
    }
    else
    {
        // Single thread execution (for easy debugging)
        for(unsigned i = 0; i < nb_threads; ++i){
            int start = i * batch_size;
            functor( start, start+batch_size );
        }
    }

    // Handle the remaining elements
    int start = nb_threads * batch_size;
    functor( start, start+batch_remainder);

    // Wait for the other threads to finish their tasks
    if( use_threads )
        std::for_each(my_threads.begin(), my_threads.end(), std::mem_fn(&std::thread::join));
}
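One usage note (my addition): the third parameter makes it easy to fall back to the single-threaded path while debugging the loop body, using the same nb_elements/computation placeholders as above:

// Runs the same chunked loop on the calling thread only (handy for debugging).
parallel_for(nb_elements, [&](int start, int end){
    for(int i = start; i < end; ++i)
        computation(i);
}, false);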
Lastly, you could define a macro to get an even more compact expression:
#define PARALLEL_FOR_BEGIN(nb_elements) parallel_for(nb_elements, [&](int start, int end){ for(int i = start; i < end; ++i)
#define PARALLEL_FOR_END() })

PARALLEL_FOR_BEGIN(nb_edges)
{
    computation(i);
}PARALLEL_FOR_END();
Answered by uSeemSurprised
This can be done using threads, specifically the pthreads library, which can be used to perform operations concurrently.
You can read more about them here: http://www.tutorialspoint.com/cplusplus/cpp_multithreading.htm
std::thread can also be used: http://www.cplusplus.com/reference/thread/thread/
Below is code in which I use the thread ID of each thread to split the array into two halves:
#include <iostream>
#include <cstdlib>
#include <pthread.h>

using namespace std;

#define NUM_THREADS 2

int arr[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

void *splitLoop(void *threadid)
{
    long tid;
    tid = (long)threadid;
    //cout << "Hello World! Thread ID, " << tid << endl;
    int start = (tid * 5);     // each thread handles 5 elements
    int end = start + 5;
    for(int i = start; i < end; i++){
        cout << arr[i] << " ";
    }
    cout << endl;
    pthread_exit(NULL);
}

int main ()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long i;
    for( i = 0; i < NUM_THREADS; i++ ){
        cout << "main() : creating thread, " << i << endl;
        rc = pthread_create(&threads[i], NULL,
                            splitLoop, (void *)i);
        if (rc){
            cout << "Error: unable to create thread, " << rc << endl;
            exit(-1);
        }
    }
    pthread_exit(NULL);
}
Also remember that while compiling you have to use the -lpthread flag.
Link to the solution on Ideone: http://ideone.com/KcsW4P
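For comparison, here is a minimal sketch of the same two-way split written with std::thread instead of pthreads (my own adaptation, not part of the original answer); compile it with -pthread:

#include <iostream>
#include <thread>

const int NUM_THREADS = 2;
int arr[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

void splitLoop(int tid)
{
    int start = tid * 5;       // each thread handles 5 elements
    int end = start + 5;
    for (int i = start; i < end; i++)
        std::cout << arr[i] << " ";
    std::cout << std::endl;
}

int main()
{
    std::thread threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        threads[i] = std::thread(splitLoop, i);
    for (auto& t : threads)
        t.join();              // wait for both halves to be printed
    return 0;
}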
Answered by doctorlai
Concurrency::parallel_for from the Parallel Patterns Library (PPL) is also one of the nice options for doing task parallelism.
Taken from C++ Coding Exercise – Parallel For – Monte Carlo PI Calculation
#include <iostream>
#include <iomanip>
#include <ctime>
#include <cstdlib>
#include <ppl.h>

using namespace std;

// monte_carlo_count_pi(N) returns how many of N random sample points
// fall inside the unit circle; its definition is in the linked article.

int main() {
    srand(time(NULL)); // seed
    const int N1 = 1000;
    const int N2 = 100000;
    int n = 0;
    int c = 0;
    Concurrency::critical_section cs;
    // it is better that N2 >> N1 for better performance
    Concurrency::parallel_for(0, N1, [&](int i) {
        int t = monte_carlo_count_pi(N2);
        cs.lock();   // protect the shared counters (avoids a data race)
        n += N2;     // total sampling points
        c += t;      // points that fall inside the circle
        cs.unlock();
    });
    cout << "pi ~= " << setprecision(9) << (double)c / n * 4.0 << endl;
    return 0;
}
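For the simpler element-wise loop the question asks about, a minimal PPL sketch might look like the following (this assumes the Microsoft toolchain with <ppl.h> available; doubling each element is just placeholder work). Since every index touches its own element, no critical_section is needed here:

#include <ppl.h>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v(1000, 1);

    // Each iteration works on a distinct element, so no locking is required.
    Concurrency::parallel_for(std::size_t(0), v.size(), [&](std::size_t i) {
        v[i] *= 2;
    });

    std::cout << "v[0] = " << v[0] << std::endl;
    return 0;
}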
Answered by Adam
As this thread has been my answer almost every time I've looked for a method to parallelize something, I've decided to add a bit to it, based on the method by arkan (see above).
The next two methods are almost the same and allow a simple syntax. Simply include the header file in your project and call one of the parallel versions:

Example:
#include "par_for.h"
int main() {
//replace -
for(unsigned i = 0; i < 10; ++i){
std::cout << i << std::endl;
}
//with -
//method 1:
pl::thread_par_for(0, 10, [&](unsigned i){
std::cout << i << std::endl; //do something here with the index i
}); //changing the end to },false); will make the loop sequantial
//or method 2:
pl::async_par_for(0, 10, [&](unsigned i){
std::cout << i << std::endl; //do something here with the index i
}); //changing the end to },false); will make the loop sequantial
return 0;
}
Header file - par_for.h:
#pragma once

#include <thread>
#include <vector>
#include <functional>
#include <future>

using namespace std;

namespace pl{

inline void thread_par_for(unsigned start, unsigned end, function<void(unsigned i)> fn, bool par = true){

    //internal loop over one chunk [int_start, int_start + seg_size)
    auto int_fn = [&fn](unsigned int_start, unsigned seg_size){
        for (unsigned j = int_start; j < int_start+seg_size; j++){
            fn(j);
        }
    };

    //sequenced for
    if(!par){
        return int_fn(start, end - start);
    }

    //get number of threads
    unsigned nb_threads_hint = thread::hardware_concurrency();
    unsigned nb_threads = nb_threads_hint == 0 ? 8 : (nb_threads_hint);

    //calculate segments
    unsigned total_length = end - start;
    unsigned seg = total_length/nb_threads;
    unsigned last_seg = seg + total_length%nb_threads;

    //launch threads - parallel for
    auto threads_vec = vector<thread>();
    threads_vec.reserve(nb_threads);
    for(unsigned k = 0; k < nb_threads-1; ++k){
        unsigned current_start = start + seg*k;
        threads_vec.emplace_back(thread(int_fn, current_start, seg));
    }
    {
        //last segment also takes the remainder
        unsigned current_start = start + seg*(nb_threads-1);
        threads_vec.emplace_back(thread(int_fn, current_start, last_seg));
    }
    for (auto& th : threads_vec){
        th.join();
    }
}

inline void async_par_for(unsigned start, unsigned end, function<void(unsigned i)> fn, bool par = true){

    //internal loop over one chunk [int_start, int_start + seg_size)
    auto int_fn = [&fn](unsigned int_start, unsigned seg_size){
        for (unsigned j = int_start; j < int_start+seg_size; j++){
            fn(j);
        }
    };

    //sequenced for
    if(!par){
        return int_fn(start, end - start);
    }

    //get number of threads
    unsigned nb_threads_hint = thread::hardware_concurrency();
    unsigned nb_threads = nb_threads_hint == 0 ? 8 : (nb_threads_hint);

    //calculate segments
    unsigned total_length = end - start;
    unsigned seg = total_length/nb_threads;
    unsigned last_seg = seg + total_length%nb_threads;

    //launch tasks - parallel for
    auto fut_vec = vector<future<void>>();
    fut_vec.reserve(nb_threads);
    for(unsigned k = 0; k < nb_threads-1; ++k){
        unsigned current_start = start + seg*k;
        fut_vec.emplace_back(async(int_fn, current_start, seg));
    }
    {
        //last segment also takes the remainder
        unsigned current_start = start + seg*(nb_threads-1);
        fut_vec.emplace_back(async(launch::async, int_fn, current_start, last_seg));
    }
    for (auto& th : fut_vec){
        th.get();
    }
}

}
Some simple tests suggest the method with async is faster, probably because the standard library controls whether to actually launch a new thread or not.
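For reference, a rough way to reproduce such a comparison is a simple wall-clock timing of both helpers (a sketch only; the workload is arbitrary and the results will vary with hardware, compiler and load):

#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>
#include "par_for.h"

int main() {
    const unsigned n = 10000000;
    std::vector<double> data(n, 1.0);
    auto work = [&](unsigned i){ data[i] = std::sqrt(data[i] + i); };

    auto t0 = std::chrono::steady_clock::now();
    pl::thread_par_for(0, n, work);
    auto t1 = std::chrono::steady_clock::now();
    pl::async_par_for(0, n, work);
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "thread_par_for: "
              << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
              << "async_par_for:  "
              << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
    return 0;
}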