Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2166425/

Date: 2020-08-27 22:21:40  Source: igfitidea

How to structure a C++ application to use a multicore processor

Tags: c++, multicore

Asked by Mr Bell

I am building an application that will do some object tracking from a video camera feed and use information from that to run a particle system in OpenGL. The code to process the video feed is somewhat slow, 200 - 300 milliseconds per frame right now. The system that this will be running on has a dual core processor. To maximize performance I want to offload the camera processing stuff to one processor and just communicate relevant data back to the main application as it is available, while leaving the main application kicking on the other processor.

What do I need to do to offload the camera work to the other processor and how do I handle communication with the main application?

Edit: I am running Windows 7 64-bit.

Accepted answer by pestilence669

Basically, you need to multithread your application. Each thread of execution can only saturate one core. Separate threads tend to be run on separate cores. If you are insistent that each thread ALWAYS execute on a specific core, then each operating system has its own way of specifying this (affinity masks & such)... but I wouldn't recommend it.
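
As a minimal sketch of what "multithread your application" can look like, the following uses C++11 `std::thread` (a swapped-in choice, since the answer does not name a specific API here). The function `process_frame` is a hypothetical stand-in for the slow camera processing, and no core affinity is set: the operating system decides where each thread runs.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the slow per-frame camera processing.
long long process_frame(const std::vector<int>& pixels) {
    return std::accumulate(pixels.begin(), pixels.end(), 0LL);
}

// Runs process_frame on a second thread while the caller keeps working,
// then waits for the worker and returns its result.
long long process_on_worker(const std::vector<int>& pixels) {
    long long result = 0;
    std::thread worker([&] { result = process_frame(pixels); });
    // ... the main thread would update and render the particle system here ...
    worker.join();  // join() synchronizes, so reading result afterwards is safe
    return result;
}
```

In a real application the worker would probably live for the whole run and process frame after frame rather than being created per call.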

OpenMP is great, but it's a tad fat in the ass, especially when joining back up from a parallelization. YMMV. It's easy to use, but not at all the best performing option. It also requires compiler support.

If you're on Mac OS X 10.6 (Snow Leopard), you can use Grand Central Dispatch. It's interesting to read about, even if you don't use it, as its design implements some best practices. It also isn't optimal, but it's better than OpenMP, even though it also requires compiler support.

If you can wrap your head around breaking up your application into "tasks" or "jobs," you can shove these jobs down as many pipes as you have cores. Think of batching your processing as atomic units of work. If you can segment it properly, you can run your camera processing on both cores, and your main thread at the same time.
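
One possible way to picture the "jobs down as many pipes as you have cores" idea is sketched below with `std::async`, which is an assumption on my part and not something the answer prescribes. The function `process_strip` is hypothetical: it stands for any unit of work that touches only its own slice of the frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical unit of work: sum the pixel values in [begin, end).
long long process_strip(const std::vector<int>& frame,
                        std::size_t begin, std::size_t end) {
    long long sum = 0;
    for (std::size_t i = begin; i < end; ++i) sum += frame[i];
    return sum;
}

// Splits the frame into one job per core, runs the jobs concurrently,
// and combines the partial results.
long long process_in_jobs(const std::vector<int>& frame) {
    // hardware_concurrency() may return 0 when the count is unknown.
    const std::size_t jobs = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (frame.size() + jobs - 1) / jobs;
    std::vector<std::future<long long>> results;
    for (std::size_t j = 0; j < jobs; ++j) {
        const std::size_t begin = std::min(frame.size(), j * chunk);
        const std::size_t end = std::min(frame.size(), begin + chunk);
        results.push_back(std::async(std::launch::async, [&frame, begin, end] {
            return process_strip(frame, begin, end);
        }));
    }
    long long total = 0;
    for (auto& r : results) total += r.get();  // get() waits for each job
    return total;
}
```

Because each job reads a disjoint slice and returns its own value, no locking is needed until the results are combined.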

If communication is minimized for each unit of work, then your need for mutexes and other locking primitives will be minimized. Coarse-grained threading is much easier than fine-grained. And you can always use a library or framework to ease the burden. Consider Boost's Thread library if you take the manual approach. It provides portable wrappers and a nice abstraction.
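
To illustrate how little locking coarse-grained communication can need, here is a sketch of a one-slot mailbox: the camera thread publishes one result per frame, and the render thread polls for it without ever waiting. The names `TrackingResult` and `ResultMailbox` are hypothetical, and the sketch uses the standard `std::mutex` where the answer suggests Boost's wrappers.

```cpp
#include <mutex>

// Hypothetical result of processing one camera frame.
struct TrackingResult {
    float x;
    float y;
};

// One-slot mailbox: the lock is held only for a small copy, once per frame.
class ResultMailbox {
public:
    // Called by the camera thread when a frame has been processed.
    void publish(const TrackingResult& r) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = r;
        fresh_ = true;
    }
    // Called by the render thread. Copies the newest unread result into out
    // and returns true, or returns false if nothing new has arrived.
    bool take(TrackingResult& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_) return false;
        out = latest_;
        fresh_ = false;
        return true;
    }
private:
    std::mutex mutex_;
    TrackingResult latest_{};
    bool fresh_ = false;
};
```

If the camera thread publishes twice before the render thread polls, the older result is simply overwritten, which seems acceptable when only the most recent tracking data matters.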

Answer by Eric

It depends on how many cores you have. If you have only 2 cores (CPUs, processors, hyperthreads, you know what I mean), then OpenMP cannot give a tremendous increase in performance, but it will help. The best you can hope for is to divide your time by the number of processors, so it will still take 100 - 150 ms per frame.

The equation is
parallel time = (([total time to perform a task] - [code that cannot be parallelized]) / [number of cpus]) + [code that cannot be parallelized]
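
The equation above can be written as a small helper. The split used in the usage note (60 ms of a 300 ms frame being non-parallelizable) is an invented figure for illustration, not a measurement from the question.

```cpp
// Estimated parallel time, following the equation above:
// the parallelizable part is divided among the CPUs, the serial part is not.
double parallel_time(double total_ms, double serial_ms, int cpus) {
    return (total_ms - serial_ms) / cpus + serial_ms;
}
```

With those assumed numbers, `parallel_time(300.0, 60.0, 2)` gives (300 - 60) / 2 + 60 = 180 ms, and only a fully parallelizable frame, `parallel_time(300.0, 0.0, 2)`, reaches the ideal 150 ms.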

Basically, OpenMP rocks at processing parallel loops. It's rather easy to use:

#pragma omp parallel for
for (int i = 0; i < N; i++)
    a[i] = 2 * i;  // iterations are independent, so threads can split the range

And bang, your for loop is parallelized. It does not work in every case: not every algorithm can be parallelized this way, but many can be rewritten (hacked) to be compatible. The key principle is data parallelism, in the spirit of Single Instruction, Multiple Data (SIMD): applying the same convolution code to multiple pixels, for example.
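
A slightly fuller sketch of the same idea, applied to pixels, might look like this. The function name `brighten` is hypothetical, and the code assumes the compiler's OpenMP switch is enabled (`-fopenmp` for GCC and Clang, `/openmp` for MSVC); without it the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Doubles every pixel value. Each iteration touches only its own element,
// so the iterations are independent and safe to split across threads.
void brighten(std::vector<int>& pixels) {
    const int n = static_cast<int>(pixels.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        pixels[i] = 2 * pixels[i];
    }
}
```

A loop in which one iteration reads what another writes (a running sum, for instance) would not be safe to parallelize this way without restructuring.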

But simply applying this cookbook recipe goes against the rules of optimization:
1. Benchmark your code.
2. Find the REAL bottlenecks with "scientific" evidence (numbers) instead of simply guessing where you think there is a bottleneck.
3. If the bottleneck really is in processing loops, then OpenMP is for you.
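
For the benchmarking step, a rough timing helper along these lines may be enough to get first numbers. The name `time_ms` is hypothetical, and a profiler will usually give a more complete picture than hand-placed timers.

```cpp
#include <chrono>

// Returns the wall-clock time, in milliseconds, taken by one call to work().
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping each stage of the frame processing in a call such as `time_ms([&] { /* one stage */ })` and averaging over many frames should show which stage accounts for most of the 200 - 300 ms.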

Maybe simple optimizations on your existing code can give better results, who knows?

Another road would be to run OpenGL in one thread and the data processing in another. This will help a lot if OpenGL or your particle rendering system takes a lot of processing power, but remember that threading can introduce other kinds of synchronization bottlenecks.

Answer by Anycorn

I would recommend against OpenMP. OpenMP is geared more toward numerical code than toward the producer/consumer model that you seem to have.

I think you can do something simple using Boost threads: spawn a worker thread, share a common segment of memory (for communicating the acquired data), and use some notification mechanism to signal when your data is available (look into Boost thread interrupts).
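
The three pieces named above (worker thread, shared memory, notification) can be sketched as follows. Note the substitutions: this uses the C++11 standard library in place of Boost, and a condition variable as the notification mechanism in place of thread interrupts. `ResultQueue` and `run_pipeline` are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Shared memory plus notification: the worker pushes, the consumer pops.
template <typename T>
class ResultQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();  // wake a consumer waiting in pop()
    }
    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Hypothetical driver: a worker thread produces count results and the
// calling thread consumes them in the order they were produced.
std::vector<int> run_pipeline(int count) {
    ResultQueue<int> queue;
    std::thread worker([&queue, count] {
        for (int i = 0; i < count; ++i) queue.push(i * i);
    });
    std::vector<int> received;
    for (int i = 0; i < count; ++i) received.push_back(queue.pop());
    worker.join();
    return received;
}
```

Because `pop()` blocks, a render loop that must keep drawing would more likely want a non-blocking variant that returns immediately when the queue is empty.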

I do not know what kind of processing you do, but you may want to take a look at Intel Threading Building Blocks and Intel Integrated Performance Primitives. They have several functions for video processing, which may be faster (assuming they have the functionality you need).

Answer by Kornel Kisielewicz

You need some kind of framework for handling multiple cores. OpenMP seems a fairly simple choice.

Answer by blwy10

As Pestilence said, you just need your app to be multithreaded. Lots of frameworks like OpenMP have been mentioned, so here's another one:

Intel Threading Building Blocks

I've never used it before, but I hear great things about it.

Hope this helps!