C++11 中的 async(launch::async) 是否使线程池过时以避免昂贵的线程创建?
声明:本页面是 Stack Overflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):Stack Overflow
原文地址: http://stackoverflow.com/questions/14351352/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Does async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
提问 by Philipp Claßen
It is loosely related to this question: Are std::thread pooled in C++11?. Though the question differs, the intention is the same:
它与这个问题松散相关:Are std::thread pooled in C++11? . 尽管问题不同,但意图是相同的:
Question 1: Does it still make sense to use your own (or 3rd-party library) thread pools to avoid expensive thread creation?
问题 1:使用您自己的(或第 3 方库)线程池来避免昂贵的线程创建仍然有意义吗?
The conclusion in the other question was that you cannot rely on std::thread being pooled (it might or might not be). However, std::async(launch::async) seems to have a much better chance of being pooled.
另一个问题的结论是,你不能依赖 std::thread 被池化(可能会,也可能不会)。然而,std::async(launch::async) 似乎更有可能使用线程池。
I don't think that it is forced by the standard, but IMHO I would expect all good C++11 implementations to use thread pooling if thread creation is slow. Only on platforms where creating a new thread is inexpensive would I expect them to always spawn a new thread.
我不认为这是标准强制的,但恕我直言,如果线程创建速度慢,我希望所有好的 C++11 实现都使用线程池。只有在创建新线程成本低廉的平台上,我才期望它们总是产生一个新线程。
Question 2: This is just what I think, but I have no facts to prove it. I may very well be mistaken. Is it an educated guess?
问题2:这只是我的想法,但我没有事实证明。我很可能误会了。这是一个有根据的猜测吗?
Finally, here I have provided some sample code that first shows how I think thread creation can be expressed by async(launch::async):
最后,我在这里提供了一些示例代码,首先展示了我认为线程创建可以如何用 async(launch::async) 来表达:
Example 1:
示例 1:
thread t([]{ f(); });
// ...
t.join();
becomes
变成
auto future = async(launch::async, []{ f(); });
// ...
future.wait();
Example 2: Fire and forget thread
示例 2:即发即弃线程
thread([]{ f(); }).detach();
becomes
变成
// a bit clumsy...
auto dummy = async(launch::async, []{ f(); });
// ... but I hope soon it can be simplified to
async(launch::async, []{ f(); });
Question 3: Would you prefer the async versions over the thread versions?
问题 3:相比 thread 版本,您更喜欢 async 版本吗?
The rest is no longer part of the question, but only for clarification:
其余的不再是问题的一部分,而只是为了澄清:
Why must the return value be assigned to a dummy variable?
为什么必须将返回值分配给虚拟变量?
Unfortunately, the current C++11 standard forces you to capture the return value of std::async, as otherwise the destructor is executed, which blocks until the action terminates. Some consider this an error in the standard (e.g., Herb Sutter).
不幸的是,当前的 C++11 标准强制您捕获 std::async 的返回值,否则会执行析构函数,它会阻塞直到操作终止。有人认为这是标准中的错误(例如 Herb Sutter)。
This example from cppreference.com illustrates it nicely:
这个来自cppreference.com 的例子很好地说明了它:
{
std::async(std::launch::async, []{ f(); });
std::async(std::launch::async, []{ g(); }); // does not run until f() completes
}
Another clarification:
另一个澄清:
I know that thread pools may have other legitimate uses but in this question I am only interested in the aspect of avoiding expensive thread creation costs.
我知道线程池可能有其他合法用途,但在这个问题中,我只对避免昂贵的线程创建成本方面感兴趣。
I think there are still situations where thread pools are very useful, especially if you need more control over resources. For example, a server might decide to handle only a fixed number of requests simultaneously to guarantee fast response times and to increase the predictability of memory usage. Thread pools should be fine here.
我认为仍然存在线程池非常有用的情况,尤其是当您需要更多地控制资源时。例如,服务器可能决定只同时处理固定数量的请求,以保证快速响应时间并提高内存使用的可预测性。在这里,线程池应该没问题。
Thread-local variables may also be an argument for your own thread pools, but I'm not sure whether it is relevant in practice:
线程局部变量也可能是使用您自己的线程池的一个理由,但我不确定它在实践中是否相关:
- A new thread created with std::thread starts without initialized thread-local variables. Maybe this is not what you want.
- In threads spawned by async, it is somewhat unclear to me, because the thread could have been reused. From my understanding, thread-local variables are not guaranteed to be reset, but I may be mistaken.
- Using your own (fixed-size) thread pools, on the other hand, gives you full control if you really need it.
- 用 std::thread 创建的新线程以未初始化的线程局部变量开始。也许这不是你想要的。
- 在由 async 产生的线程中,我有点不清楚,因为该线程可能已被重用。根据我的理解,线程局部变量不能保证被重置,但我可能会误会。
- 另一方面,如果您真的需要,使用您自己的(固定大小的)线程池可以让您完全控制。
采纳答案 by Omnifarious
Question 1:
问题 1:
I changed this from the original because the original was wrong. I was under the impression that Linux thread creation was very cheap, and after testing I determined that the overhead of a function call in a new thread vs. a normal one is enormous. The overhead of creating a thread to handle a function call is something like 10,000 or more times slower than a plain function call. So, if you're issuing a lot of small function calls, a thread pool might be a good idea.
我从原来的版本改写了这一段,因为原来的是错误的。我原本的印象是 Linux 线程创建非常便宜,经过测试后,我确定在新线程中进行函数调用相比普通函数调用的开销是巨大的。创建线程来处理函数调用的开销大约比普通函数调用慢 10000 倍或更多。因此,如果您发出大量小的函数调用,线程池可能是个好主意。
It's quite apparent that the standard C++ library that ships with g++ doesn't have thread pools. But I can definitely see a case for them. Even with the overhead of having to shove the call through some kind of inter-thread queue, it would likely be cheaper than starting up a new thread. And the standard allows this.
很明显,g++ 附带的标准 C++ 库没有线程池。但我绝对可以看到他们的案例。即使有必须通过某种线程间队列推送调用的开销,它也可能比启动新线程便宜。标准允许这样做。
IMHO, the Linux kernel people should work on making thread creation cheaper than it currently is. But the standard C++ library should also consider using a pool to implement launch::async | launch::deferred.
恕我直言,Linux 内核开发者应该致力于使线程创建比目前更便宜。但是,标准 C++ 库也应该考虑使用线程池来实现 launch::async | launch::deferred。
And the OP is correct: using ::std::thread to launch a thread of course forces the creation of a new thread instead of using one from a pool. So ::std::async(::std::launch::async, ...) is preferred.
并且 OP 是正确的:使用 ::std::thread 启动线程当然会强制创建一个新线程,而不是使用池中的一个。所以 ::std::async(::std::launch::async, ...) 是首选。
Question 2:
问题2:
Yes, basically this 'implicitly' launches a thread. But really, it's still quite obvious what's happening. So I don't really think 'implicitly' is a particularly good word for it.
是的,基本上这会“隐式”启动一个线程。但实际上,正在发生的事情仍然很明显。所以我真的不认为“隐式”是一个特别贴切的词。
I'm also not convinced that forcing you to wait for a return before destruction is necessarily an error. I don't know that you should be using the async call to create 'daemon' threads that aren't expected to return. And if they are expected to return, it's not OK to be ignoring exceptions.
我也不相信强迫你在销毁之前等待返回一定是一个错误。我不确定您是否应该使用 async 调用来创建预计不会返回的“守护”线程。如果它们预计会返回,那么忽略异常是不行的。
Question 3:
问题 3:
Personally, I like thread launches to be explicit. I place a lot of value on islands where you can guarantee serial access. Otherwise you end up with mutable state that you always have to be wrapping a mutex around somewhere and remembering to use it.
就个人而言,我喜欢线程启动是明确的。我非常重视可以保证串行访问的岛屿。否则你最终会得到可变状态,你总是必须在某处包裹一个互斥锁并记住使用它。
I liked the work queue model a whole lot better than the 'future' model because there are 'islands of serial' lying around so you can more effectively handle mutable state.
我比“未来”模型更喜欢工作队列模型,因为周围有“串行孤岛”,因此您可以更有效地处理可变状态。
But really, it depends on exactly what you're doing.
但实际上,这取决于你在做什么。
Performance Test
性能测试
So, I tested the performance of various methods of calling things and came up with these numbers on an 8 core (AMD Ryzen 7 2700X) system running Fedora 29 compiled with clang version 7.0.1 and libc++ (not libstdc++):
因此,我测试了各种调用方法的性能,并在运行 Fedora 29 的 8 核(AMD Ryzen 7 2700X)系统上得出了这些数字,该系统使用 clang 版本 7.0.1 和 libc++(不是 libstdc++)编译:
Do nothing calls per second: 35365257
Empty calls per second: 35210682
New thread calls per second: 62356
Async launch calls per second: 68869
Worker thread calls per second: 970415
And natively, on my MacBook Pro 15" (Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz) with Apple LLVM version 10.0.0 (clang-1000.10.44.4) under OSX 10.13.6, I get this:
而在本机上,即我的 MacBook Pro 15"(Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz),使用 Apple LLVM version 10.0.0 (clang-1000.10.44.4),在 OSX 10.13.6 下,我得到:
Do nothing calls per second: 22078079
Empty calls per second: 21847547
New thread calls per second: 43326
Async launch calls per second: 58684
Worker thread calls per second: 2053775
For the worker thread, I started up a thread, then used a lockless queue to send requests to another thread and then wait for a "It's done" reply to be sent back.
对于工作线程,我启动了一个线程,然后使用无锁队列将请求发送到另一个线程,然后等待发送回“完成”的回复。
The "Do nothing" is just to test the overhead of the test harness.
“什么都不做”只是为了测试测试工具的开销。
It's clear that the overhead of launching a thread is enormous. And even the worker thread with the inter-thread queue slows things down by a factor of 20 or so on Fedora 25 in a VM, and by about 8 on native OS X.
很明显,启动一个线程的开销是巨大的。甚至带有线程间队列的工作线程在 VM 中的 Fedora 25 上也会减慢 20 倍左右,在本机 OS X 上减慢大约 8 倍。
I created a Bitbucket project holding the code I used for the performance test. It can be found here: https://bitbucket.org/omnifarious/launch_thread_performance
我创建了一个 Bitbucket 项目,其中包含我用于性能测试的代码。可以在这里找到:https://bitbucket.org/omnifarious/launch_thread_performance