C++: How much overhead is there when creating a thread?

Question

Asked by jdt141

I just reviewed some really terrible code: code that sends messages on a serial port by creating a new thread to package and assemble every single message. Yes, for every message a pthread is created, the bits are properly set up, and then the thread terminates. I haven't a clue why anyone would do such a thing, but it raises the question: how much overhead is there when actually creating a thread?

Answer 1

Accepted answer by Tony Delroy

...sends messages on a serial port ... for every message a pthread is created, bits are properly set up, then the thread terminates. ...how much overhead is there when actually creating a thread?

This is highly system specific. For example, the last time I used VMS, threading was nightmarishly slow: it's been years, but from memory one thread could create something like 10 more per second, and if you kept that up for a few seconds without threads exiting, the process would dump core. On Linux, by contrast, you can probably create thousands per second. If you want to know exactly, benchmark it on your system. But that number alone isn't much use without knowing more about the messages: whether they average 5 bytes or 100k, whether they're sent back to back or the line idles in between, and what the app's latency requirements are. All of these matter as much to whether the code's use of threads is appropriate as any absolute measurement of thread-creation overhead. And performance may not have needed to be the dominant design consideration anyway.

Answer 2

Answer by Nafnlaus

To resurrect this old thread, I just wrote some simple test code:

#include <thread>

int main(int argc, char** argv)
{
  for (volatile int i = 0; i < 500000; i++)
    std::thread([](){}).detach();
  return 0;
}

I compiled it with g++ test.cpp -std=c++11 -lpthread -O3 -o test, then ran it three times in a row on an old, slow laptop (Intel Core i5-2540M, kernel 2.6.18) that was heavily loaded with a database rebuild. Results from three consecutive runs: 5.647 s, 5.515 s, and 5.561 s. That works out to a tad over 10 microseconds per thread on this machine (about 5.5 s / 500,000 ≈ 11 microseconds), and probably much less on yours.

That's not much overhead at all, given that serial ports max out at around 1 bit per 10 microseconds. Of course, there are various additional costs one can incur with threads: passed or captured arguments (although function calls themselves can impose some), cache slowdowns between cores (if multiple threads on different cores are fighting over the same memory at the same time), and so on. But in general I highly doubt the use case you presented will hurt performance at all (and, depending on the details, it could even help), despite your having already labeled the concept "really terrible code" without knowing how long it takes to launch a thread.

Whether it's a good idea or not depends a lot on the details of your situation. What else is the calling thread responsible for? What exactly is involved in preparing and writing out the packets? How frequently are they written out, and with what sort of distribution (uniform, clustered, etc.)? What's their structure like? How many cores does the system have? Depending on the details, the optimal solution could be anywhere from "no threads at all" to "shared thread pool" to "a thread for each packet".

Note that thread pools aren't magic and can in some cases be slower than unique threads. One of the biggest slowdowns with threads is synchronizing cached memory that multiple threads use at the same time, and a thread pool, by its very nature of having to look for and process updates from a different thread, has to do exactly that. So either your primary thread or the pool's processing thread can get stuck waiting while the processor determines whether the other thread has altered a section of memory. By contrast, in the ideal case, a unique processing thread for a given task only has to share memory with its calling thread once (when it's launched), and after that they never interfere with each other again.

Answer 3

Answer by ubiquibacon

I have always been told that thread creation is cheap, especially when compared to the alternative of creating a process. If the program you are talking about does not have a lot of operations that need to run concurrently then threading might not be necessary, and judging by what you wrote this might well be the case. Some literature to back me up:

http://www.personal.kent.edu/~rmuhamma/OpSystems/Myos/threads.htm

Threads are cheap in the sense that
They only need a stack and storage for registers therefore, threads are cheap to create.
Threads use very little resources of an operating system in which they are working. That is, threads do not need new address space, global data, program code or operating system resources.
Context switching are fast when working with threads. The reason is that we only have to save and/or restore PC, SP and registers.

More of the same here.

In Operating System Concepts, 8th Edition (page 155), the authors write about the benefits of threading:

Allocating memory and resources for process creation is costly. Because threads share the resource of the process to which they belong, it is more economical to create and context-switch threads. Empirically gauging the difference in overhead can be difficult, but in general it is much more time consuming to create and manage processes than threads. In Solaris, for example, creating a process is about thirty times slower than is creating a thread, and context switching is about five times slower.

Answer 4

Answer by Michael Goldshteyn

You definitely do not want to do this. Create a single thread or a pool of threads and just signal when messages are available. Upon receiving the signal, the thread can perform any necessary message processing.

In terms of overhead, thread creation/destruction, especially on Windows, is fairly expensive. Somewhere on the order of tens of microseconds, to be specific. It should, for the most part, only be done at the start/end of an app, with the possible exception of dynamically resized thread pools.

Answer 5

Answer by ruslik

There is some overhead in thread creation, but comparing it with usually slow baud rates of the serial port (19200 bits/sec being the most common), it just doesn't matter.

Answer 6

Answer by ruslik

I used the above "terrible" design in a VOIP app I made. It worked very well: absolutely no latency or missed/dropped packets for locally connected computers. Each time a data packet arrived, a thread was created and handed the data to process and pass on to the output devices. Of course, the packets were large, so this caused no bottleneck. Meanwhile, the main thread could loop back to wait for and receive the next incoming packet.

I have tried other designs where the threads I need are created in advance, but this creates its own problems. First, you need to design your code properly so that the threads retrieve the incoming packets and process them in a deterministic fashion. If you use multiple (pre-allocated) threads, the packets may be processed out of order. If you use a single (pre-allocated) thread to loop and pick up the incoming packets, that thread might encounter a problem and terminate, leaving no thread to process any data.

Creating a thread to process each incoming data packet works very cleanly, especially on multi-core systems and where incoming packets are large. To answer your question more directly: the alternative to thread creation is a run-time component that manages pre-allocated threads. Synchronizing data hand-off and processing, and detecting errors, may add as much overhead as simply creating a new thread, if not more. It all depends on your design and requirements.

Answer 7

Answer by Lunar Mushrooms

For comparison, take a look at the figures for OS X (link):

Kernel data structures: approximately 1 KB
Stack space: 512 KB (secondary threads), 8 MB (OS X main thread), 1 MB (iOS main thread)
Creation time: approximately 90 microseconds

I would guess that POSIX thread creation costs are in the same ballpark (not a wildly different figure).

Answer 8

Answer by Mario The Spoon

Thread creation, and doing computation in a thread, is pretty expensive. All the data structures need to be set up, the thread must be registered with the kernel, and a thread switch must occur before the new thread actually executes (at an unspecified and unpredictable time). Executing thread.start does not mean that the thread's main function is called immediately. As the article (mentioned by typoking) points out, creating a thread is cheap only compared to creating a process. Overall, it is pretty expensive.

I would never use a thread:

for a short computation
for a computation whose result I need in my flow of code (that is, where I start the thread and then wait for it to return the result of its computation)

In your example, it would make sense (as has already been pointed out) to create a single thread that handles all of the serial communication and lives for the lifetime of the program.

hth

Mario

Answer 9

Answer by R.. GitHub STOP HELPING ICE

On any sane implementation, the cost of thread creation should be proportional to the number of system calls it involves, and on the same order of magnitude as familiar system calls like open and read. Some casual measurements on my system showed pthread_create taking about twice as much time as open("/dev/null", O_RDWR), which is very expensive relative to pure computation but very cheap relative to any IO or other operations that involve switching between user and kernel space.